Kernel browsers expose four ways to drive a session. For agents, we recommend computer use or Playwright execution: both run co-located with the browser and avoid the bot-detection surface that a direct CDP connection introduces.

Kernel’s Computer Controls API exposes OS-level mouse, keyboard, and screen primitives, the same surface a computer-use model already knows how to drive (screenshot, click, type, key, scroll, drag). It requires no CDP or WebDriver connection, so there is no protocol fingerprint to leak. This makes it a good fit for Claude, OpenAI, or Gemini computer-use loops:
import Kernel from '@onkernel/sdk';

const kernel = new Kernel();
const kernelBrowser = await kernel.browsers.create();

const screenshot = await kernel.browsers.computer.captureScreenshot(kernelBrowser.session_id);

await kernel.browsers.computer.clickMouse(kernelBrowser.session_id, {
  x: 420,
  y: 280,
});

await kernel.browsers.computer.typeText(kernelBrowser.session_id, {
  text: 'kernel cloud browsers',
});

Why computer use for agents

Kernel’s computer controls are built to match how computer-use models were trained — the same primitives the model emits (screenshot, click at coords, type, key, scroll, drag) map 1:1 onto the API. There’s no harness translating model output into framework calls.
  • Native fit. Screenshot, click, type, key, scroll, drag — the primitives the model already speaks.
  • Faster screenshots. Captures bypass CDP, which removes the largest source of latency in a vision loop.
  • More resistant to bot detection. No CDP connection means no CDP fingerprint to leak. Pairs naturally with stealth mode and residential proxies.
  • Human-like input. OS-level events with Bézier-curve mouse paths, variable typing speed, and a configurable mistype rate.
  • Not DOM-limited. Screenshots capture the full VM, so the agent can see and interact with native dialogs, canvas elements, iframes, and PDFs — not just things you can address with a selector.
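
The 1:1 mapping described above can be sketched as a small dispatcher. This is an illustration under stated assumptions, not part of the SDK: the ModelAction shape and the ComputerControls interface are hypothetical, covering only the three calls shown on this page (captureScreenshot, clickMouse, typeText), and response types are left as unknown because this section does not describe them.

```typescript
// Hypothetical action shape for illustration; each computer-use model defines
// its own schema, so adapt the field names to the model you are driving.
type ModelAction =
  | { type: 'screenshot' }
  | { type: 'click'; x: number; y: number }
  | { type: 'type'; text: string };

// Minimal structural view of the three computer-control calls shown above.
// Assumption: return types are unknown because the response shapes are not
// described in this section.
interface ComputerControls {
  captureScreenshot(sessionId: string): Promise<unknown>;
  clickMouse(sessionId: string, params: { x: number; y: number }): Promise<unknown>;
  typeText(sessionId: string, params: { text: string }): Promise<unknown>;
}

// Forward one model-emitted action to the matching computer-control call.
// No translation layer is needed beyond picking the method.
async function dispatchAction(
  computer: ComputerControls,
  sessionId: string,
  action: ModelAction,
): Promise<unknown> {
  switch (action.type) {
    case 'screenshot':
      return computer.captureScreenshot(sessionId);
    case 'click':
      return computer.clickMouse(sessionId, { x: action.x, y: action.y });
    case 'type':
      return computer.typeText(sessionId, { text: action.text });
  }
}
```

In a real loop you would presumably pass kernel.browsers.computer and kernelBrowser.session_id as the first two arguments, then feed each screenshot back to the model and dispatch whatever action it emits next. Key, scroll, and drag would follow the same pattern once you look up their method names in the SDK reference.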

Why Playwright execution over a direct CDP connection

If you’re reaching for Playwright, prefer the execution API over connectOverCDP. Same Playwright API you already know, none of the setup.
  • Run from anywhere. No playwright package to version-pin, no Chromium download, no CDP connection to manage. Send the code, get the result.
  • Co-located with the browser. Code runs in the same VM as the browser — no network hop between your script and the page, fewer flakes.
  • Patchright by default. Hardened against bot detection out of the box.
  • Full Playwright API. page, context, and browser are all in scope. Anything Playwright can do — DOM queries, file uploads, full-page screenshots — works here.
  • Returns values. Use return in your code and the returned value comes back in the response. Easy to use as an agent tool.
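
To make the "agent tool" point concrete, here is a minimal sketch of a wrapper that runs a snippet and hands the returned value back as a JSON string. The PlaywrightExecutor interface and the runPlaywrightTool name are hypothetical; the interface mirrors only the execute(sessionId, { code }) call and the result field used on this page, and any other response fields are deliberately ignored.

```typescript
// Structural view of the execute call used on this page.
// Assumption: only `result` is read; other response fields are not described here.
interface PlaywrightExecutor {
  execute(sessionId: string, params: { code: string }): Promise<{ result?: unknown }>;
}

// Run a Playwright snippet next to the browser and return its value as a JSON
// string, a convenient shape for a tool result handed back to a model.
async function runPlaywrightTool(
  playwright: PlaywrightExecutor,
  sessionId: string,
  code: string,
): Promise<string> {
  const response = await playwright.execute(sessionId, { code });
  // JSON.stringify(undefined) yields undefined, so fall back to null when the
  // snippet returned nothing.
  return JSON.stringify(response.result ?? null);
}
```

Assuming kernel.browsers.playwright satisfies this interface, a call such as runPlaywrightTool(kernel.browsers.playwright, kernelBrowser.session_id, 'return await page.title();') would give the agent the page title as a string it can read directly.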

Computer use + Playwright execution

Computer controls drive the browser the way a person would; they don’t speak the programmatic API surface. Anything you would normally do through the DOM or a Playwright client (reading text and attributes, page.goto, file uploads, cookie or storage access, switching tabs) belongs on the Playwright execution side. The recommended pattern for agents is computer controls for interaction, with Playwright execution as a tool the agent can call when it needs structured data or a programmatic action.
const response = await kernel.browsers.playwright.execute(
  kernelBrowser.session_id,
  {
    code: `
      const rows = await page.$$eval('table tr', (trs) =>
        trs.map((tr) => Array.from(tr.querySelectorAll('td')).map((td) => td.textContent))
      );
      return rows;
    `,
  },
);

console.log(response.result);
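
Because the code is sent as a string, any value the agent supplies (a URL, a selector) has to be embedded in that string safely. The sketch below shows one way to do that for the page.goto case mentioned above. buildGotoCode is a hypothetical helper, and it assumes, as the example above suggests, that the snippet runs as an async body with page in scope.

```typescript
// Build a Playwright snippet that navigates and reports where the page ended up.
// JSON.stringify turns the URL into an escaped JavaScript string literal, so
// quotes or backticks in the input cannot break out of the generated code.
function buildGotoCode(url: string): string {
  return [
    `await page.goto(${JSON.stringify(url)});`,
    `return { url: page.url(), title: await page.title() };`,
  ].join('\n');
}
```

The resulting string can be passed as the code field of kernel.browsers.playwright.execute. Escaping with JSON.stringify instead of interpolating the raw value matters most when the URL comes from model output or page content, which you should treat as untrusted.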

Going deeper