aipeek

Gives AI a peek into — and a hand on — your running browser app. Reads the UI tree (React fiber), semantic DOM, console, network, errors, and store state; drives the page (click/fill/press/wait/screenshot). All over plain-text HTTP on your Vite dev server — zero resident context cost, unlike a browser MCP whose tool schemas sit in the model's context whether used or not.

10× faster end-to-end than screenshot-driven agents. What you feel is wall-clock time from prompt to done: model thinking, round-trips, all of it. Screenshot agents (Playwright + vision) pay 2–5s of pixel-parsing every step; aipeek reads semantic text (instant) and batches a whole interaction into one round-trip with /chain, so the model thinks once, not N times.

It lives inside the open page (injected client + HMR channel), so it reads React/store internals a DOM-only driver can't, and acts on the current tab with no separate browser process. It does not open browsers, navigate, run headless, or fire real pointer events — it's the dev inner loop, not E2E. For that, use Playwright.

#Install

npm i aipeek
# or
pnpm add aipeek
# or
bun add aipeek

#Setup

// vite.config.ts
import { defineConfig } from 'vite'
import { aipeekPlugin } from 'aipeek'

export default defineConfig({
  // Registers the /__aipeek endpoints on the Vite dev server
  plugins: [aipeekPlugin()],
})

#API Endpoints

All endpoints are available on your Vite dev server:

| Endpoint | Description |
| --- | --- |
| GET /__aipeek/screen | State-machine projection {view, modal, focus, knobs}. Start here. |
| GET /__aipeek | Summary of all sections (UI, console, network, errors, state) |
| GET /__aipeek/{section} | Detail for a section (ui, console, network, errors, state) |
| GET /__aipeek/{section}/{index} | Detail for a specific item in a section |
| GET /__aipeek/{section}?full | Full detail (no truncation) |
| GET /__aipeek/dom[?scope=Name\|?sel=css] | Semantic DOM: UI as text (see below) |
| GET /__aipeek/query?sel=css | Read-side twin of sel=: a selector's live count + each match's text/visible/attrs (role, data-state, aria-*/data-*, value, disabled). Per-element assertions without /eval. |
| GET /__aipeek/{action}?... | Drive the page (see Actions) |
| GET /__aipeek/tabs | List live tabs (id, visible/background, title) for ?tab= addressing |
| POST /__aipeek/chain | Run a JSON array of actions in one round-trip (see Actions) |
| GET\|POST /__aipeek/eval | Run arbitrary JS in the page (?code= or POST body); returns the result. Escape hatch for what typed endpoints can't do; for count/text/state/attr checks, reach for /query first. |
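
Because every read endpoint is a plain GET, a thin URL helper is enough to script them. The sketch below is a hypothetical helper, not part of aipeek: the name `aipeekUrl`, the `Params` type, and the default `base` (the port used in this document's examples) are assumptions made for illustration.

```typescript
// Hypothetical helper (not part of aipeek): builds URLs for the endpoints listed above.
type Params = Record<string, string | number | true>;

function aipeekUrl(path: string, params: Params = {}, base = "http://localhost:5195"): string {
  const query = Object.entries(params)
    .map(([key, value]) =>
      // `true` renders a bare flag such as ?full; everything else becomes key=value.
      value === true
        ? encodeURIComponent(key)
        : `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`,
    )
    .join("&");
  const suffix = path ? `/${path}` : "";
  return `${base}/__aipeek${suffix}${query ? `?${query}` : ""}`;
}

// Cheapest layer first, then drill down only as needed.
const screenUrl = aipeekUrl("screen");
const summaryUrl = aipeekUrl("");
const fullNetworkUrl = aipeekUrl("network", { full: true });
const queryUrl = aipeekUrl("query", { sel: "button[disabled]" });
```

Each URL can be passed to `fetch` and read with `response.text()`, since the endpoints return plain text.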

#Perception layers — UI as text, not pixels

For a model, the UI's optimal representation is its semantics, not rendered pixels. A screenshot forces pixel→meaning re-derivation and costs hundreds of tokens; the same information is already textual in the DOM. aipeek exposes four layers, cheapest first:

  • /screen — state-machine projection. The whole UI collapsed to what a human reads off a washing-machine panel: view (which area), modal (is something covering it), focus, and knobs (the few reachable controls now — repeated rows fold to source ×N, and when a modal is open only its subtree counts). A handful of lines. Start here.
  • /ui — React component tree. Full structure. Deep-dive when /screen isn't enough.
  • /dom — semantic DOM: tag·role·semantic-class·data-*·state per element, with Tailwind/atomic noise stripped and each line tagged with its source location (@File.tsx:line, via code-inspector if present). This is what tells the model what an element is, its live state, and where to edit it.
  • /screenshot — pixels. Lossy DOM→PNG (html-to-image). Only for visual checks a human looks at; not the model's primary channel.

Scoped DOM — work top-down. The full DOM is huge; a scoped view is accurate. Read /ui to find a component, then scope the DOM to it:

curl localhost:5195/__aipeek/ui                  # find <ChatInput>
curl 'localhost:5195/__aipeek/dom?scope=ChatInput' # just that subtree (matches source path)
curl 'localhost:5195/__aipeek/dom?sel=.chat-list'  # or any CSS subtree

scope= matches against the data-insp-path source path, so directory structure acts as the component boundary. Each line's @File.tsx:line then tells you exactly where to edit.
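
Since each /dom line carries an @File.tsx:line tag, a script can turn a scoped dump into a list of edit targets. This is a minimal sketch, assuming the tag appears literally as `@<file>:<line>` somewhere on the line; `extractSourceLocations` is a hypothetical name, and the sample text is invented for illustration rather than copied from real aipeek output.

```typescript
interface SourceLocation {
  file: string;
  line: number;
  text: string; // the rest of the DOM line, with the tag removed
}

// Assumption: the source tag looks like "@ChatInput.tsx:42" and appears at most once per line.
function extractSourceLocations(domText: string): SourceLocation[] {
  const tag = /@([\w./-]+\.[A-Za-z]+):(\d+)/;
  const locations: SourceLocation[] = [];
  for (const rawLine of domText.split("\n")) {
    const match = tag.exec(rawLine);
    if (!match) continue; // lines without a source tag are skipped
    locations.push({
      file: match[1],
      line: Number(match[2]),
      text: rawLine.replace(match[0], "").trim(),
    });
  }
  return locations;
}

// Invented sample, shaped after the description of /dom above.
const sampleDom = [
  "textarea role=textbox data-state=idle @ChatInput.tsx:42",
  "  button disabled @SendButton.tsx:17",
  "div (no tag on this line)",
].join("\n");

const editTargets = extractSourceLocations(sampleDom);
```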

#Actions — drive the current tab

| Endpoint | Params | Effect |
| --- | --- | --- |
| /click | sel= (CSS) or text= (visible text) | dispatch a real click |
| /fill | sel=/text= + value= | set value on input/textarea/select via React's native value setter (fires onChange on controlled inputs); contenteditable via execCommand |
| /press | key= (e.g. Enter, Control+a) | keydown/keyup on the focused element |
| /wait | text=/sel=, timeout= (ms, default 5000) | poll until it appears; 504 on timeout |
| /screenshot | sel=, out= | DOM→PNG into .aipeek/; skips cross-origin/broken images |
| POST /chain | JSON array of {type, sel?, text?, value?, key?, timeout?} | run in sequence, settle between steps, stop on first failure |

click/fill/press settle the DOM and append --- changed --- — only the state-machine transition this action caused (view: a → b, modal: opened X, focus: …) plus any new errors/failed requests, not a fresh snapshot. (no state change) means nothing moved. You read the delta and drill into /ui or /dom for detail only if you need it. On a target miss, /click and /fill return the reachable clickable elements (clipped to the open modal's subtree) so you can re-target.

They also append a --- recent actions --- timeline: the semantic page actions in order (T=trusted human / S=synthetic aipeek), each with its resulting UI change, with your own action bracketed by 你当前的行为 ("your current action") dividers. If the user manipulates the page concurrently (for example, closes a dialog you opened), their action shows up in your next response, so conflicts surface automatically.
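
A client that scripts these actions may want to separate the appended sections from the rest of the response. The following is a rough sketch, assuming each marker sits on its own line and that "(no state change)" appears as the content of the changed section; `splitActionResponse` and `hasStateChange` are hypothetical names, and the sample response is invented for illustration.

```typescript
interface ActionResponse {
  body: string;
  changed: string | null;
  recentActions: string | null;
}

// Assumption: "--- changed ---" and "--- recent actions ---" each occupy a whole line.
function splitActionResponse(text: string): ActionResponse {
  const sections: Record<string, string[]> = { body: [] };
  let current = "body";
  for (const line of text.split("\n")) {
    const marker = /^--- (changed|recent actions) ---$/.exec(line.trim());
    if (marker) {
      current = marker[1];
      sections[current] = [];
    } else {
      sections[current].push(line);
    }
  }
  const join = (name: string): string | null =>
    name in sections ? sections[name].join("\n").trim() : null;
  return {
    body: join("body") ?? "",
    changed: join("changed"),
    recentActions: join("recent actions"),
  };
}

// True only when the action reported a state-machine transition.
function hasStateChange(response: ActionResponse): boolean {
  return response.changed !== null && response.changed !== "(no state change)";
}

// Invented sample response.
const sampleResponse = [
  "clicked",
  "--- changed ---",
  "modal: opened Settings",
  "--- recent actions ---",
  "S click Settings",
].join("\n");

const parsedResponse = splitActionResponse(sampleResponse);
```

Checking `hasStateChange` first keeps the follow-up reads of /ui or /dom limited to the cases where something actually moved.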

A CSS sel= with non-ASCII or quotes/brackets must be URL-encoded, or the query parser mangles it: curl -G .../click --data-urlencode 'sel=button[title="知识库"]'.
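
The same encoding can be done from a script with the standard `encodeURIComponent`. This is a small sketch; the helper name `clickUrl` and the default base URL are assumptions for illustration.

```typescript
// Hypothetical helper: percent-encodes the selector so brackets, quotes,
// and non-ASCII characters survive the query parser.
function clickUrl(sel: string, base = "http://localhost:5195"): string {
  return `${base}/__aipeek/click?sel=${encodeURIComponent(sel)}`;
}

const selector = 'button[title="知识库"]';
const encodedClickUrl = clickUrl(selector);

// Decoding the query value gives back the original selector unchanged.
const roundTripped = decodeURIComponent(encodedClickUrl.split("sel=")[1]);
```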

Multiple tabs. Every read/drive command takes ?tab=<id> to address one tab — including a background one (drive the Chat tab while the user reads a different tab; synthetic events and Electron sendInputEvent don't need foreground). GET /tabs lists the live ids. One tab open → omit ?tab=, it just works. Several open + no ?tab= → the command returns 409 + the tab list instead of randomly hitting one; pick an id and retry with ?tab=.
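
The 409-then-retry flow can be wrapped once so callers do not repeat it. A minimal sketch under stated assumptions: the format of the tab list in the 409 body is not specified here, so the caller supplies a `pickTab` function that extracts an id from the raw text; `withTab`, `getWithTabRetry`, and `TextFetcher` are hypothetical names.

```typescript
// Appends tab=<id> to an aipeek URL, using "&" if a query string already exists.
function withTab(url: string, tabId: string): string {
  const separator = url.includes("?") ? "&" : "?";
  return `${url}${separator}tab=${encodeURIComponent(tabId)}`;
}

// The minimal shape this sketch needs from a fetch-like function.
type TextFetcher = (url: string) => Promise<{ status: number; text(): Promise<string> }>;

async function getWithTabRetry(
  fetcher: TextFetcher,
  url: string,
  pickTab: (tabList: string) => string, // assumption: caller knows how to read the tab list
): Promise<string> {
  const first = await fetcher(url);
  if (first.status !== 409) return first.text();
  // Several tabs are open and none was addressed: choose one and retry once.
  const tabId = pickTab(await first.text());
  const retry = await fetcher(withTab(url, tabId));
  return retry.text();
}
```

The global `fetch` should satisfy `TextFetcher`, so it can be passed in directly; injecting the fetcher also makes the flow easy to exercise with a fake.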

Chain packs a whole interaction into one round-trip:

curl -X POST localhost:5195/__aipeek/chain -d '[
  {"type":"click","sel":"button[title=\"知识库\"]"},
  {"type":"wait","text":"Done"},
  {"type":"fill","sel":"textarea","value":"hi"},
  {"type":"press","key":"Enter"}
]'
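
When the chain is built from a script rather than a shell, typing the steps avoids the quote escaping in the curl example. This is a sketch, assuming only the four step types used above; the `ChainStep` type and `chainBody` helper are hypothetical and simply mirror the {type, sel?, text?, value?, key?, timeout?} shape from the Actions table.

```typescript
// Hypothetical types mirroring the chain step shape described in the Actions table.
type ChainStep =
  | { type: "click"; sel?: string; text?: string }
  | { type: "fill"; sel?: string; text?: string; value: string }
  | { type: "press"; key: string }
  | { type: "wait"; sel?: string; text?: string; timeout?: number };

// Serializes the steps, rejecting targeted steps that name neither sel nor text.
function chainBody(steps: ChainStep[]): string {
  steps.forEach((step, index) => {
    if (step.type !== "press" && step.sel === undefined && step.text === undefined) {
      throw new Error(`step ${index} (${step.type}) needs sel or text`);
    }
  });
  return JSON.stringify(steps);
}

// The same interaction as the curl example above.
const chainSteps: ChainStep[] = [
  { type: "click", sel: 'button[title="知识库"]' },
  { type: "wait", text: "Done" },
  { type: "fill", sel: "textarea", value: "hi" },
  { type: "press", key: "Enter" },
];

const chainPayload = chainBody(chainSteps);
```

The resulting string can be sent as the body of a POST to /__aipeek/chain.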

#CLI

npx aipeek                  # fetch from localhost:5195
npx aipeek --port=3000      # custom port

#Store Registration (optional)

Register MobX/other stores for state inspection:

window.__AIPEEK_STORES__ = { myStore, anotherStore }

State is snapshotted on demand (depth-limited, bounded) and included in the <state> section.
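
In a TypeScript project, the registration can be typed and done incrementally, for example when stores are created in different modules. A small sketch, assuming aipeek reads `__AIPEEK_STORES__` at snapshot time so that later additions are picked up; `registerStores`, `AipeekHost`, and `StoreRegistry` are hypothetical names, not part of aipeek.

```typescript
type StoreRegistry = Record<string, unknown>;

// The only property this sketch relies on; in the browser the host is `window`.
interface AipeekHost {
  __AIPEEK_STORES__?: StoreRegistry;
}

// Merges new stores into whatever is already registered instead of overwriting it.
function registerStores(host: AipeekHost, stores: StoreRegistry): StoreRegistry {
  host.__AIPEEK_STORES__ = { ...host.__AIPEEK_STORES__, ...stores };
  return host.__AIPEEK_STORES__;
}

// Stand-in for `window`, with placeholder stores for illustration.
const fakeWindow: AipeekHost = {};
registerStores(fakeWindow, { myStore: { count: 1 } });
registerStores(fakeWindow, { anotherStore: { items: [] } });
```

In the app itself, `window` would be passed as the host (cast to `AipeekHost` as needed).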