# aipeek
Gives AI a peek into, and a hand on, your running browser app. It reads the UI tree (React fiber), semantic DOM, console, network, errors, and store state, and it drives the page (click/fill/press/wait/screenshot). Everything goes over plain-text HTTP on your Vite dev server, so there is zero resident context cost, unlike a browser MCP server whose tool schemas sit in the model's context whether they are used or not.
10× faster end-to-end. What you feel is wall-clock time from prompt to done: model thinking, round-trips, all of it. Screenshot agents (Playwright + vision) pay 2–5 s of pixel parsing on every step; aipeek reads semantic text (effectively instant) and batches a whole interaction into one round-trip with `/chain`, so the model thinks once, not N times.
It lives inside the open page (injected client + HMR channel), so it reads React/store internals a DOM-only driver can't, and it acts on the current tab without a separate browser process. It does not open browsers, navigate, run headless, or fire real pointer events: it's for the dev inner loop, not E2E. For E2E, use Playwright.
## Install

```bash
npm i aipeek
# or
pnpm add aipeek
# or
bun add aipeek
```

## Setup
```ts
// vite.config.ts
import { defineConfig } from 'vite'
import { aipeekPlugin } from 'aipeek'

export default defineConfig({
  plugins: [aipeekPlugin()],
})
```

## API Endpoints
All endpoints are available on your Vite dev server:
| Endpoint | Description |
|---|---|
| `GET /__aipeek/screen` | State-machine projection: `{view, modal, focus, knobs}`. Start here. |
| `GET /__aipeek` | Summary of all sections (UI, console, network, errors, state) |
| `GET /__aipeek/{section}` | Detail for a section (`ui`, `console`, `network`, `errors`, `state`) |
| `GET /__aipeek/{section}/{index}` | Detail for a specific item in a section |
| `GET /__aipeek/{section}?full` | Full detail (no truncation) |
| `GET /__aipeek/dom[?scope=Name\|?sel=css]` | Semantic DOM: UI as text (see below) |
| `GET /__aipeek/query?sel=css` | Read-side twin of `sel=`: a selector's live match count plus each match's text, visibility, and attributes (`role`, `data-state`, `aria-*`/`data-*`, `value`, `disabled`). Per-element assertions without `/eval`. |
| `GET /__aipeek/{action}?...` | Drive the page (see Actions) |
| `GET /__aipeek/tabs` | List live tabs (id, visible/background, title) for `?tab=` addressing |
| `POST /__aipeek/chain` | Run a JSON array of actions in one round-trip (see Actions) |
| `GET\|POST /__aipeek/eval` | Run arbitrary JS in the page (`?code=` or POST body) and return the result. Escape hatch for what the typed endpoints can't do; for count/text/state/attribute checks, reach for `/query` first. |
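Because every endpoint is plain HTTP returning text, a client needs little more than `fetch`. The sketch below is a minimal illustration, not part of the aipeek package: `peekUrl` and `peek` are hypothetical names, and it assumes the dev server is at `localhost:5195` (the CLI's default). Building the query with `URLSearchParams` also percent-encodes selectors containing quotes, brackets, or non-ASCII text; a `true` value is emitted as a bare flag such as `?full`.

```typescript
// Hypothetical helpers (not exported by aipeek). Assumes the Vite dev server
// runs at localhost:5195 and every endpoint lives under /__aipeek.
type Params = Record<string, string | number | boolean>

const DEFAULT_BASE = 'http://localhost:5195'

// Build a /__aipeek URL. String/number values are percent-encoded;
// `true` becomes a bare flag (e.g. `?full`); `false` is omitted.
function peekUrl(path: string, params: Params = {}, base: string = DEFAULT_BASE): string {
  const url = new URL(`/__aipeek${path}`, base)
  const flags: string[] = []
  for (const [key, value] of Object.entries(params)) {
    if (value === true) flags.push(encodeURIComponent(key))
    else if (value !== false) url.searchParams.set(key, String(value))
  }
  const query = [url.searchParams.toString(), ...flags].filter((part) => part !== '').join('&')
  return `${url.origin}${url.pathname}${query ? `?${query}` : ''}`
}

// GET an endpoint and return its plain-text body; any non-2xx status
// (e.g. 409 for an ambiguous tab, 504 for a /wait timeout) throws.
async function peek(path: string, params: Params = {}, base: string = DEFAULT_BASE): Promise<string> {
  const res = await fetch(peekUrl(path, params, base))
  const body = await res.text()
  if (!res.ok) throw new Error(`${res.status} ${path}: ${body}`)
  return body
}
```

For example, `await peek('/screen')` reads the projection, `await peek('/dom', { scope: 'ChatInput' })` reads a scoped DOM, and `await peek('/console', { full: true })` reads an untruncated section.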
## Perception layers: UI as text, not pixels
For a model, the UI's optimal representation is its semantics, not rendered pixels. A screenshot forces pixel→meaning re-derivation and costs hundreds of tokens; the same information is already textual in the DOM. aipeek exposes four layers, cheapest first:
- `/screen`: state-machine projection. The whole UI collapsed to what a human reads off a washing-machine panel: `view` (which area), `modal` (is something covering it), `focus`, and `knobs` (the few controls reachable right now; repeated rows fold to `source ×N`, and when a modal is open only its subtree counts). A handful of lines. Start here.
- `/ui`: React component tree. Full structure. Deep-dive when `/screen` isn't enough.
- `/dom`: semantic DOM. Each element is rendered as `tag·role·semantic-class·data-*·state`, with Tailwind/atomic-class noise stripped and each line tagged with its source location (`@File.tsx:line`, via `code-inspector` if present). This tells the model what an element is, its live state, and where to edit it.
- `/screenshot`: pixels. A lossy DOM→PNG render (`html-to-image`), only for visual checks a human looks at; not the model's primary channel.
Scoped DOM: work top-down. The full DOM is huge; a scoped view is small and stays on target. Read `/ui` to find a component, then scope the DOM to it:
```bash
curl localhost:5195/__aipeek/ui                      # find <ChatInput>
curl 'localhost:5195/__aipeek/dom?scope=ChatInput'   # just that subtree (matches source path)
curl 'localhost:5195/__aipeek/dom?sel=.chat-list'    # or any CSS subtree
```

`scope=` matches against the `data-insp-path` source path, so directory structure acts as the component boundary. Each line's `@File.tsx:line` then tells you exactly where to edit.
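Because every `/dom` line carries its source location, a scoped dump can be turned straight into a list of edit targets. This is a minimal sketch under an assumption: that the tag appears on the line as `@path/File.tsx:line` (with a `.tsx`, `.ts`, `.jsx`, or `.js` extension). `sourceLocations` is a hypothetical helper, and the sample lines are invented to show that assumed shape, not captured aipeek output.

```typescript
// Hypothetical helper: extract `@File.tsx:line` source tags from /dom text.
interface SourceLoc {
  file: string
  line: number
  text: string // the whole /dom line, trimmed
}

// Assumed tag shape: `@` + path ending in .tsx/.ts/.jsx/.js, then `:` + line number.
const SOURCE_TAG = /@(\S+?\.(?:tsx|ts|jsx|js)):(\d+)/

function sourceLocations(domText: string): SourceLoc[] {
  const locations: SourceLoc[] = []
  for (const raw of domText.split('\n')) {
    const match = SOURCE_TAG.exec(raw)
    if (match) locations.push({ file: match[1], line: Number(match[2]), text: raw.trim() })
  }
  return locations
}

// Invented sample in the assumed format (not real aipeek output).
const sample = [
  'textarea role=textbox data-state=idle @src/chat/ChatInput.tsx:42',
  'div (no source tag)',
  'button "Send" @src/chat/ChatInput.tsx:57',
].join('\n')

for (const loc of sourceLocations(sample)) console.log(`${loc.file}:${loc.line}`)
```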
## Actions: drive the current tab
| Endpoint | Params | Effect |
|---|---|---|
| `/click` | `sel=` (CSS) or `text=` (visible text) | dispatch a click (a synthetic event, not an OS-level pointer event) |
| `/fill` | `sel=`/`text=` + `value=` | set the value on an input/textarea/select through the native value setter, so `onChange` fires on React controlled inputs; `contenteditable` is filled via `execCommand` |
| `/press` | `key=` (e.g. `Enter`, `Control+a`) | `keydown`/`keyup` on the focused element |
| `/wait` | `text=`/`sel=`, `timeout=` (ms, default 5000) | poll until the target appears; 504 on timeout |
| `/screenshot` | `sel=`, `out=` | DOM→PNG into `.aipeek/`; skips cross-origin/broken images |
| `POST /chain` | JSON array of `{type, sel?, text?, value?, key?, timeout?}` | run in sequence, settle between steps, stop on first failure |
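From TypeScript, a chain can be built as typed data and posted with `fetch`. The `ChainStep` union below is an assumption that mirrors the `{type, sel?, text?, value?, key?, timeout?}` shape from the table (one variant per action type, with `value` required for `fill` and `key` for `press`); `ChainStep`, `chainRequest`, and `runChain` are hypothetical names, and sending a `Content-Type: application/json` header is likewise an assumption.

```typescript
// Hypothetical typed wrapper for POST /__aipeek/chain (names are illustrative).
type Target = { sel?: string; text?: string }

type ChainStep =
  | ({ type: 'click' } & Target)
  | ({ type: 'fill'; value: string } & Target)
  | { type: 'press'; key: string }
  | ({ type: 'wait'; timeout?: number } & Target)

// Build fetch() arguments for a single round-trip; the steps run in order
// server-side, settling between steps and stopping on the first failure.
function chainRequest(steps: ChainStep[], base = 'http://localhost:5195'): [string, RequestInit] {
  return [
    `${base}/__aipeek/chain`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' }, // assumption, not documented
      body: JSON.stringify(steps),
    },
  ]
}

// Post the chain and return the status plus the plain-text report.
async function runChain(steps: ChainStep[]): Promise<{ status: number; body: string }> {
  const res = await fetch(...chainRequest(steps))
  return { status: res.status, body: await res.text() }
}

// Example: open a panel, wait for it, type, and submit in one round-trip.
const steps: ChainStep[] = [
  { type: 'click', sel: 'button[title="知识库"]' },
  { type: 'wait', text: 'Done' },
  { type: 'fill', sel: 'textarea', value: 'hi' },
  { type: 'press', key: 'Enter' },
]
```

Making `value` and `key` required on their variants lets the compiler reject a `fill` without a value before the request is ever sent.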
`click`/`fill`/`press` let the DOM settle, then append a `--- changed ---` section with only the state-machine transition the action caused (`view: a → b`, `modal: opened X`, `focus: …`) plus any new errors or failed requests, not a fresh snapshot. `(no state change)` means nothing moved. Read the delta, and drill into `/ui` or `/dom` only if you need more detail. On a target miss, `/click` and `/fill` return the reachable clickable elements (clipped to the open modal's subtree) so you can re-target.
They also append a `--- recent actions ---` timeline: the semantic page actions in order (`T` = trusted, i.e. human; `S` = synthetic, i.e. aipeek), each with its resulting UI change, with your own action bracketed by `你当前的行为` ("your current action") dividers. If the user manipulates the page concurrently (for example, closes a dialog you opened), their action shows up in your next response, so conflicts surface automatically.
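When scripting against aipeek, the `--- changed ---` delta can be checked mechanically before deciding whether to read `/ui` or `/dom`. This minimal sketch assumes each section of a response begins with a header line of the form `--- name ---` and that an unchanged page reports the literal line `(no state change)`; `splitSections` and `changedLines` are hypothetical names, and the sample response is invented.

```typescript
// Hypothetical helpers: split an action response into `--- name ---` sections.
const SECTION_HEADER = /^--- (.+?) ---$/

// Map section name -> its non-empty lines; text before any header goes under ''.
function splitSections(body: string): Map<string, string[]> {
  const sections = new Map<string, string[]>([['', []]])
  let current = ''
  for (const line of body.split('\n')) {
    const header = SECTION_HEADER.exec(line.trim())
    if (header) {
      current = header[1]
      sections.set(current, [])
    } else if (line.trim() !== '') {
      sections.get(current)!.push(line)
    }
  }
  return sections
}

// The transition lines under `--- changed ---`, or [] if nothing moved.
function changedLines(body: string): string[] {
  const lines = splitSections(body).get('changed') ?? []
  return lines.filter((line) => line.trim() !== '(no state change)')
}

// Invented response in the assumed format (not captured aipeek output).
const response = [
  'clicked: button "Open settings"',
  '--- changed ---',
  'view: chat → settings',
  'modal: opened SettingsDialog',
  '--- recent actions ---',
  'S click "Open settings"',
].join('\n')

console.log(changedLines(response)) // the two transition lines
```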
A CSS `sel=` containing non-ASCII characters, quotes, or brackets must be URL-encoded, or the query parser mangles it. For example, for a button titled 知识库 ("Knowledge Base"):

```bash
curl -G .../click --data-urlencode 'sel=button[title="知识库"]'
```
Multiple tabs. Every read and drive command takes `?tab=<id>` to address one tab, including a background one: you can drive the Chat tab while the user reads a different tab, because synthetic events and Electron's `sendInputEvent` don't need the tab in the foreground. `GET /tabs` lists the live ids. With one tab open, omit `?tab=` and it just works. With several open and no `?tab=`, the command returns 409 plus the tab list instead of hitting one at random; pick an id and retry with `?tab=`.
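A client can treat the 409 as a prompt to choose a tab rather than as a failure. This is a minimal sketch, not aipeek's own client: `withTab` and `getInTab` are hypothetical names, and since the exact format of the returned tab list isn't specified here, choosing an id is delegated to a caller-supplied `pickTab` callback.

```typescript
// Hypothetical helpers: address one tab with ?tab=<id>, retrying once on 409.
// Assumes `url` does not already carry a tab= parameter.
function withTab(url: string, tabId: string): string {
  const separator = url.includes('?') ? '&' : '?'
  return `${url}${separator}tab=${encodeURIComponent(tabId)}`
}

// GET `url`. On 409 (several tabs open, no ?tab=), hand the returned tab list
// to `pickTab`, then retry once against the chosen tab.
async function getInTab(
  url: string,
  pickTab: (tabList: string) => string,
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  let res = await fetchImpl(url)
  if (res.status === 409) {
    const tabId = pickTab(await res.text())
    res = await fetchImpl(withTab(url, tabId))
  }
  const body = await res.text()
  if (!res.ok) throw new Error(`${res.status}: ${body}`)
  return body
}
```

Retrying only once keeps a bad pick from looping, and the injectable `fetchImpl` lets the retry path be exercised without a running dev server.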
Chain packs a whole interaction into one round-trip:
```bash
curl -X POST localhost:5195/__aipeek/chain -d '[
  {"type":"click","sel":"button[title=\"知识库\"]"},
  {"type":"wait","text":"Done"},
  {"type":"fill","sel":"textarea","value":"hi"},
  {"type":"press","key":"Enter"}
]'
```

## CLI
```bash
npx aipeek              # fetch from localhost:5195
npx aipeek --port=3000  # custom port
```

## Store Registration (optional)
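Register MobX or other stores for state inspection:

```ts
// e.g. in your app entry (in TypeScript you may need to declare __AIPEEK_STORES__ on Window)
window.__AIPEEK_STORES__ = { myStore, anotherStore }
```

State is snapshotted on demand (depth-limited, bounded) and included in the `<state>` section.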