aipeek

Gives AI a peek into — and a hand on — your running browser app. Reads the UI tree (React fiber), semantic DOM, console, network, errors, and store state; drives the page (click/fill/press/wait/screenshot). All over plain-text HTTP on your Vite dev server — zero resident context cost, unlike a browser MCP whose tool schemas sit in the model's context whether used or not.

10× faster end-to-end than screenshot agents. What you feel is wall-clock time from prompt to done: model thinking, round-trips, all of it. Screenshot agents (Playwright + vision) pay 2–5s of pixel-parsing every step; aipeek reads semantic text, which needs no vision pass, and batches a whole interaction into one round-trip with /chain, so the model thinks once, not N times.

It lives inside the open page (injected client + HMR channel), so it reads React/store internals a DOM-only driver can't, and acts on the current tab with no separate browser process. It does not open browsers, navigate, run headless, or fire real pointer events — it's the dev inner loop, not E2E. For that, use Playwright.

#Install

npm i aipeek
# or
pnpm add aipeek
# or
bun add aipeek

#Setup

// vite.config.ts
import { defineConfig } from 'vite'
import { aipeekPlugin } from 'aipeek'

export default defineConfig({
  plugins: [aipeekPlugin()],
})

#API Endpoints

All endpoints are available on your Vite dev server:

| Endpoint | Description |
| --- | --- |
| `GET /__aipeek/screen` | State-machine projection: `{view, modal, focus, knobs}`. Start here. |
| `GET /__aipeek` | Summary of all sections (UI, console, network, errors, state) |
| `GET /__aipeek/{section}` | Detail for a section (`ui`, `console`, `network`, `errors`, `state`) |
| `GET /__aipeek/{section}/{index}` | Detail for a specific item in a section |
| `GET /__aipeek/{section}?full` | Full detail (no truncation) |
| `GET /__aipeek/dom[?scope=Name\|?sel=css]` | Semantic DOM: UI as text (see below) |
| `GET /__aipeek/query?sel=css` | Read-side twin of `sel=`: a selector's live `count` + each match's `text`/`visible`/`attrs` (role, `data-state`, `aria-`/`data-`, value, disabled). Per-element assertions without `/eval`. |
| `GET /__aipeek/{action}?...` | Drive the page (see Actions) |
| `GET /__aipeek/tabs` | List live tabs (id, visible/background, title) for `?tab=` addressing |
| `POST /__aipeek/chain` | Run a JSON array of actions in one round-trip (see Actions) |
| `GET\|POST /__aipeek/eval` | Run arbitrary JS in the page (`?code=` or POST body); returns the result. Escape hatch for what typed endpoints can't do; for count/text/state/attr checks reach for `/query` first. |
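
As a rough sketch of how the read endpoints compose, the helper below builds their URLs from a small tagged union. The names `Section`, `ReadTarget`, and `readUrl` are illustrative assumptions and not part of aipeek; the base URL is whatever your Vite dev server listens on (this README's curl examples use `localhost:5195`).

```typescript
// Sketch only: `Section`, `ReadTarget`, and `readUrl` are hypothetical names,
// not aipeek exports. The paths follow the endpoint table above.
type Section = "ui" | "console" | "network" | "errors" | "state";

type ReadTarget =
  | { kind: "screen" }                                     // GET /__aipeek/screen
  | { kind: "summary" }                                    // GET /__aipeek
  | { kind: "section"; section: Section; full?: boolean }  // GET /__aipeek/{section}[?full]
  | { kind: "item"; section: Section; index: number }      // GET /__aipeek/{section}/{index}
  | { kind: "query"; sel: string };                        // GET /__aipeek/query?sel=css

function readUrl(base: string, target: ReadTarget): string {
  const root = `${base}/__aipeek`;
  switch (target.kind) {
    case "screen":
      return `${root}/screen`;
    case "summary":
      return root;
    case "section":
      return `${root}/${target.section}${target.full ? "?full" : ""}`;
    case "item":
      return `${root}/${target.section}/${target.index}`;
    case "query":
      // Selectors can contain quotes, brackets, or non-ASCII text, so encode them.
      return `${root}/query?sel=${encodeURIComponent(target.sel)}`;
  }
}
```

A typical read sequence would start with `readUrl(base, { kind: "screen" })` and fall back to a section or a `/query` URL only when the projection is not enough.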

#Perception layers — UI as text, not pixels

For a model, the UI's optimal representation is its semantics, not rendered pixels. A screenshot forces pixel→meaning re-derivation and costs hundreds of tokens; the same information is already textual in the DOM. aipeek exposes four layers, cheapest first:

/screen — state-machine projection. The whole UI collapsed to what a human reads off a washing-machine panel: view (which area), modal (is something covering it), focus, and knobs (the few reachable controls now — repeated rows fold to source ×N, and when a modal is open only its subtree counts). A handful of lines. Start here.
/ui — React component tree. Full structure. Deep-dive when /screen isn't enough.
/dom — semantic DOM: tag·role·semantic-class·data-*·state per element, with Tailwind/atomic noise stripped and each line tagged with its source location (@File.tsx:line, via code-inspector if present). This is what tells the model what an element is, its live state, and where to edit it.
/screenshot — pixels. Lossy DOM→PNG (html-to-image). Only for visual checks a human looks at; not the model's primary channel.

Scoped DOM: work top-down. The full DOM is huge; a scoped view is small and exact. Read /ui to find a component, then scope the DOM to it:

curl localhost:5195/__aipeek/ui                  # find <ChatInput>
curl 'localhost:5195/__aipeek/dom?scope=ChatInput' # just that subtree (matches source path)
curl 'localhost:5195/__aipeek/dom?sel=.chat-list'  # or any CSS subtree

scope= matches against the data-insp-path source path, so directory structure acts as the component boundary. Each line's @File.tsx:line then tells you exactly where to edit.
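
To show how the `@File.tsx:line` tag can be turned into an edit location, here is a minimal sketch that extracts it from one line of `/dom` output. The exact layout of a `/dom` line is an assumption (only the `@File.tsx:line` tag is taken from this README), and `SourceRef` and `parseSourceRef` are hypothetical names.

```typescript
// Sketch only: pulls the "@File.tsx:line" source tag out of one /dom output line.
// The rest of the line layout is assumed; adjust the pattern to the real output.
interface SourceRef {
  file: string;
  line: number;
}

function parseSourceRef(domLine: string): SourceRef | null {
  // "@", then a path without whitespace, colons, or "@", then ":" and a line number.
  const match = /@([^\s:@]+):(\d+)/.exec(domLine);
  if (match === null) return null;
  return { file: match[1], line: Number(match[2]) };
}
```

Lines without a source tag (for example when code-inspector is not installed) yield `null`, so callers can skip them.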

#Actions — drive the current tab

| Endpoint | Params | Effect |
| --- | --- | --- |
| `/click` | `sel=` (CSS) or `text=` (visible text) | dispatch a real click |
| `/fill` | `sel=`/`text=` + `value=` | set value on input/textarea/select via React's native value setter (fires onChange on controlled inputs); contenteditable via `execCommand` |
| `/press` | `key=` (e.g. `Enter`, `Control+a`) | keydown/keyup on the focused element |
| `/wait` | `text=`/`sel=`, `timeout=` (ms, default 5000) | poll until it appears; 504 on timeout |
| `/screenshot` | `sel=`, `out=` | DOM→PNG into `.aipeek/`; skips cross-origin/broken images |
| `POST /chain` | JSON array of `{type, sel?, text?, value?, key?, timeout?}` | run in sequence, settle between steps, stop on first failure |

click/fill/press settle the DOM and append --- changed --- — only the state-machine transition this action caused (view: a → b, modal: opened X, focus: …) plus any new errors/failed requests, not a fresh snapshot. (no state change) means nothing moved. You read the delta and drill into /ui or /dom for detail only if you need it. On a target miss, /click and /fill return the reachable clickable elements (clipped to the open modal's subtree) so you can re-target.
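
A client that wants only the delta could slice it out of the response body along these lines. This is a sketch under the assumption that each section starts with a line of the form `--- name ---` and that `(no state change)` appears on its own line; the README names the markers but does not specify the full layout. `extractChanged` and `hadStateChange` are hypothetical helpers.

```typescript
// Sketch only: extracts the "--- changed ---" block from an action response body.
// Assumes sections are introduced by lines shaped like "--- name ---".
function extractChanged(body: string): string[] {
  const lines = body.split("\n").map((l) => l.trim());
  const start = lines.indexOf("--- changed ---");
  if (start === -1) return [];
  const changed: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (/^---\s.*\s---$/.test(line)) break; // the next section begins here
    if (line !== "") changed.push(line);
  }
  return changed;
}

function hadStateChange(body: string): boolean {
  const changed = extractChanged(body);
  return changed.length > 0 && changed.indexOf("(no state change)") === -1;
}
```

When `hadStateChange` returns `false`, there is usually no reason to re-read `/ui` or `/dom`.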

They also append a --- recent actions --- timeline: the semantic page actions in order (T=trusted human / S=synthetic aipeek), each with its resulting UI change, with your own action bracketed by 你当前的行为 ("your current action") dividers. If the user manipulates the page concurrently (for example, closes a dialog you opened), their action shows up in your next response, so conflicts surface automatically.

A CSS sel= with non-ASCII or quotes/brackets must be URL-encoded, or the query parser mangles it: curl -G .../click --data-urlencode 'sel=button[title="知识库"]'.
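
From script code the same encoding can be done with `encodeURIComponent`. The sketch below assumes a hypothetical `actionUrl` helper (not an aipeek export) and reuses the selector from the curl example above.

```typescript
// Sketch only: builds a drive-action URL with every parameter percent-encoded.
// `Action` and `actionUrl` are hypothetical names, not part of aipeek.
type Action = "click" | "fill" | "press" | "wait" | "screenshot";

function actionUrl(base: string, action: Action, params: Record<string, string>): string {
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(params[key])}`)
    .join("&");
  return `${base}/__aipeek/${action}${query === "" ? "" : `?${query}`}`;
}

// Brackets, quotes, and the non-ASCII title are all escaped in the result.
const clickUrl = actionUrl("http://localhost:5195", "click", {
  sel: 'button[title="知识库"]',
});
```

Encoding every value, not only `sel=`, also keeps `value=` strings with `&` or `=` from being split by the query parser.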

Multiple tabs. Every read/drive command takes ?tab=<id> to address one tab — including a background one (drive the Chat tab while the user reads a different tab; synthetic events and Electron sendInputEvent don't need foreground). GET /tabs lists the live ids. One tab open → omit ?tab=, it just works. Several open + no ?tab= → the command returns 409 + the tab list instead of randomly hitting one; pick an id and retry with ?tab=.
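
The 409-then-retry flow could be wrapped roughly as follows. This is a sketch, not aipeek's client: `FetchLike`, `withTab`, and `fetchWithTab` are hypothetical names, and because the format of the tab list in the 409 body is not specified here, choosing an id is delegated to a caller-supplied `pickTab` function.

```typescript
// Sketch only: retry a command with ?tab=<id> after a 409 response.
interface TextResponse {
  status: number;
  text(): Promise<string>;
}
type FetchLike = (url: string) => Promise<TextResponse>;

// Append tab=<id>, respecting any query string already on the URL.
function withTab(url: string, tabId: string): string {
  const separator = url.indexOf("?") === -1 ? "?" : "&";
  return `${url}${separator}tab=${encodeURIComponent(tabId)}`;
}

function fetchWithTab(
  doFetch: FetchLike,
  url: string,
  pickTab: (tabList: string) => string, // caller decides how to read the tab list
): Promise<string> {
  return doFetch(url).then((first) => {
    if (first.status !== 409) return first.text();
    // Several tabs are open: the body carries the tab list, so pick one and retry.
    return first
      .text()
      .then((tabList) => doFetch(withTab(url, pickTab(tabList))))
      .then((second) => second.text());
  });
}
```

In a runtime with a global `fetch`, passing `(url) => fetch(url)` as `doFetch` should satisfy `FetchLike`.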

Chain packs a whole interaction into one round-trip:

curl -X POST localhost:5195/__aipeek/chain -d '[
  {"type":"click","sel":"button[title=\"知识库\"]"},
  {"type":"wait","text":"Done"},
  {"type":"fill","sel":"textarea","value":"hi"},
  {"type":"press","key":"Enter"}
]'
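
The same payload can be built in code, which avoids escaping the inner quotes by hand. The sketch below assumes a hypothetical `ChainStep` type and `chainBody` helper; the fields mirror the `{type, sel?, text?, value?, key?, timeout?}` shape from the Actions table, and the step types are limited to the four used in the curl example.

```typescript
// Sketch only: a typed /chain payload. `ChainStep` and `chainBody` are
// hypothetical names, not aipeek exports.
interface ChainStep {
  type: "click" | "fill" | "press" | "wait";
  sel?: string;
  text?: string;
  value?: string;
  key?: string;
  timeout?: number; // milliseconds
}

function chainBody(steps: ChainStep[]): string {
  // Catch obviously incomplete steps before spending a round-trip on them.
  for (const step of steps) {
    if (step.type === "fill" && step.value === undefined) {
      throw new Error("fill step needs a value");
    }
    if (step.type === "press" && step.key === undefined) {
      throw new Error("press step needs a key");
    }
  }
  return JSON.stringify(steps);
}

const chainPayload = chainBody([
  { type: "click", sel: 'button[title="知识库"]' },
  { type: "wait", text: "Done" },
  { type: "fill", sel: "textarea", value: "hi" },
  { type: "press", key: "Enter" },
]);
// Send it as the POST body to /__aipeek/chain on your dev server.
```

Since the chain stops on the first failure, ordering a `wait` step before anything that depends on async UI keeps later steps from targeting elements that are not there yet.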

#CLI

npx aipeek                  # fetch from localhost:5195
npx aipeek --port=3000      # custom port

#Store Registration (optional)

window.__AIPEEK_STORES__ = { myStore, anotherStore }

State is snapshotted on demand (depth-limited, bounded) and included in the <state> section.
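
If several modules register stores independently, a small merge helper avoids one assignment overwriting another. This is a sketch: `StoreHost` and `registerStores` are hypothetical names, and only the `__AIPEEK_STORES__` global comes from this README.

```typescript
// Sketch only: merges stores into __AIPEEK_STORES__ instead of replacing it.
interface StoreHost {
  __AIPEEK_STORES__?: Record<string, unknown>;
}

function registerStores(host: StoreHost, stores: Record<string, unknown>): void {
  // Keep anything registered earlier; later registrations win on name clashes.
  host.__AIPEEK_STORES__ = { ...(host.__AIPEEK_STORES__ ?? {}), ...stores };
}

// In the browser the host would be `window`, for example:
//   registerStores(window as unknown as StoreHost, { myStore, anotherStore })
```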