MCP surface

Start, advance, and inspect chit runs from inside a chat, with one run id and a heartbeat on each long step

The MCP surface is the primary way to operate chit. It runs the same runtime as the CLI, exposed as tools a model can call from inside Claude Code, so you and the model drive and inspect a run from the chat instead of a second terminal.

The model is small and worth holding in your head:

A chit is a declared routine (a manifest). A run executes one chit.
A manifest's policy decides what a run is: one-shot (a single pass over the step DAG) or loop (a write-capable implementer and a read-only reviewer iterate until the reviewer says proceed, blocks, or the budget runs out).
A mode decides where the run lives: foreground (this chat supervises it, one unit per call) or background (a detached worker drives it to completion and it survives a reconnect).
A plan coordinates several loop runs sequentially, applying each converged step into an integration worktree before launching dependent steps.
A batch coordinates several background runs, one per git worktree.
Audit reads the receipts a run leaves behind.

For single runs, you hold one id: the run_id that chit_start returns. Plans and batches have their own handles (plan_id, batch_id) because they coordinate several runs.

Install the CLI, then register the MCP server with chit mcp:

bun install -g @chit-run/cli@latest
claude mcp add chit --scope local -- chit mcp

chit mcp runs the stdio MCP server from the installed chit binary (the same binary as chit run, chit audit). Upgrade with the same command, bun install -g @chit-run/cli@latest (during 0.x, bun update -g will not cross a minor). From a source checkout the equivalent is claude mcp add chit --scope local -- bun <repo>/apps/cli/src/cli/run.ts mcp.

Source: apps/cli/src/surfaces/mcp/ (server.ts registers the tools; engine.ts is the one-shot run engine; converge-engine.ts is the single-iteration loop driver; controller.ts resolves a run_id to its run).

Execution model

A run is a stepwise projection of a manifest DAG. chit, not the model, decides what is legal to run next.

chit owns the order. A step is ready only when every step in its manifest.dependencies is done. chit_next runs only ready steps. The model drives, but it cannot invent routing. There are no dynamic, model-decided handoffs; the static-DAG thesis holds.
chit_next advances one unit and returns. It never drains by surprise. For a one-shot run, one unit is the currently-ready wave (every step whose dependencies are met), or a single step_id if you name one. For a loop run, one unit is one implement-to-review iteration. To drive a foreground run to completion, call chit_next until it reports complete; to run unattended, start it with mode: "background".
A step runs exactly once. A step marks itself running before its first await, so a concurrent advance on the same step is rejected. On settle the record is terminal: done | failed | cancelled.
Completion is all-steps-done, not just the output step. An independent pending or failed branch keeps the run incomplete.
chit_next blocks until its unit settles. The heartbeat renders on that in-flight tool call, so a model turn is pinned for the unit's whole duration.
Sessions. per_scope participants, and every loop run, need a scope. chit_start refuses a per_scope or loop run started without one rather than silently running stateless. Within a run, a per_scope participant resumes its own session across steps.
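
The readiness and completion rules above can be sketched as two small functions. This is a minimal illustration, not chit's engine: the StepState type, the status names beyond done | failed | cancelled, and the example step ids are assumptions made for the sketch.

```typescript
// Hypothetical types for illustration; chit's internal types may differ.
type StepStatus = "pending" | "running" | "done" | "failed" | "cancelled";

interface StepState {
  dependencies: string[]; // step ids that must be done first
  status: StepStatus;
}

// A step is ready when it has not run yet and every dependency is done.
function readyWave(steps: Record<string, StepState>): string[] {
  return Object.entries(steps)
    .filter(
      ([, step]) =>
        step.status === "pending" &&
        step.dependencies.every((dep) => steps[dep]?.status === "done"),
    )
    .map(([id]) => id);
}

// Completion is all-steps-done, not just the output step.
function isComplete(steps: Record<string, StepState>): boolean {
  return Object.values(steps).every((step) => step.status === "done");
}

// Example run: "lint" failed, so "report" can never become ready.
const exampleRun: Record<string, StepState> = {
  fetch: { dependencies: [], status: "done" },
  lint: { dependencies: [], status: "failed" },
  summarize: { dependencies: ["fetch"], status: "pending" },
  report: { dependencies: ["summarize", "lint"], status: "pending" },
};

console.log(readyWave(exampleRun)); // only "summarize" is ready
console.log(isComplete(exampleRun)); // false: a failed branch keeps the run incomplete
```

In this example the failed lint branch blocks report and keeps the run incomplete even after summarize finishes, which is the behaviour the list above describes.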

What the ids mean

One public id, and it never shifts meaning.

run_id is the single-run handle. chit_start returns it; chit_next / chit_status / chit_trace / chit_cancel take it. For a foreground run it lives in this session's memory; for a background run it is the durable record's id, so it resolves after a reconnect.
plan_id and batch_id are coordinator handles. A plan or batch owns several task runs, each with its own run_id, but you drive the coordinator with its own id.
audit refs are receipt ids. A run records one per audited step or iteration; they appear in chit_trace and chit_status output and you open them with chit_audit_show. A run is never its own audit ref.
loop ids and job ids are internal. They key the loop log and the durable job record on disk. chit never asks you for one and never returns one as a handle. If you see an id in a control or status result, it is the run_id.

Foreground vs background

mode is a property of a run, not a different kind of run.

Foreground (default) means this chat supervises the run. You advance it one unit at a time with chit_next and watch each unit settle. A foreground run lives in memory: a new session starts with none, and an idle one is evicted after an hour. Use it when you want to review every unit.
Background means a detached worker drives the run to completion on its own while you keep working. Its state is durable (the job record, loop log, and audit transcripts are on disk), so you inspect or stop it from any later turn, and it survives an MCP reconnect. Use it when the task is scoped enough to run unattended. Poll with chit_status, stop with chit_cancel.

One-shot vs loop

policy is a property of the manifest, declared once, that decides what a run does.

One-shot (the default when a manifest declares no policy) is a single pass over the DAG. Each ready wave runs once; the run completes when every step is done. Pass inputs, not a task.
Loop is the converge pattern: an implementer step and a reviewer step run in turn, and chit reads the reviewer's verdict (proceed | revise | block) each round. proceed converges only when its verification also passed; a proceed whose verification failed, was blocked, or did not run stops as needs-decision for a human to judge, never as a clean converge. That verification is the reviewer's self-reported checks by default or, when the loop declares required_checks, the commands chit runs itself, which are authoritative over the reviewer's word. block stops the loop, and an exhausted max_iterations budget stops it as max-iterations. A loop run takes a task and a scope, not inputs. The bundled default loop is a write-capable Claude implementer and a read-only Codex reviewer; chit_start with a bare task uses it.

Start and advance a run

All results are JSON in a single text block. Most embed a run view: { run_id, mode, execution, ... , nextAction }, where mode is foreground | background, execution is one-shot | loop | job, and nextAction names the next tool to call.

chit_start

Open a run and return its run_id. Inputs:

task? - a slice to converge on with the built-in loop. A loop run requires it; omit it when manifest_path names a one-shot manifest.
manifest_path? - a manifest .json (absolute, or relative to cwd). Its policy decides one-shot vs loop. Omit to converge on task with the built-in loop. Mutually exclusive with recipe.
recipe? - a vetted config recipe id from chit_recipes. The recipe supplies the manifest. A converge recipe needs task and scope and may supply default max_iterations / call_timeout_ms; explicit values override those defaults. A one-shot recipe uses inputs and no task. Mutually exclusive with manifest_path.
mode - foreground (default) or background.
scope? - session scope id. Required for a loop run and for any per_scope manifest.
cwd? - repo / working dir (defaults to the server cwd; also where a loop log is written).
inputs - manifest inputs as string key/value pairs, default {} (one-shot runs).
audit - persist a full audit transcript (prompts/outputs/usage as blobs), default false since blobs can hold secrets. Background runs are always audited.
max_iterations - loop iteration budget, default 3 (loop runs only).
allow_unenforced_permissions - run even when a declared permission cannot be enforced (emits warnings), default false (such a manifest is otherwise refused).
required_checks? - verification commands chit runs itself after a proceed review (loop runs only), each { command, args?, name?, timeoutMs? }. They REPLACE the manifest's policy.requiredChecks for this run (never merge), so a default-loop task run gets real verification without a custom manifest. Rejected for a one-shot run. See Chit-executed verification.

Returns the run view with the initial ready set (one-shot) or the open loop (loop). Errors: unreadable or invalid manifest, a one-shot manifest given a task, a loop run with no scope, unknown agent, enforcement gap without the flag, loop-only knobs on a one-shot run, required_checks on a one-shot run.
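
A sketch of the input shape and a few of the refusals listed above, written as a pre-flight check. The field names come from the input list; the preflight function, its error strings, and the policy argument are hypothetical (chit reads the policy from the manifest itself), and the example task and scope values are invented for illustration.

```typescript
interface RequiredCheck {
  command: string;
  args?: string[];
  name?: string;
  timeoutMs?: number;
}

// Fields as documented for chit_start.
interface ChitStartArgs {
  task?: string;
  manifest_path?: string;
  recipe?: string;
  mode?: "foreground" | "background";
  scope?: string;
  cwd?: string;
  inputs?: Record<string, string>;
  audit?: boolean;
  max_iterations?: number;
  allow_unenforced_permissions?: boolean;
  required_checks?: RequiredCheck[];
}

type Policy = "one-shot" | "loop";

// Hypothetical helper: mirrors some documented errors, with invented wording.
function preflight(args: ChitStartArgs, policy: Policy): string[] {
  const errors: string[] = [];
  if (args.manifest_path !== undefined && args.recipe !== undefined) {
    errors.push("manifest_path and recipe are mutually exclusive");
  }
  if (policy === "loop") {
    if (!args.task) errors.push("a loop run requires a task");
    if (!args.scope) errors.push("a loop run requires a scope");
  } else {
    if (args.task !== undefined) errors.push("a one-shot manifest takes inputs, not a task");
    if (args.max_iterations !== undefined) errors.push("max_iterations is a loop-only knob");
    if (args.required_checks !== undefined) errors.push("required_checks is rejected for a one-shot run");
  }
  return errors;
}

// A background loop run on the bundled default loop, verified by a chit-run check.
const loopArgs: ChitStartArgs = {
  task: "Add input validation to the signup form",
  scope: "signup-validation",
  mode: "background",
  required_checks: [{ command: "bun", args: ["test"], name: "unit tests" }],
};

console.log(preflight(loopArgs, "loop")); // no errors
console.log(preflight({ task: "no scope given" }, "loop")); // reports the missing scope
```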

Chit-executed verification

A loop's verification can be the reviewer's self-report (advisory) or commands chit runs itself (ground truth). Declare the latter and chit stops taking the reviewer's word on whether the work actually passes.

chit does not choose checks. You declare them; chit runs exactly those, nothing inferred.
Run as argv, no shell. Each check is { command, args?, name?, timeoutMs? }, spawned directly: a metacharacter in an arg is a literal argument, never interpreted. No env, cwd, or shell strings.
The gate (only after a proceed review; block/revise skip it). chit runs the declared checks in the run's cwd: all passed -> converged; any failed -> the loop revises, with the failing checks fed into the next iteration's prior_review; a check that could not run (blocked/timed out) with none failed -> needs-decision for a human.
The record is honest. Each iteration carries verification + verificationSource ("chit" or "reviewer"). When the source is chit, checks / verification / verificationSource are the authoritative signal -- not the reviewer's checksRun prose. chit_status and chit_trace surface them.
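
The gate described above reduces to a three-way decision over the check outcomes. A minimal sketch, assuming hypothetical outcome labels (passed, failed, blocked, timed-out) and a non-empty list of declared checks; how chit treats an empty list is not covered here.

```typescript
// Hypothetical labels for a single check's outcome.
type CheckOutcome = "passed" | "failed" | "blocked" | "timed-out";
type GateResult = "converged" | "revise" | "needs-decision";

// Runs only after a proceed review; block and revise verdicts skip the gate.
function gate(outcomes: CheckOutcome[]): GateResult {
  // Any failed check sends the loop back for another iteration.
  if (outcomes.some((o) => o === "failed")) return "revise";
  // Nothing failed, but a check could not run: a human decides.
  if (outcomes.some((o) => o === "blocked" || o === "timed-out")) return "needs-decision";
  // Every declared check passed.
  return "converged";
}

console.log(gate(["passed", "passed"])); // converged
console.log(gate(["passed", "failed", "timed-out"])); // revise
console.log(gate(["passed", "blocked"])); // needs-decision
```

A failure takes priority over a check that could not run, which matches the rule that needs-decision applies only when none failed.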

You can declare checks at several levels; the closest-declared wins, never merged (an explicit [] overrides a lower level away):

| where | field | beats |
| --- | --- | --- |
| `chit_start` | `required_checks` | the manifest |
| `chit_batch_start`, per task | `requiredChecks` | batch + manifest |
| `chit_batch_start`, top level | `required_checks` | the manifest |
| manifest `policy` | `requiredChecks` | (the fallback) |

The field NAME follows each surface's convention: snake_case required_checks at an MCP tool's top level; camelCase requiredChecks on the manifest policy and on a batch task object (which is already camelCase, beside manifestPath). The shape is identical everywhere.
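
The closest-declared-wins rule can be sketched with nullish coalescing, which treats an explicit empty array as a real declaration. The resolveChecks helper and its parameter names are hypothetical; only the precedence order comes from the text above.

```typescript
interface RequiredCheck {
  command: string;
  args?: string[];
  name?: string;
  timeoutMs?: number;
}

// Closest declaration wins outright; levels are never merged.
// undefined means "not declared here"; [] is an explicit "no checks".
function resolveChecks(
  taskLevel: RequiredCheck[] | undefined,
  batchOrStartLevel: RequiredCheck[] | undefined,
  manifestLevel: RequiredCheck[] | undefined,
): RequiredCheck[] {
  return taskLevel ?? batchOrStartLevel ?? manifestLevel ?? [];
}

const manifestChecks: RequiredCheck[] = [{ command: "bun", args: ["test"] }];
const batchChecks: RequiredCheck[] = [{ command: "bun", args: ["run", "lint"] }];

// The batch-level declaration replaces the manifest's; nothing is merged.
console.log(resolveChecks(undefined, batchChecks, manifestChecks));
// An explicit [] at the task level overrides both lower levels away.
console.log(resolveChecks([], batchChecks, manifestChecks));
```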

chit_next

Advance the run by one unit and return control; emits a heartbeat while the unit runs. Inputs: run_id, and step_id? (one-shot only: advance just that ready step instead of the whole ready wave).

One-shot: runs the ready wave (or the named step). Returns { ran[], ...run view }, each ran entry { step, durationMs, output } or { step, cancelled: true, durationMs }. When nothing is ready, the run is complete.
Loop: runs one implement-to-review iteration. Returns { iteration, verdict, decision, findingCount, checksRun, changedFiles, workspaceWarnings, usage?, auditRef?, stopStatus?, statusLine, ...run view }. statusLine is a compact one-line summary of the round this call ran -- iteration N · <outcome>[ · checks][ · stop], e.g. iteration 1 · proceed · 3/3 required checks passed · converged -- so an agent reading the call back recovers the outcome from the returned data even when no live heartbeat reached it. A set stopStatus means the loop also stopped this round. A cancelled iteration returns { cancelled: true, iteration, statusLine, ...run view } and records a clean cancelled stop, never a fake-successful round. A graceful manifest failure returns { failed: true, iteration, failure, statusLine, ...run view } and closes the loop blocked.
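
The statusLine format (iteration N · <outcome>[ · checks][ · stop]) can be illustrated with a small builder. This is a sketch of the documented shape only; the RoundSummary type and the function are hypothetical, not chit's code.

```typescript
// Hypothetical input: the pieces the documented line is made of.
interface RoundSummary {
  iteration: number;
  outcome: string; // e.g. the verdict for the round
  checks?: string; // optional checks segment
  stop?: string; // optional stop status
}

// Join the present segments with the " · " separator used in the example line.
function buildStatusLine(round: RoundSummary): string {
  const parts = [`iteration ${round.iteration}`, round.outcome];
  if (round.checks) parts.push(round.checks);
  if (round.stop) parts.push(round.stop);
  return parts.join(" · ");
}

console.log(
  buildStatusLine({
    iteration: 1,
    outcome: "proceed",
    checks: "3/3 required checks passed",
    stop: "converged",
  }),
); // iteration 1 · proceed · 3/3 required checks passed · converged
```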

chit_next rejects a run that has already finished, and a loop that already has an iteration in flight (one advancer per loop; a foreground call and a background worker cannot advance the same loop at once).
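
The two rejections above amount to a small guard around each advance. A minimal sketch, assuming a hypothetical AdvanceGuard class and error messages; it marks the unit in flight synchronously, before any asynchronous work would start, which is what keeps a second advancer out.

```typescript
// Hypothetical guard: one advancer per loop, and no advancing a finished run.
class AdvanceGuard {
  private inFlight = false;
  private finished = false;

  // Call synchronously at the start of an advance, before any await.
  begin(): void {
    if (this.finished) throw new Error("run has already finished");
    if (this.inFlight) throw new Error("an iteration is already in flight");
    this.inFlight = true;
  }

  // Call when the unit settles; stopped = true closes the run for good.
  settle(stopped: boolean): void {
    this.inFlight = false;
    if (stopped) this.finished = true;
  }
}

const guard = new AdvanceGuard();
guard.begin(); // first advancer proceeds
try {
  guard.begin(); // a concurrent advance on the same loop
} catch (err) {
  console.log((err as Error).message); // an iteration is already in flight
}
guard.settle(true); // the loop stopped this round
```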

Inspect and stop a run

chit_status

Run status. With a run_id: that run's status, and whether it is foreground (supervised by this session) or a durable background run. With no run_id: the operator overview of what is active in this server now plus a compact list of recently finished runs (newest first). Read-only: it never sweeps or touches the in-memory stores, so polling never keeps a run alive. Inputs: run_id?, recent_limit? (overview only, default 5; 0 for none). Active foreground state is per-session; background runs and recent history are durable across reconnect.

For a foreground loop run, the view is the audit surface you read between and during iterations, recomposed from the run's own state so it is there whether or not a live heartbeat arrived:

statusLine -- the compact summary of the last completed round, the same line chit_next returned for it. Absent until the first round completes.
activity -- present only while an iteration is in flight: { iteration, phase, elapsedMs, phaseElapsedMs, lastActivityAgeMs, statusLine }. phase is implementing | reviewing | running required checks | cancelling (absent in the brief spin-up before the first phase, where the line reads starting); elapsedMs is the run's wall time so far, phaseElapsedMs the current phase's; the nested statusLine is the live line iteration N · <phase> · <dur>. A settled or never-run loop reports no activity.
Terminal receipt -- once stopped, elapsedMs (total wall time) and stopReason (why it stopped), beside the terminal status (converged | blocked | max-iterations | needs-decision | cancelled).

lastActivityAgeMs is the age of the last activity mark -- an iteration start, a phase transition, or a cancel -- not a periodic beat. A foreground run has no worker heartbeat, so an age of minutes is healthy mid-phase: a single implement or review step legitimately runs that long. This is the opposite of the background job view's lastHeartbeatAgeMs, a ~10s periodic worker heartbeat where a stale age means the worker has died.

chit_status is a snapshot. When you want to wait for something to happen rather than poll, use chit_wait.

chit_wait

Block until a background run, plan, or batch reaches a meaningful state, then return the same view as chit_status / chit_plan_status / chit_batch_status plus a waitResult. This is the "tell me when it's done" tool: reach for it instead of polling status in a loop, and never read chit's state files to detect completion (they are private). Read-only: it never advances a plan or batch and never mutates a run; it watches the durable state and returns. Emits a heartbeat while waiting; press Esc to stop waiting (the work keeps running). Inputs: exactly one of run_id, plan_id, or batch_id; timeout_ms? (default 900000); cwd? for plan and batch.

With a run_id (background runs only): waits until the run is terminal (completed / failed / cancelled, or its worker died), so a crashed worker never hangs the wait. A foreground run is rejected (advance it with chit_next, which already blocks per unit).
With a plan_id: waits until the active step's job needs reconciliation, the plan is ready_for_apply, or the plan is terminal. It does not advance, reconcile, apply, or launch. For manual control, run chit_wait -> chit_plan_advance -> chit_plan_status until a step is ready to apply or the plan settles. For the streamlined loop, use chit_plan_drive.
With a batch_id: waits until chit_batch_advance would do real work (a task can launch, or a finished/stale job can reconcile) or the batch is fully terminal. It does not advance the batch (it is read-only; only chit_batch_advance mutates state). A blocking-until-fully-done wait would deadlock, since nothing would ever advance the batch. So the operator loop is: chit_wait -> chit_batch_advance -> chit_batch_status, repeated until the batch is terminal: ready_for_review when every task is clean, or needs_human when a task needs a decision (a needs_attention task) or is blocked.

waitResult is terminal | needs_advance | ready_for_apply | timeout. needs_advance means call the matching advance tool now, then wait again; ready_for_apply means a plan step converged and is waiting for the gated apply payload.
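
One way to read the four waitResult values is as a lookup from result to the next tool to call. The sketch below is an interpretation, not chit's logic: re-waiting after a timeout and reading status after a terminal result are assumptions, and nextCall is a hypothetical helper.

```typescript
type WaitResult = "terminal" | "needs_advance" | "ready_for_apply" | "timeout";
type Coordinator = "plan" | "batch";

// Map a waitResult to the tool an operator would typically call next.
function nextCall(kind: Coordinator, result: WaitResult): string {
  switch (result) {
    case "needs_advance":
      // Call the matching advance tool now, then wait again.
      return kind === "plan" ? "chit_plan_advance" : "chit_batch_advance";
    case "ready_for_apply":
      // Plans only: a step converged and awaits the gated apply payload,
      // which a human supplies after reviewing the step.
      return "chit_plan_advance";
    case "timeout":
      // Assumption: nothing changed within timeout_ms, so wait again.
      return "chit_wait";
    case "terminal":
      // Assumption: read the final view once the coordinator has settled.
      return kind === "plan" ? "chit_plan_status" : "chit_batch_status";
  }
}

console.log(nextCall("batch", "needs_advance")); // chit_batch_advance
console.log(nextCall("plan", "ready_for_apply")); // chit_plan_advance
```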

chit_trace

The history of a run. Input: run_id. For a one-shot run, the step transcript: { run_id, execution: "one-shot", complete, trace[] }, each entry { step, kind, participant, agent, status, durationMs, output, error }. For a loop or background run, the iteration log read from the durable loop log: { run_id, execution, records[] }, where records are the header, each iteration (summary, changed files, verdict, decision, usage, audit ref), and the stop record. Audit refs appear here; the run_id is the only handle. Read-only.

chit_cancel

Cancel a run by run_id, foreground or background. Input: run_id.

Foreground one-shot: aborts every running step; each settles cancelled (terminal, blocks dependents).
Foreground loop: if an iteration is in flight, aborts it (it settles as a clean cancelled stop); if the loop is open but idle, closes it cancelled.
Background: records the cancel intent first (so it survives a worker restart), then signals the worker's process group; the worker stops at the next safe point and records a clean cancelled stop.

A run that already finished is reported back unchanged.

chit_recipes

List the effective recipe menu for the target repo. Inputs: cwd?.

Returns { recipes, configPath?, repoConfigPath? }, where each recipe is the redacted launch surface: id, origin (global or repo), mode, manifestPath, and optional loop knobs (maxIterations, callTimeoutMs, description). It loads layered config fresh for the given cwd, using the same global plus repo rules as the launch tools.

This is read-only and redacted: it never returns manifest contents, participant instructions, prompts, env values, permissions, audit blobs, or model output. Use a returned recipe id with chit_start, chit_plan_start, or chit_batch_start.

Run a sequential plan

A plan runs several loop tasks in order. Each step gets its own worktree, converges, then pauses as review_ready. You decide whether to apply that step into the plan's integration worktree. Only after a step is applied can a dependent step launch, and it launches from the integration tip, so it can see the code its dependency produced. Use a plan when later work needs earlier work's diff. Use a batch when tasks are independent and can run in parallel.

chit_orchestrate

The single-call entrypoint from a goal to a reviewable plan. Hand it a goal and it produces the same artifact you would build by hand with the planner manifest and a chit_plan_start dry run: the normalized plan plus the approvalHash you echo back to launch. It composes existing primitives and nothing more.

What it does, in order: it runs the bundled planner manifest (examples/plan-author.json) through a read-only planning agent that inspects the repo and drafts a native sequential plan; it parses and validates that output into a plan; and it dry-runs the plan through the same chit_plan_start path, which resolves the base ref to a concrete commit, resolves every step's recipe and manifest binding, and computes the approval hash. It returns the normalized plan, the resolved base, the approvalHash, any resolved recipes and manifests, and nextSteps, the plain-language instructions for your next move.

It is static and approval-gated by design. It only ever previews: it never launches, never confirms a start, does not auto-approve, does not schedule, and adds no dynamic routing. It creates nothing -- no plan record, worktree, job, or branch. The deliverable is a plan and a hash; a human runs chit_plan_start with confirm:true to act on it.

Inputs: goal (required, what you want built in plain language); context? (operator notes the planner may use: a base branch, vetted recipe ids, or a vetted manifestPath override); base_branch? (the ref the integration branch is cut from, bound into the hash); max_iterations? (per-step iteration budget when a step declares none, also bound into the hash); cwd? (any path in the target repo the planner inspects, defaults to the server cwd).

To launch, review the returned plan, base, recipes, and manifest bindings, then call chit_plan_start with the SAME plan, confirm:true, and the returned approvalHash passed as approval_hash, repeating base_branch / max_iterations if you set them. After launch, call chit_plan_drive with the returned plan_id; it stops before each gated apply. Editing the plan, base, budget, a referenced manifest's content, or a selected recipe's definition changes the hash, and the start is then refused. This is the one-call shortcut for the manual flow the plan-author example walks through; reach for that flow when you want to read and adjust the plan JSON between the planner and the dry run.

chit_plan_start

Approve and start a plan. The start is gated: with confirm omitted or false it parses the plan, resolves the base to a concrete commit SHA, resolves every selected recipe or direct manifest reference to a manifest digest and participant summary, returns the normalized plan plus approval_hash, and creates nothing. With confirm:true, chit re-parses, re-resolves, recomputes the hash over the plan, resolved base, max_iterations, recipes, manifest digests, and participant summaries, and refuses unless approval_hash matches. On a match it launches the first runnable step from the approved commit SHA.

Inputs: plan (inline object or JSON string) or plan_path, cwd?, base_branch?, max_iterations?, confirm?, approval_hash?. A plan step may select a vetted recipe id instead of a direct manifestPath; the two are mutually exclusive.
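
The dry-run-then-confirm gate can be illustrated with a toy fingerprint. This sketch is not chit's hash: the FNV-1a function stands in for whatever digest chit uses, the Approval type covers only part of what the real hash binds (it omits recipes, manifest digests, and participant summaries), and all names here are hypothetical.

```typescript
// Serialize with sorted object keys so equal values always give the same string.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  const text = JSON.stringify(value);
  return text === undefined ? "null" : text;
}

// Stand-in fingerprint (32-bit FNV-1a); a real gate would use a cryptographic hash.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical subset of what the approval binds.
interface Approval {
  plan: unknown;
  resolvedBase: string; // concrete commit SHA
  maxIterations: number;
}

// Dry run: compute the hash and create nothing.
function dryRun(approval: Approval): string {
  return fingerprint(canonical(approval));
}

// Confirm: recompute from current state and refuse unless the hash matches.
function confirmStart(current: Approval, approvalHash: string): boolean {
  return dryRun(current) === approvalHash;
}

const reviewed: Approval = {
  plan: { title: "demo", steps: [{ id: "a", task: "first step" }] },
  resolvedBase: "0123abc",
  maxIterations: 3,
};
const hash = dryRun(reviewed);
console.log(confirmStart(reviewed, hash)); // true: nothing changed since the dry run
```

Because the confirm call recomputes the hash from the current plan and base rather than trusting the caller, any edit made after the dry run shows up as a mismatch.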

chit_plan_list

List plans in this repo, newest first: plan_id, title, status, step count, and how many steps are applied / review_ready / needs_human / failed. Use it to recover a plan_id, then inspect with chit_plan_status. Read-only. Inputs: limit?, cwd?.

chit_plan_status

Read-only overview for one plan: integration branch/worktree, current status, every step's run state, changed files, checks, receipts, and next action. It never launches, reconciles, applies, or cleans up. Inputs: plan_id, cwd?.

chit_plan_drive

Drive an existing plan until the next operator gate. It waits while a step is live, advances when a finished, stale, or vanished job can reconcile, and launches a pending dependent when the plan can move forward. It returns the plan view plus driveResult and advances.

It never applies a review_ready step, confirms an apply, cleans up, cancels, or approves anything. More broadly, a driver agent calling this tool must never start or confirm a plan, approve any gate, merge branches, bypass approval hashes, or change recipes or manifests. The two human gates are plan start approval and each step apply; everything between is mechanical progression. Use chit_plan_drive after chit_plan_start, then again after you explicitly apply a review_ready step with chit_plan_advance.

Inputs: plan_id, cwd?, timeout_ms?, max_iterations?.

chit_plan_advance

The progression trigger. Without an apply payload, it reconciles the active step's finished job and launches the next runnable step if one is ready. With apply, it runs the gated apply for one review_ready step into the plan integration branch. Apply is dry-run by default; pass apply.confirm:true to commit the step into the integration branch. Inputs: plan_id, cwd?, max_iterations?, and optional apply: { step_id, confirm?, include_untracked? }.

chit_plan_cancel

Cancel the active step's job and mark pending steps cancelled. Worktrees stay in place for inspection; cleanup is separate. Inputs: plan_id, cwd?.

chit_plan_cleanup

Retire the plan-managed worktrees and branches. Safe by default: with confirm omitted or false it lists what would be removed and removes nothing. It keeps plan, job, loop, and audit receipts. Inputs: plan_id, cwd?, confirm?, cleanup_mode?.

cleanup_mode defaults to safe, which removes only completed or cancelled plans and refuses unresolved work. Use cleanup_mode: "discard_unresolved" only after inspecting the step worktrees and receipts for a paused, failed, ready-for-apply, or cancelled plan. It is still dry-run by default, still needs confirm:true to remove anything, still refuses live workers, and never applies or merges the discarded work.

Run several tasks in parallel

A batch runs several loop tasks in parallel, one per git worktree, as background runs. It is a thin coordinator: it plans a task graph, creates a worktree per task, and launches a background run per runnable task. It owns no execution and never auto-merges; the deliverable is a set of reviewable worktree branches. There is no daemon: progress happens only at explicit tool calls. Batch state is durable under the state dir, keyed by repo (not in the reviewed tree).

You hand chit a reviewed task graph; chit runs it. Each task declares claimedPaths (the files it will touch); tasks with overlapping claims never run concurrently (they serialize into later waves), so parallel tasks cannot race on the same files. A task may declare dependencies (task ids that must reach review_ready first). Dependencies are a launch gate, not integration: each task's worktree branches from the batch base (base_branch), so a dependent task starts from that base and does not receive its dependencies' changes. Use dependencies to order work; merging the resulting branches is yours. The manifest per task resolves as task recipe / manifestPath > batch recipe / manifest_path > the bundled default (a write-capable Claude implementer and a read-only Codex reviewer). recipe and direct manifest path are mutually exclusive at the task and batch levels. A batch can mix model pairs by selecting different recipes per task, or by pointing tasks at different manifests. The repo's examples/converge-codex-writer.json shows the direct-manifest shape (it is not shipped in the npm package, so copy or adapt it).
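
The effect of claimedPaths on scheduling can be sketched as a first-fit grouping into waves. This is an illustration under strong assumptions: overlap is taken to mean an identical path string, and dependencies, max_parallel, and allowPathOverlap are ignored. chit's actual planner may differ on all of these.

```typescript
interface Task {
  id: string;
  claimedPaths: string[];
}

// Assumption: two tasks overlap when they claim an identical path string.
function overlaps(a: Task, b: Task): boolean {
  return a.claimedPaths.some((path) => b.claimedPaths.includes(path));
}

// First-fit: place each task in the earliest wave where it overlaps nothing.
function planWaves(tasks: Task[]): string[][] {
  const waves: Task[][] = [];
  for (const task of tasks) {
    const wave = waves.find((w) => w.every((other) => !overlaps(task, other)));
    if (wave) {
      wave.push(task);
    } else {
      waves.push([task]);
    }
  }
  return waves.map((w) => w.map((t) => t.id));
}

const exampleTasks: Task[] = [
  { id: "a", claimedPaths: ["src/a.ts"] },
  { id: "b", claimedPaths: ["src/b.ts"] },
  { id: "c", claimedPaths: ["src/a.ts", "src/c.ts"] },
];

console.log(planWaves(exampleTasks)); // [["a", "b"], ["c"]]
```

Tasks a and b touch different files and share a wave; c claims src/a.ts as well, so it serializes into a later wave.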

chit_batch_start

Approve and start a batch. The start is gated (see below): by default a call is a dry run that normalizes and validates the task graph but creates nothing, and only a confirmed call creates worktrees and launches the initial runnable wave. Inputs: tasks (each { id, title, body, dependencies?, claimedPaths?, allowPathOverlap?, recipe?, manifestPath?, requiredChecks? }), cwd?, max_parallel? (default 2), base_branch? (default HEAD), recipe?, manifest_path?, max_iterations?, required_checks? (batch-level, applied to any task without its own), call_timeout_ms?, plus confirm? and approval_hash? for the gate. claimedPaths is required per task unless allowPathOverlap is set (which makes the task run alone). A task's requiredChecks beat the batch-level required_checks, which beat the manifest's -- see Chit-executed verification.

The start is gated: dry-run -> review -> approval_hash -> confirm. Nothing launches until you approve a specific task graph, base, and set of knobs.

confirm omitted or false is a dry run. chit normalizes and validates the task graph and resolves base_branch to a concrete commit SHA, then returns the normalized tasks, the resolved base, and an approval_hash over them. It creates nothing: no batch record, worktree, job, or branch. Read the normalized graph and resolved base, and if they are what you want, pass the approval_hash straight back.
confirm: true re-plans and verifies the hash. chit re-normalizes the graph, re-resolves the base, resolves selected recipes and manifests again, and recomputes the hash over the task graph, the resolved base commit, the launch knobs (max_parallel, max_iterations, recipe, manifest_path, required_checks, call_timeout_ms), manifest digests, recipe receipts, and participant summaries. It refuses unless approval_hash matches, so a task, base, knob, recipe, manifest content, or participant config edited after the dry run cannot start on a stale hash. On a match it launches the initial runnable wave (no-dependency tasks, up to max_parallel) from the approved commit SHA, pinned even if the ref has since moved, and returns the batch_id and batch view.

chit_batch_list

List the batches in this repo, newest first: batch_id, status, task count, how many tasks are review_ready / needs_attention / failed, createdAt, and cleanedAt if it has been cleaned up. Use it to recover a batch_id you lost, then chit_batch_status for the full view. Read-only. Inputs: limit? (newest N), cwd?.

chit_batch_status

Read-only overview: each task's status, live run state/phase, branch/worktree path, changed files, audit refs, plus runnableCount and a nextAction. Inspection is safe: this never launches runs, creates worktrees, or mutates state. Inputs: batch_id, cwd?.

chit_batch_advance

The progression trigger. Reconciles finished runs into task state (converged -> review_ready; blocked / needs-decision / max-iterations -> needs_attention, i.e. the run completed but did not converge clean and a human decides; a vanished/stale job or a failed run -> failed; a dependent proceeds only past a review_ready task), then launches the next runnable wave. Call it when status reports a finished run or runnable tasks. Inputs: batch_id, cwd?.
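
The reconcile mapping above is a fixed table from run outcome to task status. A minimal sketch; the RunOutcome labels "vanished" and "stale" are hypothetical names for the job states the text mentions, and the helper functions are not chit's own.

```typescript
// Hypothetical labels for how a task's background run ended.
type RunOutcome =
  | "converged"
  | "blocked"
  | "needs-decision"
  | "max-iterations"
  | "failed"
  | "vanished"
  | "stale";

type TaskStatus = "review_ready" | "needs_attention" | "failed";

function reconcile(outcome: RunOutcome): TaskStatus {
  switch (outcome) {
    case "converged":
      return "review_ready";
    case "blocked":
    case "needs-decision":
    case "max-iterations":
      // The run completed but did not converge clean; a human decides.
      return "needs_attention";
    case "failed":
    case "vanished":
    case "stale":
      return "failed";
  }
}

// A dependent task proceeds only past dependencies that are review_ready.
function dependenciesSatisfied(dependencyStatuses: TaskStatus[]): boolean {
  return dependencyStatuses.every((status) => status === "review_ready");
}

console.log(reconcile("max-iterations")); // needs_attention
console.log(dependenciesSatisfied(["review_ready", "needs_attention"])); // false
```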

chit_batch_cancel

Request cancellation of every active task run (intent-first, the same safety as chit_cancel) and mark pending tasks cancelled. Running runs settle cleanly in the background; worktrees are left in place for inspection. Inputs: batch_id, cwd?.

chit_batch_cleanup

Retire a batch's worktrees and branches once you are done reviewing them. Safe by default: with confirm omitted or false it is a dry run that lists which worktrees and branches would be removed and which changed-file diffs that would discard, and removes nothing. With confirm: true it removes them (git worktree remove --force, then git branch -D). Refuses while any task is still running. Never deletes the batch / run / audit receipts; those stay as durable history (it records cleanedAt on the batch). Inputs: batch_id, confirm? (default false), cwd?.

Read receipts

The audit tools read the local transcripts that audited runs write: chit run --audit, an audited MCP run (chit_start audit: true), and every background run (always audited). Same reader as the CLI chit audit list/show. Read-only: a run with no run.completed event is reported incomplete with the reason from the timeline alone (an open call killed mid-flight, a failed step, or an abandoned run). Bodies are read only through blob refs a run's own events carry, never a caller-supplied path, so inspection can never serve an arbitrary file.

chit_audit_list

List audited runs, newest first. Input: limit?. Returns { runs[] }, each run { audit_ref, manifestId, surface, scope?, iteration?, startedAt?, status, stepCount, usage?, openCall? }, where audit_ref is the receipt handle you pass to chit_audit_show, status is the run.completed status or incomplete, and openCall (when present) names a step whose adapter call started but never completed (killed mid-call). (audit_ref is a receipt id, distinct from a control run_id: a loop run has one run_id but one audit_ref per iteration.)

chit_audit_show

Show one audited run as a receipt, by its audit_ref (from chit_trace's auditRefs or chit_audit_list, not a control run_id). Inputs: audit_ref, include_bodies (default false), verbose (default false). Returns { summary, incompleteReason?, participants?, timeline[], note? }: the summary above, the reason when incomplete, the participant config recorded at start, and the structured event timeline. Without verbose the timeline is a receipt (the raw per-call adapter events are hidden, and note says how many). Prompt/output/event bodies attach to their timeline entries (input/output/raw) only when include_bodies is true, since they can be large or hold secrets.

Observability (heartbeat)

While a call step runs, chit_next emits, every ~5s, both a progress notification (with progressToken) and a logging notification carrying the same latest-state text. Claude Code renders the latest heartbeat live in the collapsed tool call; the full transcript is chit_trace.

Those live notifications are best-effort UI. A client may render them, drop them, or never surface them in the calling model's transcript at all, so an agent must not treat them as a data source. The durable audit surface to rely on is the returned data: chit_next's statusLine, chit_status's statusLine and in-flight activity, and the full chit_trace history. Each is recomposed from the run's own state and is readable whether or not a heartbeat ever arrived.

There is no within-step streaming of the agent's output to the MCP client. The heartbeat is latest-state text, not a token stream, and chit_next returns only the unit's final result. On an audited run the adapter does capture the agent's live event stream as adapter.event records, but that feeds the audit log, not the MCP client.

Cancellation

Cancellation has two reachable paths. The portable one is chit_cancel: each in-flight unit owns an AbortController registered for the whole call; chit_cancel aborts it, both adapters (claude-cli, codex-exec) kill their child process on abort and reject, and the engine settles the unit cancelled. The second path is ambient: in Claude Code, pressing Esc during a blocking chit_next propagates request cancellation through the call's folded-in extra.signal, which aborts the same controller. A live probe confirmed a long codex step settling cancelled in ~5s on Esc, with no chit_cancel call. Esc behavior is client-specific, so chit_cancel stays the portable backup.

A cancelled loop iteration records a clean cancelled stop with no iteration record, so the loop log never carries a fake-successful round.
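
Both cancellation paths end at the same AbortController. The sketch below shows that wiring with the standard AbortController and AbortSignal APIs; the InFlightUnits class and its method names are hypothetical, and killing a child process on abort is left out.

```typescript
// Hypothetical registry: one AbortController per in-flight unit, keyed by run id.
class InFlightUnits {
  private controllers = new Map<string, AbortController>();

  // Register a unit. An optional request signal (the ambient path, for example
  // a client cancelling the tool call) is folded into the same controller.
  start(runId: string, requestSignal?: AbortSignal): AbortSignal {
    const controller = new AbortController();
    if (requestSignal) {
      if (requestSignal.aborted) {
        controller.abort();
      } else {
        requestSignal.addEventListener("abort", () => controller.abort(), { once: true });
      }
    }
    this.controllers.set(runId, controller);
    return controller.signal;
  }

  // The portable path: an explicit cancel by run id.
  cancel(runId: string): boolean {
    const controller = this.controllers.get(runId);
    if (!controller) return false;
    controller.abort();
    return true;
  }

  // Drop the controller once the unit has settled.
  settle(runId: string): void {
    this.controllers.delete(runId);
  }
}

const units = new InFlightUnits();

// Portable path: explicit cancel.
const explicitSignal = units.start("run-1");
units.cancel("run-1");

// Ambient path: the request's own signal aborts the same controller.
const request = new AbortController();
const ambientSignal = units.start("run-2", request.signal);
request.abort();

console.log(explicitSignal.aborted, ambientSignal.aborted);
```

An adapter would listen on the returned signal, stop its work on abort, and reject, after which the engine settles the unit cancelled.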

Limits

Foreground runs live in an in-memory store. A server restart or reconnect loses them. The store is idle-evicting: a run untouched for more than 1h is dropped on the next chit_start sweep, unless it still has work in flight (those are never evicted). A background run is durable and is not affected.
After a foreground run is evicted, chit_status / chit_trace no longer find that run_id. A background run's loop log and audit transcripts persist regardless.
inputs are string to string. file[] inputs are not expressible via MCP.
Concurrent per_scope steps would hit the session store's read-modify-write race, so keep same-scope steps serial.
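
The idle-eviction rule for foreground runs can be sketched as a sweep over an in-memory map. The one-hour limit and the in-flight exemption come from the list above; the ForegroundRun type, its field names, and the sweep function are hypothetical.

```typescript
const IDLE_LIMIT_MS = 60 * 60 * 1000; // 1h, per the limit described above

// Hypothetical record for a foreground run held in memory.
interface ForegroundRun {
  lastTouchedMs: number; // when the run was last advanced or touched
  inFlight: boolean; // true while a unit is still running
}

// Drop runs idle for more than the limit, never one with work in flight.
function sweep(store: Map<string, ForegroundRun>, nowMs: number): string[] {
  const evicted: string[] = [];
  store.forEach((run, runId) => {
    if (!run.inFlight && nowMs - run.lastTouchedMs > IDLE_LIMIT_MS) {
      store.delete(runId);
      evicted.push(runId);
    }
  });
  return evicted;
}

const now = 10 * IDLE_LIMIT_MS;
const store = new Map<string, ForegroundRun>([
  ["fresh", { lastTouchedMs: now - 5 * 60 * 1000, inFlight: false }],
  ["idle", { lastTouchedMs: now - 2 * IDLE_LIMIT_MS, inFlight: false }],
  ["busy", { lastTouchedMs: now - 2 * IDLE_LIMIT_MS, inFlight: true }],
]);

console.log(sweep(store, now)); // ["idle"]
```

After the sweep, a status or trace lookup for the evicted id would find nothing, while the long-running busy entry survives because it still has work in flight.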

Not supported yet

Client-facing output streaming (a live token stream to the MCP client). The heartbeat is enough for now. Live adapter event capture for the audit log is separate, and has shipped.
