Cut the cost of long agent runs
Externalize agent state to disk so a slow or full worker respawns from a few KB, and keep planning on a separate cyclable chat.
The operating layer is Cursor. The orchestrator runs in a separate chat. This page is the reading version of the event slide deck.
The cost problem
A single long-lived agent re-sends its whole growing context every turn, so input tokens, which dominate the bill, grow faster than linearly over a session: if the context grows by a steady amount each turn, the cumulative input is roughly quadratic in the number of turns. Orchestration makes it worse, because planning and reconciliation are token-heavy reasoning that never touches code yet burns premium in-editor turns. Central Casting attacks both by putting the durable state on disk and moving the planning off the metered surface.
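The growth pattern can be made concrete with a back-of-envelope sketch. It assumes a hypothetical starting context, a fixed number of tokens added per turn, and a fixed context cap for the bounded case. All three numbers are illustrative and the function names are made up for this example.

```python
def monolithic_input_tokens(turns: int, base: int, growth_per_turn: int) -> int:
    """Total input tokens billed when every turn re-sends the whole context.

    Turn k (0-indexed) sends base + k * growth_per_turn tokens.
    """
    return sum(base + k * growth_per_turn for k in range(turns))


def bounded_input_tokens(turns: int, cap: int) -> int:
    """Total input tokens when each turn's context is held at a fixed size."""
    return turns * cap


if __name__ == "__main__":
    # Illustrative numbers only: 5K starting context, 3K added per turn,
    # versus a worker held at a hypothetical 15K context.
    for turns in (10, 20, 40):
        mono = monolithic_input_tokens(turns, base=5_000, growth_per_turn=3_000)
        flat = bounded_input_tokens(turns, cap=15_000)
        print(f"{turns:>2} turns: monolithic={mono:>9,} bounded={flat:>9,}")
```

Under these assumptions, doubling the session from 20 to 40 turns nearly quadruples the monolithic total (670,000 to 2,540,000 input tokens), while the bounded total only doubles (300,000 to 600,000).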
The .cca folder: state on disk
The durable truth lives in a .cca folder, one lane per work area. The chat is disposable; the folder is the source of truth.
A worker's state is its current_state.yaml plus its drafts/. Nothing irreplaceable lives in the chat. That single property is what makes a clean kill and respawn possible.
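A checkpoint along these lines might look like the sketch below. Only `current_state.yaml` and `drafts/` come from the description above. The `.cca/<lane>/<worker>/` nesting, the state keys, and the `write_checkpoint` helper are assumptions made for illustration, not the system's actual schema.

```python
import json
import tempfile
from pathlib import Path


def write_checkpoint(worker_dir: Path, state: dict[str, str]) -> Path:
    """Write a worker's state to current_state.yaml and ensure drafts/ exists."""
    (worker_dir / "drafts").mkdir(parents=True, exist_ok=True)
    # JSON-quoted strings are valid YAML scalars, so this avoids a YAML
    # dependency while still quoting values that contain ':' or '#'.
    lines = [f"{key}: {json.dumps(value)}" for key, value in state.items()]
    state_file = worker_dir / "current_state.yaml"
    state_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return state_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: .cca/<lane>/<worker>/
        worker_dir = Path(root) / ".cca" / "lane-docs" / "worker-1"
        path = write_checkpoint(
            worker_dir,
            {
                "task": "rewrite intro",          # hypothetical keys
                "status": "in_progress",
                "next_step": "edit drafts/intro.md",
            },
        )
        print(path.read_text(encoding="utf-8"))
```

The point of the design is that everything a replacement worker needs is written here before the chat is killed, so the state file stays small and the chat holds nothing that cannot be rebuilt.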
Hydration: the kill and respawn
The fresh worker reads a few KB of state, not the prior thread. No replay tax, no stale or cross-task context carried in, which is what "without leaking context" means in practice.
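One way to picture hydration is a function that builds the first message for the fresh worker from the files on disk and nothing else. This is a minimal sketch under assumptions: the prompt wording, the `build_hydration_prompt` name, and the 8 KB budget are hypothetical, chosen only to reflect the "few KB" figure above.

```python
import tempfile
from pathlib import Path


def build_hydration_prompt(worker_dir: Path, max_bytes: int = 8_192) -> str:
    """Build a fresh worker's first message from on-disk state only."""
    state = (worker_dir / "current_state.yaml").read_text(encoding="utf-8")
    drafts_dir = worker_dir / "drafts"
    drafts = (
        sorted(p.name for p in drafts_dir.iterdir() if p.is_file())
        if drafts_dir.is_dir()
        else []
    )
    parts = [
        "Resume from the state below. Do not assume anything not written here.",
        "--- current_state.yaml ---",
        state.rstrip("\n"),
        "--- drafts/ ---",
        *(drafts or ["(none)"]),
    ]
    prompt = "\n".join(parts)
    size = len(prompt.encode("utf-8"))
    # Hypothetical guard: fail loudly if the state is no longer "a few KB".
    if size > max_bytes:
        raise ValueError(f"hydration prompt is {size} bytes, over {max_bytes}")
    return prompt


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        worker_dir = Path(root)
        (worker_dir / "drafts").mkdir()
        (worker_dir / "drafts" / "intro.md").write_text("draft", encoding="utf-8")
        (worker_dir / "current_state.yaml").write_text(
            'task: "rewrite intro"\n', encoding="utf-8"
        )
        print(build_hydration_prompt(worker_dir))
```

The prior thread never appears as an input, which is the property that keeps stale or cross-task context out of the new worker.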
The external orchestrator, and cycling it
Planning, routing and state reconciliation run in a separate chat called O0, outside the editor. Because O0 treats the .cca workspace as the source of truth and reconciles from it every turn, the chat itself holds nothing irreplaceable. When a chat slows or fills, you start a fresh one, reload the portable instruction layer and reconcile from the workspace.
What the v5 O0 rules mean, in plain terms
The cycling works because of four rules the orchestrator follows. They are written tersely in the system, so here they are for a human reader.
- The workspace is the truth, not the chat. The .cca bundle is the primary state surface. O0 reads it every turn and treats it as authoritative over anything it remembers.
- Reconcile before acting. Before O0 interprets any worker return or operator request, it compares that input against the latest workspace state. If they disagree, it stops and writes a reconciliation surface, then proceeds once they agree.
- No stale memory. O0 refuses to rely on inferred continuity or prior-phase assumptions. This is the rule that makes the chat disposable, because nothing important is allowed to live only in the chat.
- The instruction layer is portable. The rules load into any fresh chat as a single block, so a new O0 is the same O0 after one hydration turn.
Put together, a slow or full O0 chat is not a loss. You open a new one, paste the instruction layer, point it at the workspace, and it picks up exactly where the last one left off.
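The "reconcile before acting" rule can be sketched as a guard that runs before any input is interpreted. This is an illustration only: the real O0 rules are prompt instructions, and the `Reconciliation` type, the `reconcile` and `handle_input` names, and the flat key/value view of workspace state are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reconciliation:
    agrees: bool
    # key -> (value in workspace, value claimed by the input)
    conflicts: dict[str, tuple[str, str]] = field(default_factory=dict)


def reconcile(workspace: dict[str, str], claimed: dict[str, str]) -> Reconciliation:
    """Compare what an input claims about state with what the workspace says."""
    conflicts = {
        key: (workspace.get(key, "<missing>"), value)
        for key, value in claimed.items()
        if workspace.get(key) != value
    }
    return Reconciliation(agrees=not conflicts, conflicts=conflicts)


def handle_input(workspace: dict[str, str], claimed: dict[str, str]) -> str:
    """Stop and report on disagreement; proceed only when both sides agree."""
    result = reconcile(workspace, claimed)
    if not result.agrees:
        lines = [
            f"{key}: workspace={ws!r} input={inp!r}"
            for key, (ws, inp) in sorted(result.conflicts.items())
        ]
        return "STOP: reconcile first\n" + "\n".join(lines)
    return "PROCEED"


if __name__ == "__main__":
    workspace = {"phase": "2", "owner": "worker-1"}  # hypothetical state
    print(handle_input(workspace, {"phase": "2"}))
    print(handle_input(workspace, {"phase": "3"}))
```

Because the workspace always wins the comparison, a fresh chat with no memory behaves the same as an old one, which is what makes cycling safe.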
What it costs, on a real program
These numbers come from one real five-week run of this system, read from its own step log.
Roughly 70 to 85 percent fewer input tokens on the metered surface.
Two levers drive it. The durable claim is the token reduction; the dollar figures are an illustration.
| Lever | What changes | Estimated effect |
|---|---|---|
| External orchestrator | About 12 context-filling O0 chats ran on a flat-fee chat subscription, off the metered in-editor turns. A slowing chat was cycled in one hydration turn. | On the order of 12 to 24M input tokens moved off the metered surface, near $40 to $70 at $3 per 1M, replaced by about $20 a month flat. |
| Bounded workers and respawn | Each worker ran scoped to one task home, near 10 to 25K tokens of context, where one monolithic thread grows to 80 to 150K. | About 70 to 80 percent fewer input tokens per task, and a slow worker costs one hydration turn to replace. |
Estimates depend on the model, the pricing mode and prompt caching. Caching lowers the absolute dollars, but the structural win (bounded context plus respawn from disk) still cuts both cached and uncached load.
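The dollar illustration in the first row of the table is plain arithmetic and can be checked with a short sketch. The token range, the $3 per 1M rate, and the count of about 12 chats come from the table; the function name is made up for this example.

```python
def metered_cost_usd(input_tokens: int, usd_per_million: float) -> float:
    """Cost of input tokens at a flat per-million rate (no caching discount)."""
    return input_tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    chats = 12   # "about 12" orchestrator chats, per the table
    rate = 3.0   # USD per 1M input tokens, per the table
    for total in (12_000_000, 24_000_000):
        per_chat = total // chats
        cost = metered_cost_usd(total, rate)
        print(f"{total:,} tokens = ${cost:.0f} metered, about {per_chat:,} per chat")
```

At that rate the range works out to $36 to $72, in line with the table's rounded "near $40 to $70", and to roughly 1 to 2M input tokens per orchestrator chat.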
Demo walkthrough
Show the cost
One long agent re-bills a growing context every turn, and planning burns premium turns that never touch code.
Open the .cca folder
State is externalized to disk per lane and per worker, so the chat is disposable.
Kill and respawn
Checkpoint a worker, kill it, spawn a fresh one and watch it hydrate from a few KB of state.
Cycle the orchestrator
Replace a slow O0 chat with a fresh one that reconciles from the workspace in a single turn.
Show the number
The token reduction, anchored to the real five-week run above.
Cost deck: the slide deck · Method deck: the walkthrough · In practice: the aimez.ai program · Source: GitHub