Central Casting

Cost demo

Externalize agent state into a tiered orchestration

A central orchestrator over lane orchestrators over bounded workers, with the durable state on disk so any piece respawns from a few KB.

The operating layer is Cursor. What this structure buys is a lower cost for long runs.

The problem

One thread holds everything

A monolithic agent keeps its plan, its state and its work in a single growing thread, so no part can be split off, replaced or audited on its own.

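That thread re-sends its whole context every turn, so each turn's input grows with the thread, and the cumulative input tokens that dominate the bill climb faster than linearly (roughly quadratically when the thread grows by a similar amount each turn) as the run goes on.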
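A minimal sketch of that growth, under a deliberately simple and hypothetical model: the thread starts at `base` tokens and grows by a fixed `growth_per_turn` after every turn. The function name and all numbers are illustrative, not measurements from any run.

```python
def cumulative_input_tokens(turns: int, base: int, growth_per_turn: int) -> int:
    """Total input tokens billed over `turns` turns of one thread that re-sends
    its whole context every turn.

    Hypothetical model: the thread starts at `base` tokens and grows by
    `growth_per_turn` tokens after every turn.
    """
    total = 0
    context = base
    for _ in range(turns):
        total += context            # the whole thread is re-sent as input
        context += growth_per_turn  # and it keeps growing
    return total


# Closed form of the same sum: turns*base + growth*turns*(turns - 1)/2,
# so the growth term is quadratic in the number of turns.
monolith_20 = cumulative_input_tokens(20, base=2_000, growth_per_turn=1_000)
monolith_40 = cumulative_input_tokens(40, base=2_000, growth_per_turn=1_000)

# The same 40 turns split across 8 fresh threads of 5 turns each, every one
# restarting from the small base instead of the accumulated history.
split_40 = 8 * cumulative_input_tokens(5, base=2_000, growth_per_turn=1_000)
```

Under these made-up parameters, doubling the run from 20 to 40 turns takes the total from 230,000 to 860,000 input tokens (about 3.7 times), while the same 40 turns in eight fresh 5-turn threads cost 160,000.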

The shape

State on disk, work split into lanes

Put the durable state on disk so the thread becomes disposable, then split the work into lanes that a tiered orchestration routes and reconciles.

state on disk + tiered orchestration + bounded workers

The architecture

Three tiers that branch

Orchestration is tiered, so no single agent carries the whole program.

  1. O0, the central orchestrator, routes work to lanes and reconciles their state every turn.
  2. O1 to O16, lane orchestrators, each own one work area and track its step state.
  3. A1 to An, worker actors, each scoped to a single task home inside a lane.
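The three tiers can be sketched as plain data structures. This is a hypothetical model, not Cursor's API or the author's code: the class names, the `area` key used for routing and the task-home name `0312_auth` are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """A1..An: a worker actor scoped to exactly one task home."""
    name: str
    task_home: str


@dataclass
class Lane:
    """O1..O16: a lane orchestrator that owns one work area and its step state."""
    name: str
    area: str
    step: int = 0
    workers: list[Worker] = field(default_factory=list)

    def spawn_worker(self, task_home: str) -> Worker:
        worker = Worker(name=f"A{len(self.workers) + 1}", task_home=task_home)
        self.workers.append(worker)
        return worker


class CentralOrchestrator:
    """O0: routes work to lanes and reconciles their state; owns no work itself."""

    def __init__(self, lanes: list[Lane]) -> None:
        self.lanes = {lane.area: lane for lane in lanes}

    def route(self, area: str, task_home: str) -> Worker:
        # Raises KeyError if no lane owns the area, rather than guessing one.
        return self.lanes[area].spawn_worker(task_home)

    def reconcile(self) -> dict[str, int]:
        # Collect each lane's step state; the lanes, not O0, hold the detail.
        return {lane.name: lane.step for lane in self.lanes.values()}
```

For example, `CentralOrchestrator([Lane("O1", "docs"), Lane("O2", "api")]).route("api", "0312_auth")` hands the task to a new worker `A1` in lane O2. O0 only keeps the area-to-lane map, which is why it stays small.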

State on disk

The .cca folder

The durable truth lives in a .cca folder with one lane per work area, so the folder is the source of truth and the chat is disposable.

.cca/
  O1 .. O16/                 a lane, the local orchestrator's home
    current_step.yaml        lane step state
    MMDD_task/               a dated task home
      A1 .. An/              a worker actor in the lane
        drafts/              work the worker produced
        current_state.yaml   the worker's externalized state
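A small path helper for that layout, as a sketch. It assumes workers nest inside the dated task home. The folder listing does not state the nesting outright, so treat this as an assumption. The function names and the example lane, task and worker names are hypothetical.

```python
from pathlib import Path


def task_home_paths(root: Path, lane: str, task_home: str, worker: str) -> dict[str, Path]:
    """Paths for one worker, assuming workers nest inside the dated task home."""
    lane_dir = root / ".cca" / lane
    worker_dir = lane_dir / task_home / worker
    return {
        "lane_step": lane_dir / "current_step.yaml",          # lane step state
        "drafts": worker_dir / "drafts",                      # work the worker produced
        "worker_state": worker_dir / "current_state.yaml",    # externalized state
    }


def ensure_layout(paths: dict[str, Path]) -> None:
    """Create the directories (parents included); existing files are untouched."""
    paths["drafts"].mkdir(parents=True, exist_ok=True)
```

Because every path is derived from `(lane, task_home, worker)`, a replacement worker given the same three names finds the same files.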

Replace a worker

Hydration in five moves

  1. The worker slows.
  2. Checkpoint its state to disk.
  3. Kill it.
  4. Start a fresh worker.
  5. Hydrate from disk and resume.

Because the worker's state is on disk, a fresh one reads a few KB and resumes, so it carries no stale or cross-task context and pays no replay tax.
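The checkpoint and hydrate moves reduce to one write and one read. A minimal sketch, assuming the state is a small dict serialized as JSON. JSON is, for practical purposes, accepted by YAML 1.2 parsers, so the file can keep its `current_state.yaml` name. The function names and the state fields are hypothetical.

```python
import json
import os
from pathlib import Path


def checkpoint(state_path: Path, state: dict) -> int:
    """Write the worker's state to disk and return its size in bytes.

    Assumption of this sketch: state is written as JSON inside the .yaml file.
    """
    state_path.parent.mkdir(parents=True, exist_ok=True)
    tmp = state_path.with_name(state_path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True))
    # Swap the new file in one step (atomic on POSIX filesystems), so a reader
    # never sees a half-written checkpoint.
    os.replace(tmp, state_path)
    return state_path.stat().st_size


def hydrate(state_path: Path) -> dict:
    """A fresh worker reads only this file, never the old worker's chat history."""
    return json.loads(state_path.read_text())
```

Killing the old worker needs no code here: once `checkpoint` returns, nothing in its thread is needed, and the fresh worker's starting context is just the few hundred bytes `hydrate` returns.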

Replace the orchestrator

Cycle a full O0 chat in one turn

O0 runs in its own chat and reconciles from the .cca workspace every turn, so it holds nothing irreplaceable and a fresh O0 picks up where it left off.

  1. The O0 chat fills.
  2. A fresh O0 loads the rules.
  3. It reconciles from .cca.
  4. It resumes routing.

Why cycling is safe

The four O0 rules

  1. The workspace is the truth, not the chat. O0 reads the .cca bundle every turn and treats it as authoritative.
  2. Reconcile before acting. It compares each input against the latest workspace state and stops on a conflict.
  3. No stale memory. It refuses inferred continuity, which is what makes the chat disposable.
  4. The instruction layer is portable. The rules load into any fresh chat, so a new O0 is the same O0 after one hydration turn.
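Rules 1 to 3 can be sketched as one guard that runs before every action. This assumes each lane's `current_step.yaml` holds a JSON object with a `step` field. The field name, `ConflictError` and both function names are hypothetical and illustrative only.

```python
import json
from pathlib import Path
from typing import Callable


class ConflictError(RuntimeError):
    """Raised when an input disagrees with the workspace (rule 2: stop on conflict)."""


def read_workspace(cca_root: Path) -> dict[str, dict]:
    """Rule 1: read every lane's step state from disk on every turn."""
    return {
        step_file.parent.name: json.loads(step_file.read_text())
        for step_file in sorted(cca_root.glob("O*/current_step.yaml"))
    }


def reconcile_then_act(cca_root: Path, lane: str, expected_step: int,
                       action: Callable[[dict], dict]) -> dict:
    """Rules 2 and 3: compare the input with fresh disk state, never with memory."""
    workspace = read_workspace(cca_root)  # nothing is carried over between turns
    actual = workspace.get(lane, {}).get("step")
    if actual != expected_step:
        raise ConflictError(
            f"{lane}: input assumes step {expected_step}, workspace has {actual}")
    return action(workspace[lane])
```

Because the guard keeps no state between calls, a brand-new O0 running it behaves exactly like the old one, which is the point of rule 4.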

On a real program (verified)

One five-week run, read from its own step log

7 active lanes
14 task homes
175 checkpoints
~12 O0 chats cycled (estimate)

The lane, task-home and checkpoint counts are read straight from the central step log. The cycled-chat count is an estimate.

What it buys (estimate)

Roughly 70 to 85 percent fewer input tokens

70 to 85 percent fewer input tokens on the metered surface.

This is an estimate from one real run. The structural cause is verified: durable state on disk, bounded per-task context, and an orchestrator cycled in one hydration turn. Dollar figures are illustrative, not measured.

Two levers (estimates labeled)

Two levers, both structural

Lever | What changes | Estimated effect
External orchestrator | Planning runs on a flat-fee chat, off the metered in-editor turns, and is cycled in one hydration turn. | Token-heavy reasoning leaves the metered surface.
Bounded workers | Each worker is scoped to one task home, near 10 to 25K tokens, where one monolithic thread grows to 80 to 150K. | About 70 to 80 percent fewer input tokens per task.

Caching lowers the absolute dollar figures, but the structural win holds either way.

Read more

Go deeper

The method, the worked program and the reading version of this demo.
