Gil Raitses
Durable behavior, learned by simulation
How I build the substrate that lets an agent practice inside a sandbox, records every move it makes and measures whether the behavior held.
Two threads of my work meet here. From the neuroscience side, I study behavior as a dynamical system: how internal state shapes the next action and how a learned response persists or extinguishes. From the engineering side, I build the sandboxes, simulation fleets and audit layers that let an agent practice and improve. The question underneath both is the same: what makes a behavior durable and what makes it brittle?
A sandbox substrate for practice
Central Casting runs agent work as bounded workers with their state externalized to disk. Every real change is a typed checkpoint, so a worker can be killed and respawned from a few kilobytes of saved state, and the whole run replays as a record. Over one five-week run it logged 175 checkpoints across 7 lanes and 14 task homes, with 30 pre-write inventories and 16 logged self-corrections.
Where it maps. Deterministic resets, captured state and replay are what a simulation and reinforcement-learning loop needs to train an agent by doing. The same surface gives a clean improvement signal across attempts and a place where a failure surfaces as an event.
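A minimal sketch of the checkpoint-and-respawn idea, under stated assumptions: the `Checkpoint` fields, the `Worker` class and the JSON-lines log file are all hypothetical names chosen for illustration, not Central Casting's actual schema or API. It shows only the core property, that a worker rebuilt from its on-disk log ends up with the same history as the one that was killed.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass(frozen=True)
class Checkpoint:
    """One typed record of a real change (all field names are hypothetical)."""
    seq: int
    lane: str
    kind: str       # e.g. "inventory", "write", "self_correction"
    payload: dict


class Worker:
    """A bounded worker whose only durable state is its checkpoint log on disk."""

    def __init__(self, log_path: Path):
        self.log_path = log_path
        self.history: list[Checkpoint] = []
        if log_path.exists():
            # Respawn path: rebuild in-memory state purely from the log.
            for line in log_path.read_text().splitlines():
                self.history.append(Checkpoint(**json.loads(line)))

    def record(self, lane: str, kind: str, payload: dict) -> Checkpoint:
        """Append one checkpoint to disk, then mirror it in memory."""
        cp = Checkpoint(len(self.history), lane, kind, payload)
        with self.log_path.open("a") as f:
            f.write(json.dumps(asdict(cp)) + "\n")
        self.history.append(cp)
        return cp


def demo() -> tuple[list[Checkpoint], list[Checkpoint]]:
    """Record two checkpoints, drop the worker, respawn it from the log."""
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "worker.jsonl"
        first = Worker(path)
        first.record("lane-1", "inventory", {"files": 3})
        first.record("lane-1", "write", {"file": "notes.md"})
        before = list(first.history)
        del first                       # simulate the worker being killed
        respawned = Worker(path)
        return before, respawned.history
```

Calling `demo()` returns the history before the kill and the history after the respawn, which should compare equal. An append-only log is used here because it keeps each write small and makes the full run replayable line by line; a real system might choose a different storage layout.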
Simulation at scale
I ran a 135-condition molecular-dynamics campaign on an AWS EC2 fleet and owned it from provisioning through completion under a zero-warning acceptance gate, with status beacons and in-flight error recovery. After a costly fleet loss, I rebuilt the monitoring.
Where it maps. Running many conditions in parallel, catching failures in flight and holding a hard acceptance gate is the daily operation of a training environment that has to stay correct while it stays cheap.
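As an illustration of a zero-warning acceptance gate over conditions run in parallel, here is a small sketch. Everything in it is an assumption made for the example: `run_condition` is a stand-in that fakes one failure and one warning, and `ConditionResult`, `acceptance_gate` and `run_campaign` are hypothetical names, not the tooling used in the campaign described above.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ConditionResult:
    condition_id: int
    completed: bool
    warnings: list[str] = field(default_factory=list)


def run_condition(condition_id: int) -> ConditionResult:
    """Stand-in for one simulation condition; a real one would launch a job."""
    try:
        if condition_id == 7:
            raise RuntimeError("node lost")      # simulated in-flight failure
        warnings = ["energy drift"] if condition_id == 3 else []
        return ConditionResult(condition_id, True, warnings)
    except RuntimeError as exc:
        # Catch the failure in flight and surface it as a result, not a crash.
        return ConditionResult(condition_id, False, [f"failed: {exc}"])


def acceptance_gate(results: list[ConditionResult]) -> list[int]:
    """Zero-warning gate: return the ids of conditions that must be rerun."""
    return sorted(
        r.condition_id for r in results if not r.completed or r.warnings
    )


def run_campaign(n_conditions: int, max_workers: int = 4):
    """Run all conditions in parallel, then apply the gate to the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_condition, range(n_conditions)))
    return results, acceptance_gate(results)
```

The gate treats a warning exactly like a failure, which is the point of a hard gate: a condition either passes cleanly or goes back on the rerun list.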
Behavior as the object of study
As a research associate in the NRT EmIRGE-Bio program at Syracuse, I study how internal state drives the next action and how a learned response holds or fades. That is the durable-behavior question stated in its original form, before it becomes an agent question.
Where it maps. Enterprise customers want an agent that holds a consistent, correct behavioral profile under edge cases and regulatory pressure: the right personality for their domain, every time. That is durability of behavior, measured.
Applied to training agents in simulation
Put together, here is how I would build a loop that trains durable behavior and proves it.
- Run the agent in a bounded sandbox where it can practice and fail safely, with state captured at every step.
- Reset deterministically and replay, so each attempt is comparable and the improvement signal stays clean.
- Score behavior against an explicit standard and validate against an independent estimate before the result is trusted.
- Feed corrections back as superseding records, so a regression is caught once and not repeated.
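The four steps above can be sketched in a few lines. This is a toy under loud assumptions, not a training system: the "agent" is a single hypothetical `bias` parameter, the standard is distance from a fixed target, the independent estimate is a second metric computed separately, and `Correction`, `run_episode`, `score`, `independent_estimate`, `current` and `train` are all illustrative names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    key: str
    value: float
    version: int


def run_episode(seed: int, bias: float, steps: int = 5) -> list[float]:
    """Deterministic reset: the same seed and bias give the same trajectory."""
    rng = random.Random(seed)
    return [rng.random() + bias for _ in range(steps)]


def score(trajectory: list[float], target: float = 0.5) -> float:
    """Explicit standard: mean absolute distance from the target (lower is better)."""
    return sum(abs(x - target) for x in trajectory) / len(trajectory)


def independent_estimate(trajectory: list[float], target: float = 0.5) -> float:
    """Separately computed cross-check: root-mean-square distance from the target."""
    return (sum((x - target) ** 2 for x in trajectory) / len(trajectory)) ** 0.5


def current(corrections: list[Correction], key: str, default: float) -> float:
    """Latest version wins; older records stay in the log but are superseded."""
    matching = [c for c in corrections if c.key == key]
    return max(matching, key=lambda c: c.version).value if matching else default


def train(seed: int = 42):
    """One attempt, one correction, one replay on the same seed."""
    log = [Correction("bias", 1.0, version=1)]
    before = run_episode(seed, current(log, "bias", 0.0))
    log.append(Correction("bias", 0.0, version=2))    # supersedes version 1
    after = run_episode(seed, current(log, "bias", 0.0))
    improved = score(after) < score(before)
    confirmed = independent_estimate(after) < independent_estimate(before)
    # Trust the result only when both measures agree that behavior improved.
    return log, improved and confirmed
```

Because both attempts replay the same seed, any difference in score comes from the correction rather than from noise, and the superseded record stays in the log as evidence of what was changed.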
I also ship the backend this runs on: a Python and FastAPI service live on AWS behind API Gateway, with container-image deploys, monitoring and cost tuning.