Simulation
Package: packages/simulator
Depends on: packages/agent-runtime, packages/execution-engine, packages/brokers/paper, packages/db
What simulation does
Replays a historical date range, tick by tick, calling the agent with the data it would have seen at each point in simulated time. Routes its proposed actions through the same Execution Engine the live runner uses, against a deterministic paper broker. Produces a complete record of every decision plus equity/PnL curves.
Sim is the platform's iteration loop. If sim is slow or unfaithful, authors won't trust their backtests, and the product loses its core value.
Design principles
- Same code path as live. runSkill is the same function. The Execution Engine is the same module. Only the broker adapter and the data source differ. A bug in the runtime is caught in both.
- No look-ahead. At simulated time T, the agent's context contains only bars and news with timestamps ≤ T. Enforced in the data layer, not by convention.
- Deterministic. Same Skill + same date range + same model output = same sim result. Model output is the only stochastic input; everything else is reproducible.
- Snapshots are the artifact. Every tick writes a full Decision Snapshot, identical in shape to live. Reports are computed from snapshots, not separately.
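The "same code path as live" principle can be pictured as plain dependency injection: the tick logic is written once against an interface, and only the adapter handed to it changes. This is a minimal sketch under that assumption. The names `Broker`, `PaperBroker`, `StubLiveBroker`, and `processTick` are hypothetical and are not the actual interfaces in packages/execution-engine or packages/brokers/paper.

```typescript
// Hypothetical adapter interface shared by sim and live.
interface Broker {
  readonly kind: "paper" | "live";
  submit(order: { symbol: string; qty: number }): string; // returns an order id
}

class PaperBroker implements Broker {
  readonly kind = "paper" as const;
  private nextId = 0;
  submit(order: { symbol: string; qty: number }): string {
    // Sequential ids (no randomness, no wall clock) keep sim runs reproducible.
    return `paper-${order.symbol}-${this.nextId++}`;
  }
}

class StubLiveBroker implements Broker {
  readonly kind = "live" as const;
  submit(order: { symbol: string; qty: number }): string {
    // A real adapter would call the exchange here; this stub only tags the id.
    return `live-${order.symbol}`;
  }
}

// Shared by both runners: nothing in this function branches on sim vs. live.
function processTick(broker: Broker, symbol: string, qty: number): string {
  if (qty === 0) return "no-op";
  return broker.submit({ symbol, qty });
}
```

Because `processTick` never inspects `broker.kind`, a bug in it shows up identically under either adapter, which is the property the principle is after.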
Sim run lifecycle
queued → loading → running → computing_metrics → complete
  ↘ error
  ↘ cancelled
A sim_runs row tracks status, progress, and parameters:
sim_runs (
  id uuid pk,
  skill_id uuid,
  skill_version int,
  user_id uuid,
  symbol text, -- MVP: one symbol per run
  bars_interval text, -- '1m' | '5m' | '15m' | ...
  range_from timestamptz,
  range_to timestamptz,
  status text,
  progress_pct int,
  error_text text null,
  metrics_json jsonb null, -- computed at end
  cost_usd numeric null,
  started_at timestamptz null,
  finished_at timestamptz null,
  created_at timestamptz
)
The replay loop
// pseudocode
export async function runSim(simRunId: string) {
  const sim = await db.getSimRun(simRunId);
  const skill = await db.getSkill(sim.skill_id, sim.skill_version);
  const broker = makePaperBroker({
    initialEquityUsd: 10_000,
    fees: { makerBps: 1.5, takerBps: 4.5 },
    slippageModel: 'bps_per_million',
  });
  await db.markSimRunning(simRunId);
  const ticks = iterateTicks({
    symbol: sim.symbol,
    interval: sim.bars_interval,
    from: sim.range_from,
    to: sim.range_to,
    schedule: skill.schedule,
  });
  let processed = 0, total = await countTicks(...);
  for await (const tickTime of ticks) {
    // 1. Build per-tick ToolContext with time-bounded data clients
    const ctx = makeSimContext({
      skill,
      deploymentId: simRunId, // sim runs use sim_run_id as deployment id
      asOf: tickTime, // <-- enforces no-look-ahead
      broker,
    });
    // 2. Snapshot portfolio before
    const portfolioBefore = await broker.snapshot();
    // 3. Run the agent
    const decision = await runSkill({ skill, ctx });
    // 4. Process through Execution Engine
    const result = await engine.process({
      skill,
      deploymentId: simRunId,
      proposedAction: decision.proposedAction,
      broker,
      portfolioSnapshot: portfolioBefore,
    });
    // 5. Advance broker to next tick (apply pending fills, funding, etc.)
    await broker.advance(tickTime);
    // 6. Persist snapshot
    await db.insertDecisionSnapshot({
      sim_run_id: simRunId,
      tick_at: tickTime,
      context_json: ctx.contextDump(),
      steps_json: decision.steps,
      final_text: decision.text,
      proposed_action: decision.proposedAction,
      engine_result: result,
      cost_usd: decision.costUsd,
    });
    processed++;
    if (processed % 50 === 0) {
      await db.updateSimProgress(simRunId, processed / total);
    }
  }
  // 7. Compute summary metrics from snapshots + broker history
  const metrics = await computeMetrics(simRunId, broker.history());
  await db.markSimComplete(simRunId, metrics);
}
Time-bounded data context
The asOf parameter is the linchpin. Every data client passed into ctx is bound:
const barsClient = {
  recent: (symbol, lookback) =>
    db.query(`
      SELECT * FROM bars
      WHERE symbol = $1 AND ts <= $2 AND interval = $3
      ORDER BY ts DESC LIMIT $4
    `, [symbol, asOf, ctx.barsInterval, lookback]),
};
const newsClient = {
  search: (query, hours) =>
    db.query(`
      SELECT * FROM news
      WHERE ts <= $1 AND ts >= $1 - $2 * interval '1 hour'
      ORDER BY ts DESC
    `, [asOf, hours]),
};
No code path in the simulator can fetch data with ts > asOf. This is enforced by the type signatures of sim data clients (different from live clients which take "now").
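To make the "bound at construction" idea concrete, here is a minimal in-memory sketch. It assumes a hypothetical `Bar` shape and a hypothetical `makeSimBarsClient` factory standing in for the SQL-backed client above. The point is that `asOf` is fixed when the client is created and no method accepts a timestamp, so a caller has no way to request later data.

```typescript
// Hypothetical in-memory stand-in for the bars table; ts is epoch milliseconds.
type Bar = { symbol: string; ts: number; close: number };

// The client exposes asOf as read-only and takes no time argument in recent().
interface SimBarsClient {
  readonly asOf: number;
  recent(symbol: string, lookback: number): Bar[];
}

function makeSimBarsClient(allBars: readonly Bar[], asOf: number): SimBarsClient {
  return {
    asOf,
    recent(symbol, lookback) {
      return allBars
        .filter((b) => b.symbol === symbol && b.ts <= asOf) // same bound as `ts <= $2`
        .sort((a, b) => b.ts - a.ts) // newest first, like ORDER BY ts DESC
        .slice(0, lookback); // like LIMIT
    },
  };
}

const bars: Bar[] = [
  { symbol: "BTC", ts: 1_000, close: 100 },
  { symbol: "BTC", ts: 2_000, close: 101 },
  { symbol: "BTC", ts: 3_000, close: 102 }, // in the simulated future for asOf = 2_000
  { symbol: "ETH", ts: 2_000, close: 50 },
];

const client = makeSimBarsClient(bars, 2_000);
const visible = client.recent("BTC", 10);
```

Here `visible` holds only the BTC bars at `ts` 2000 and 1000; the bar at 3000 is unreachable through this client regardless of the `lookback` requested.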
Paper broker behavior
Documented in detail in execution-engine.md. Summary:
- Market orders fill at the next bar's open (one-bar latency, like the real world)
- Limit orders fill if a subsequent bar trades through the limit price
- Fees applied per side (configurable)
- Slippage applied as bps × (notional / $1M) (configurable)
- Funding payments applied at funding-rate intervals based on historical funding data
- Liquidation simulated: if maintenance margin breached, position auto-closed with penalty
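The fill rules above can be sketched as a few pure functions. This is an illustration under stated assumptions, not the paper broker's actual code: the function names are hypothetical, "trades through" is read strictly (touching the limit price is not enough), and the slippage formula is read as basis points that scale linearly with order size, reaching `baseBps` at $1M notional.

```typescript
type Bar = { ts: number; open: number; high: number; low: number; close: number };
type Side = "buy" | "sell";

// Assumed reading of "bps × (notional / $1M)".
function slippageBps(baseBps: number, notionalUsd: number): number {
  return baseBps * (notionalUsd / 1_000_000);
}

// A market order placed during one bar fills at the NEXT bar's open,
// moved against the trader by the slippage.
function marketFillPrice(nextBar: Bar, side: Side, qty: number, baseBps: number): number {
  const notionalUsd = nextBar.open * qty;
  const slip = slippageBps(baseBps, notionalUsd) / 10_000; // bps -> fraction
  return side === "buy" ? nextBar.open * (1 + slip) : nextBar.open * (1 - slip);
}

// A limit order fills only if a later bar trades through the limit price.
// Assumption: strict inequality, so a bar that merely touches the price does not fill.
function limitFills(bar: Bar, side: Side, limitPrice: number): boolean {
  return side === "buy" ? bar.low < limitPrice : bar.high > limitPrice;
}

// Fees are charged per side on the filled notional.
function feeUsd(notionalUsd: number, feeBps: number): number {
  return notionalUsd * (feeBps / 10_000);
}
```

Under these assumptions, a $500,000 market buy with `baseBps = 2` pays 1 bps of slippage on top of the next bar's open, plus the taker fee on the same notional.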
Metrics
Computed at sim completion and stored in sim_runs.metrics_json:
type SimMetrics = {
  // Returns
  totalReturnPct: number;
  cagr: number;
  // Risk
  sharpe: number;
  sortino: number;
  maxDrawdownPct: number;
  maxDrawdownDurationDays: number;
  // Trading
  totalTrades: number;
  winRate: number;
  profitFactor: number;
  avgWinUsd: number;
  avgLossUsd: number;
  // Costs
  totalFeesUsd: number;
  totalFundingUsd: number;
  totalSlippageUsd: number;
  // AI cost
  totalAiCostUsd: number;
  costPerDecisionUsd: number;
  // Decisions
  totalTicks: number;
  totalProposedActions: number;
  totalAcceptedActions: number;
  rejectionsByRule: Record<string, number>;
};
Where sims run
All sims run on the sim-worker pool (apps/sim-worker on Fly.io), not on Vercel. Real-model sims run ~17s/tick; a day at 5m is 288 ticks ≈ 80 minutes, which blows past Vercel's 800s function ceiling. The worker has no per-job time limit.
The pool is dynamic and scales to zero (see ADR-0023):
- Enqueue inserts a sim_runs row with status='queued' and calls ensureSimWorkerCapacity() (apps/web/lib/sim-capacity.ts), which provisions Fly machines on demand via the Machines API (apps/web/lib/sim-fly.ts).
- Capacity targets min(SIM_WORKER_MAX_MACHINES, queued + running) live machines, so up to N backtests drain in parallel. The atomic CAS claim (status='queued' predicate on the UPDATE) keeps concurrent workers from double-claiming the same row.
- Scale-down is each worker's own doing: after the queue is idle past SIM_WORKER_IDLE_EXIT_MS it exits 0, and Fly's auto_destroy removes the machine, so there is no idle billing and no stopped-machine sweep.
- Liveness is heartbeat-based: each worker stamps sim_runs.heartbeat_at (~every SIM_WORKER_HEARTBEAT_MS) on the row it owns. A non-terminal run whose heartbeat goes stale (SIM_WORKER_STALE_MS, default 3 min) is swept to error. The sweep runs on every enqueue and every worker boot. This replaced the old boot-time "error every running row" orphan sweep, which was unsafe once more than one worker can run at a time.
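The three pool mechanics in the list above (capacity target, compare-and-set claim, stale-heartbeat sweep) can be modelled in memory as follows. This is a sketch only. `SimRunRow`, `targetMachines`, `tryClaim`, and `sweepStale` are hypothetical names, the assumption that a claimed row moves to `loading` is taken from the lifecycle diagram rather than from the worker code, and rows with no heartbeat yet are assumed to be exempt from the sweep.

```typescript
type SimRunRow = { id: string; status: string; heartbeatAt: number | null }; // epoch ms

// Target from the text: min(SIM_WORKER_MAX_MACHINES, queued + running).
function targetMachines(maxMachines: number, queued: number, running: number): number {
  return Math.min(maxMachines, queued + running);
}

// In-memory analogue of the CAS claim: the status flips only if it is still
// 'queued', like an UPDATE ... WHERE status = 'queued' that reports whether
// a row was changed. A second claimer sees a different status and gets false.
function tryClaim(row: SimRunRow, now: number): boolean {
  if (row.status !== "queued") return false;
  row.status = "loading"; // assumption: first state after queued in the lifecycle
  row.heartbeatAt = now;
  return true;
}

const TERMINAL = new Set(["complete", "error", "cancelled"]);

// Marks non-terminal rows with a stale heartbeat as 'error' and returns their ids.
// Assumption: rows that have never heartbeated (null) are left alone.
function sweepStale(rows: SimRunRow[], now: number, staleMs: number): string[] {
  const swept: string[] = [];
  for (const row of rows) {
    const stale = row.heartbeatAt !== null && now - row.heartbeatAt > staleMs;
    if (!TERMINAL.has(row.status) && stale) {
      row.status = "error";
      swept.push(row.id);
    }
  }
  return swept;
}
```

Unlike the old "error every running row" sweep, `sweepStale` leaves a row alone while its owner keeps heartbeating, which is what makes it safe with several workers alive at once.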
The image each machine boots is pinned in platform_config.sim_worker_image (upserted by the sim-worker-deploy CI workflow), so a deploy never rolls a machine mid-backtest. In-flight runs finish on their old image; the next backtest starts on the new one.
From the user's perspective it's just "Backtest" — they don't pick or see infrastructure.
Parallelism within a sim
Initially: none. One agent at a time per sim. Reasons:
- Agent decisions can depend on the previous tick's portfolio state — parallelism breaks causality
- Determinism is easier
- Cost is dominated by the model call, not the loop overhead
In Phase 4 (after MVP), we'll add multi-sim parallelism (e.g. parameter sweeps that run N sims of slightly different Skill variants concurrently), but not parallelism within a single sim.
Cancellation
Users can cancel a running sim from the UI. Mechanism:
- UI POSTs /api/sim/<id>/cancel
- Server updates sim_runs.status = cancel_requested
- Sim loop checks status every N ticks, exits cleanly if cancel requested
- Partial results stay in the DB (snapshots already written), status = cancelled
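The polling step in the mechanism above might look like the following loop skeleton. It is a simplified, synchronous sketch: `readStatus` is a hypothetical stand-in for a DB read of sim_runs.status, `onTick` stands in for the per-tick agent and engine work, and `checkEvery` plays the role of N.

```typescript
type SimStatus = "running" | "cancel_requested" | "cancelled" | "complete";

function runWithCancellation(
  ticks: readonly number[],
  checkEvery: number,
  readStatus: () => SimStatus,
  onTick: (tick: number) => void,
): { finalStatus: SimStatus; processed: number } {
  let processed = 0;
  for (const tick of ticks) {
    // Poll only every checkEvery ticks so the status read stays off the hot path.
    if (processed % checkEvery === 0 && readStatus() === "cancel_requested") {
      // Work done so far is kept; the run just stops at a tick boundary.
      return { finalStatus: "cancelled", processed };
    }
    onTick(tick);
    processed++;
  }
  return { finalStatus: "complete", processed };
}
```

The trade-off is latency against read volume: with `checkEvery = 5`, a cancel requested during tick 3 is noticed before tick 6, so up to N − 1 further ticks (and their model cost) may still run after the user cancels.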
Cost controls
Sims can rack up real $ in model calls. Two mitigations:
- Pre-run cost estimate. Before queuing, show the user estimated cost = ticks × avg_cost_per_decision. Require confirm for sims > $5.
- Per-user daily budget. Enforced server-side. Free tier gets $5/day of sim spend; paid plans get more.
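Both mitigations reduce to simple arithmetic, sketched below. The function names are hypothetical, the tick count assumes one decision per bar (a sparser Skill schedule would lower it), and the average cost per decision used in the example is a made-up figure, not a measured one.

```typescript
const CONFIRM_THRESHOLD_USD = 5; // from the text: require confirm for sims > $5

// Tick count for a contiguous range at a fixed bar interval.
// Assumption: one decision per bar.
function estimateTicks(fromMs: number, toMs: number, intervalMinutes: number): number {
  return Math.floor((toMs - fromMs) / (intervalMinutes * 60_000));
}

// estimated cost = ticks × avg_cost_per_decision
function estimateCost(ticks: number, avgCostPerDecisionUsd: number) {
  const estimatedUsd = ticks * avgCostPerDecisionUsd;
  return { estimatedUsd, requiresConfirm: estimatedUsd > CONFIRM_THRESHOLD_USD };
}

// Server-side gate: reject if the estimate would push today's spend past the cap.
function withinDailyBudget(
  spentTodayUsd: number,
  estimatedUsd: number,
  dailyCapUsd: number,
): boolean {
  return spentTodayUsd + estimatedUsd <= dailyCapUsd;
}

const ONE_DAY_MS = 24 * 60 * 60_000;
const ticksPerDay = estimateTicks(0, ONE_DAY_MS, 5); // one day of 5m bars
const estimate = estimateCost(ticksPerDay, 0.02); // 0.02 USD/decision is hypothetical
```

One day at 5m gives 288 ticks, matching the figure used earlier in this document; at the hypothetical $0.02 per decision that estimate lands above the $5 confirmation threshold.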
Open questions
- Replay news arrival latency. A news item with ts = T may not have been available to a real trader at exactly T; there's typically 30s–5min lag. Should sim model this? Currently: include news immediately; revisit in Phase 4.
- Slippage model fidelity. Linear bps/million is coarse. Better: replay actual order book snapshots. Cost prohibitive for MVP; document and revisit.
- Multi-symbol sims. Skill schema supports symbols: string[] but the simulator currently iterates ticks for one symbol. Multi-symbol = align timestamps across symbols. Tractable, not in MVP.