Simulation

Package: packages/simulator
Depends on: packages/agent-runtime, packages/execution-engine, packages/brokers/paper, packages/db

What simulation does

Replays a historical date range, tick by tick, calling the agent with the data it would have seen at each point in simulated time. Routes its proposed actions through the same Execution Engine the live runner uses, against a deterministic paper broker. Produces a complete record of every decision plus equity/PnL curves.

Sim is the platform's iteration loop. If sim is slow or unfaithful, authors won't trust their backtests, and the product loses its core value.

Design principles

Same code path as live. runSkill is the same function and the Execution Engine is the same module; only the broker adapter and the data source differ. A runtime bug therefore shows up in sim as well as in live.
No look-ahead. At simulated time T, the agent's context contains only bars and news with timestamps ≤ T. Enforced in the data layer, not by convention.
Deterministic. Same Skill + same date range + same model output = same sim result. Model output is the only stochastic input; everything else is reproducible.
Snapshots are the artifact. Every tick writes a full Decision Snapshot, identical in shape to live. Reports are computed from snapshots, not separately.

Sim run lifecycle

queued → loading → running → computing_metrics → complete
                            ↘
                              error
                            ↘
                              cancelled
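
The lifecycle can be read as a small state machine. The sketch below is one way to encode it, not the simulator's actual code: the status names come from the diagram, while the transition table (in particular which states may move to error or cancelled) and the helper names canTransition and isTerminal are assumptions made for illustration. The intermediate cancel_requested value described under Cancellation is left out.

```typescript
// Status names are taken from the lifecycle diagram.
type SimStatus =
  | "queued"
  | "loading"
  | "running"
  | "computing_metrics"
  | "complete"
  | "error"
  | "cancelled";

// ASSUMPTION: the diagram does not say exactly which states can fail or be
// cancelled; this table allows both from every state before metrics.
const TRANSITIONS: Record<SimStatus, readonly SimStatus[]> = {
  queued: ["loading", "error", "cancelled"],
  loading: ["running", "error", "cancelled"],
  running: ["computing_metrics", "error", "cancelled"],
  computing_metrics: ["complete", "error"],
  complete: [],
  error: [],
  cancelled: [],
};

// True when the lifecycle permits moving from `from` to `to`.
function canTransition(from: SimStatus, to: SimStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Terminal states have no outgoing transitions.
function isTerminal(current: SimStatus): boolean {
  return TRANSITIONS[current].length === 0;
}
```

Guarding every status write with a check like canTransition would make it harder for a late worker to move a cancelled or errored run back to running.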

A sim_runs row tracks status, progress, and parameters:

sim_runs (
  id            uuid pk,
  skill_id      uuid,
  skill_version int,
  user_id       uuid,
  symbol        text,             -- MVP: one symbol per run
  bars_interval text,             -- '1m' | '5m' | '15m' | ...
  range_from    timestamptz,
  range_to      timestamptz,
  status        text,
  progress_pct  int,
  error_text    text null,
  metrics_json  jsonb null,       -- computed at end
  cost_usd      numeric null,
  started_at    timestamptz null,
  finished_at   timestamptz null,
  created_at    timestamptz
)

The replay loop

// pseudocode
export async function runSim(simRunId: string) {
  const sim = await db.getSimRun(simRunId);
  const skill = await db.getSkill(sim.skill_id, sim.skill_version);
  const broker = makePaperBroker({
    initialEquityUsd: 10_000,
    fees: { makerBps: 1.5, takerBps: 4.5 },
    slippageModel: 'bps_per_million',
  });

  await db.markSimRunning(simRunId);

  const ticks = iterateTicks({
    symbol: sim.symbol,
    interval: sim.bars_interval,
    from: sim.range_from,
    to: sim.range_to,
    schedule: skill.schedule,
  });

  let processed = 0, total = await countTicks(...);

  for await (const tickTime of ticks) {
    // 1. Build per-tick ToolContext with time-bounded data clients
    const ctx = makeSimContext({
      skill,
      deploymentId: simRunId,           // sim runs use sim_run_id as deployment id
      asOf: tickTime,                   // <-- enforces no-look-ahead
      broker,
    });

    // 2. Snapshot portfolio before
    const portfolioBefore = await broker.snapshot();

    // 3. Run the agent
    const decision = await runSkill({ skill, ctx });

    // 4. Process through Execution Engine
    const result = await engine.process({
      skill,
      deploymentId: simRunId,
      proposedAction: decision.proposedAction,
      broker,
      portfolioSnapshot: portfolioBefore,
    });

    // 5. Advance broker to next tick (apply pending fills, funding, etc.)
    await broker.advance(tickTime);

    // 6. Persist snapshot
    await db.insertDecisionSnapshot({
      sim_run_id: simRunId,
      tick_at: tickTime,
      context_json: ctx.contextDump(),
      steps_json: decision.steps,
      final_text: decision.text,
      proposed_action: decision.proposedAction,
      engine_result: result,
      cost_usd: decision.costUsd,
    });

    processed++;
    if (processed % 50 === 0) {
      // progress_pct is an int column, so store a 0-100 percentage, not a fraction
      await db.updateSimProgress(simRunId, Math.round((processed / total) * 100));
    }
  }

  // 7. Compute summary metrics from snapshots + broker history
  const metrics = await computeMetrics(simRunId, broker.history());
  await db.markSimComplete(simRunId, metrics);
}
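
The loop relies on iterateTicks and countTicks, which are not shown. A minimal sketch of what they might compute follows, under loud assumptions: one tick per bar boundary in the half-open range [from, to), interval strings of the form '1m', '5m', '1h', '1d', and no handling of skill.schedule. The names intervalToMs, tickTimes, and progressPct are hypothetical, and tickTimes returns a plain array rather than the async iterator used above.

```typescript
// ASSUMPTION: intervals look like '1m', '5m', '15m', '1h', '1d'.
function intervalToMs(interval: string): number {
  const match = /^(\d+)([mhd])$/.exec(interval);
  if (!match) throw new Error(`unsupported interval: ${interval}`);
  const unitMs = { m: 60_000, h: 3_600_000, d: 86_400_000 }[match[2] as "m" | "h" | "d"];
  return Number(match[1]) * unitMs;
}

// First bar boundary at or after `from`.
function firstTickMs(from: Date, stepMs: number): number {
  return Math.ceil(from.getTime() / stepMs) * stepMs;
}

// Number of ticks in [from, to), one per bar boundary.
function countTicks(from: Date, to: Date, interval: string): number {
  const step = intervalToMs(interval);
  const first = firstTickMs(from, step);
  return first >= to.getTime() ? 0 : Math.ceil((to.getTime() - first) / step);
}

// The tick timestamps themselves, in ascending order.
function tickTimes(from: Date, to: Date, interval: string): Date[] {
  const step = intervalToMs(interval);
  const out: Date[] = [];
  for (let t = firstTickMs(from, step); t < to.getTime(); t += step) {
    out.push(new Date(t));
  }
  return out;
}

// Integer percentage suitable for the progress_pct column.
function progressPct(processed: number, total: number): number {
  return total === 0 ? 100 : Math.min(100, Math.floor((processed / total) * 100));
}
```

With these definitions, one UTC day at '5m' yields 288 ticks, matching the figure used under "Where sims run".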

Time-bounded data context

The asOf parameter is the linchpin. Every data client passed into ctx is bound:

const barsClient = {
  recent: (symbol, lookback) =>
    db.query(`
      SELECT * FROM bars
      WHERE symbol = $1 AND ts <= $2 AND interval = $3
      ORDER BY ts DESC LIMIT $4
    `, [symbol, asOf, ctx.barsInterval, lookback]),
};

const newsClient = {
  search: (query, hours) =>
    db.query(`
      SELECT * FROM news
      WHERE ts <= $1 AND ts >= $1 - $2 * interval '1 hour'
      ORDER BY ts DESC
    `, [asOf, hours]),
};

No code path in the simulator can fetch data with ts > asOf. This is enforced by the type signatures of the sim data clients: unlike the live clients, which take "now", they accept no time argument at all.
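
To illustrate how a type signature can rule out look-ahead, here is an in-memory sketch of a time-bounded bars client. It is not the real client: the Bar shape, the SimBarsClient interface, and makeSimBarsClient are hypothetical, the interval filter from the SQL above is omitted, and timestamps are plain numbers for brevity. The point is that recent has no time parameter, so a caller cannot ask for anything past the asOf captured at construction.

```typescript
interface Bar {
  symbol: string;
  ts: number; // epoch milliseconds (assumed representation)
  close: number;
}

// No time argument anywhere: the cutoff is fixed when the client is built.
interface SimBarsClient {
  recent(symbol: string, lookback: number): Bar[];
}

function makeSimBarsClient(allBars: readonly Bar[], asOf: number): SimBarsClient {
  return {
    recent(symbol, lookback) {
      return allBars
        .filter((b) => b.symbol === symbol && b.ts <= asOf) // WHERE ts <= asOf
        .sort((a, b) => b.ts - a.ts) // ORDER BY ts DESC
        .slice(0, lookback); // LIMIT lookback
    },
  };
}

// Usage: four bars exist, but a client built at asOf = 3 never sees ts = 4.
const sampleBars: Bar[] = [
  { symbol: "BTC", ts: 1, close: 100 },
  { symbol: "BTC", ts: 2, close: 101 },
  { symbol: "BTC", ts: 3, close: 102 },
  { symbol: "BTC", ts: 4, close: 103 },
];
const clientAt3 = makeSimBarsClient(sampleBars, 3);
const visible = clientAt3.recent("BTC", 2); // bars with ts 3 and 2, newest first
```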

Paper broker behavior

Documented in detail in execution-engine.md. Summary:

Market orders fill at the next bar's open (one-bar latency, like the real world)
Limit orders fill if a subsequent bar trades through the limit price
Fees applied per side (configurable)
Slippage applied as bps × (notional / $1M) (configurable)
Funding payments applied at funding-rate intervals based on historical funding data
Liquidation simulated: if maintenance margin breached, position auto-closed with penalty
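
The fill rules above can be sketched as a few pure functions. This is an illustration under assumptions, not the paper broker's implementation (see execution-engine.md for that): the function names are hypothetical, slippage is applied against the trader (buys fill higher, sells lower), and "trades through" is read strictly, so a bar that only touches the limit price does not fill.

```typescript
type Side = "buy" | "sell";

interface Bar {
  open: number;
  high: number;
  low: number;
  close: number;
}

// Slippage in bps = bpsPerMillion × (notional / $1M).
function slippageBps(notionalUsd: number, bpsPerMillion: number): number {
  return bpsPerMillion * (notionalUsd / 1_000_000);
}

// Market orders fill at the NEXT bar's open, adjusted by slippage.
function marketFillPrice(side: Side, nextBar: Bar, qty: number, bpsPerMillion: number): number {
  const notional = nextBar.open * qty;
  const slip = slippageBps(notional, bpsPerMillion) / 10_000;
  return side === "buy" ? nextBar.open * (1 + slip) : nextBar.open * (1 - slip);
}

// Limit orders fill when a subsequent bar trades through the limit.
// ASSUMPTION: strict inequality, so a touch alone does not fill.
function limitFills(side: Side, limitPrice: number, bar: Bar): boolean {
  return side === "buy" ? bar.low < limitPrice : bar.high > limitPrice;
}

// Per-side fee in USD for a given notional and fee rate in bps.
function feeUsd(notionalUsd: number, feeBps: number): number {
  return (notionalUsd * feeBps) / 10_000;
}
```

For example, buying 2 units into a bar that opens at 50,000 is a 100,000 USD notional; at an illustrative 10 bps per million that is 1 bp of slippage, so the fill lands at about 50,005, and a 4.5 bps taker fee comes to about 45 USD.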

Metrics

Computed at sim completion and stored in sim_runs.metrics_json:

type SimMetrics = {
  // Returns
  totalReturnPct: number;
  cagr: number;
  // Risk
  sharpe: number;
  sortino: number;
  maxDrawdownPct: number;
  maxDrawdownDurationDays: number;
  // Trading
  totalTrades: number;
  winRate: number;
  profitFactor: number;
  avgWinUsd: number;
  avgLossUsd: number;
  // Costs
  totalFeesUsd: number;
  totalFundingUsd: number;
  totalSlippageUsd: number;
  // AI cost
  totalAiCostUsd: number;
  costPerDecisionUsd: number;
  // Decisions
  totalTicks: number;
  totalProposedActions: number;
  totalAcceptedActions: number;
  rejectionsByRule: Record<string, number>;
};
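
The document does not spell out the formulas behind these fields. As a rough sketch, a few of them could be computed as follows using conventional definitions; the helper names and the choice of inputs (an equity curve and a list of per-trade PnLs) are assumptions, and the real computeMetrics may differ.

```typescript
// Percentage change from the first to the last equity point.
function totalReturnPct(equity: readonly number[]): number {
  if (equity.length < 2) return 0;
  return (equity[equity.length - 1] / equity[0] - 1) * 100;
}

// Largest peak-to-trough decline, as a percentage of the peak.
function maxDrawdownPct(equity: readonly number[]): number {
  let peak = -Infinity;
  let worst = 0;
  for (const value of equity) {
    peak = Math.max(peak, value);
    if (peak > 0) worst = Math.max(worst, ((peak - value) / peak) * 100);
  }
  return worst;
}

// Fraction of closed trades with positive PnL (0 when there are no trades).
function winRate(tradePnlsUsd: readonly number[]): number {
  if (tradePnlsUsd.length === 0) return 0;
  return tradePnlsUsd.filter((pnl) => pnl > 0).length / tradePnlsUsd.length;
}

// Gross profit divided by gross loss; Infinity when there are no losing trades.
function profitFactor(tradePnlsUsd: readonly number[]): number {
  let grossProfit = 0;
  let grossLoss = 0;
  for (const pnl of tradePnlsUsd) {
    if (pnl > 0) grossProfit += pnl;
    else grossLoss += -pnl;
  }
  return grossLoss === 0 ? Infinity : grossProfit / grossLoss;
}
```

For an equity curve of 10,000 → 11,000 → 9,900 → 10,450, this gives a total return of about 4.5% and a maximum drawdown of about 10% (the fall from 11,000 to 9,900).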

Where sims run

All sims run on the sim-worker pool (apps/sim-worker on Fly.io), not on Vercel. Real-model sims run at roughly 17s per tick, so one day at 5m is 288 ticks × 17s ≈ 4,900s (about 82 minutes), which blows past Vercel's 800s function ceiling. The worker has no per-job time limit.
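
The arithmetic behind that claim, as a small sketch. The function name and constant are hypothetical; the 17s per tick and 800s ceiling are the figures quoted above, and per-tick time is treated as constant, which real runs will only approximate.

```typescript
// Ceiling quoted in the text above.
const VERCEL_MAX_FUNCTION_SEC = 800;

// Rough wall-clock estimate: number of ticks × seconds per tick.
function estimateSimWallClockSec(rangeMs: number, intervalMs: number, secPerTick: number): number {
  const ticks = Math.floor(rangeMs / intervalMs);
  return ticks * secPerTick;
}

// One day at 5m bars with ~17s per tick: 288 ticks × 17s = 4896s (81.6 min).
const oneDayAt5mSec = estimateSimWallClockSec(24 * 60 * 60 * 1000, 5 * 60 * 1000, 17);
const fitsOnVercel = oneDayAt5mSec <= VERCEL_MAX_FUNCTION_SEC; // false
```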

The pool is dynamic and scales to zero (see ADR-0023):

Enqueue inserts a sim_runs row with status='queued' and calls ensureSimWorkerCapacity() (apps/web/lib/sim-capacity.ts), which provisions Fly machines on demand via the Machines API (apps/web/lib/sim-fly.ts).
Capacity targets min(SIM_WORKER_MAX_MACHINES, queued + running) live machines, so up to SIM_WORKER_MAX_MACHINES backtests drain in parallel. The atomic CAS claim (status='queued' predicate on the UPDATE) keeps concurrent workers from double-claiming the same row.
Each worker scales itself down: once the queue has been idle for longer than SIM_WORKER_IDLE_EXIT_MS, it exits with code 0 and Fly's auto_destroy removes the machine, so there is no idle billing and no stopped-machine sweep.
Liveness is heartbeat-based: each worker stamps sim_runs.heartbeat_at (roughly every SIM_WORKER_HEARTBEAT_MS) on the row it owns. A non-terminal run whose heartbeat goes stale (SIM_WORKER_STALE_MS, default 3 min) is swept to error; the sweep runs on every enqueue and every worker boot. This replaced the old boot-time "error every running row" orphan sweep, which became unsafe once more than one worker could run at a time.
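
The capacity target, the claim, and the stale check can be sketched in memory as below. None of this is the code in apps/web/lib/sim-capacity.ts: the row shape and function names are hypothetical, the real claim is a single SQL UPDATE with a status='queued' predicate rather than an in-process check, and two details are assumptions flagged in the comments.

```typescript
interface SimRunRow {
  id: string;
  runStatus: string;
  heartbeatAt: number | null; // epoch ms of the owning worker's last stamp
}

// Target live machines = min(SIM_WORKER_MAX_MACHINES, queued + running).
function targetMachines(queued: number, running: number, maxMachines: number): number {
  return Math.min(maxMachines, queued + running);
}

// Compare-and-set claim: succeeds only if the row is still 'queued'.
// ASSUMPTION: a claimed row moves to 'loading' and gets its first heartbeat.
function tryClaim(row: SimRunRow, nowMs: number): boolean {
  if (row.runStatus !== "queued") return false;
  row.runStatus = "loading";
  row.heartbeatAt = nowMs;
  return true;
}

// A non-terminal run is stale once its heartbeat is older than staleMs.
// ASSUMPTION: rows that never had a heartbeat (still queued) are not swept.
function isStale(row: SimRunRow, nowMs: number, staleMs: number): boolean {
  const terminal = ["complete", "error", "cancelled"];
  if (terminal.indexOf(row.runStatus) !== -1) return false;
  if (row.heartbeatAt === null) return false;
  return nowMs - row.heartbeatAt > staleMs;
}
```

With three queued runs, one running, and a cap of two machines, targetMachines returns 2; with an empty queue it returns 0, which is what lets the pool scale to zero.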

The image each machine boots is pinned in platform_config.sim_worker_image (upserted by the sim-worker-deploy CI workflow), so a deploy never rolls a machine mid-backtest. In-flight runs finish on their old image; the next backtest starts on the new one.

From the user's perspective it's just "Backtest" — they don't pick or see infrastructure.

Parallelism within a sim

Initially: none. One agent at a time per sim. Reasons:

Agent decisions can depend on the previous tick's portfolio state — parallelism breaks causality
Determinism is easier
Cost is dominated by the model call, not the loop overhead

In Phase 4 (after MVP), we'll add multi-sim parallelism (e.g. parameter sweeps that run N sims of slightly different Skill variants concurrently), but not parallelism within a single sim.

Cancellation

Users can cancel a running sim from the UI. Mechanism:

UI POSTs /api/sim/<id>/cancel
Server updates sim_runs.status = cancel_requested
Sim loop checks status every N ticks, exits cleanly if cancel requested
Partial results stay in the DB (snapshots already written), status = cancelled
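
The steps above amount to cooperative cancellation: the loop polls a flag and stops between ticks, never mid-tick. A minimal synchronous sketch follows; the function and parameter names are hypothetical, and the real loop would read sim_runs.status from the database asynchronously rather than call an in-process callback.

```typescript
type LoopOutcome = "complete" | "cancelled";

interface LoopResult {
  outcome: LoopOutcome;
  processed: number; // ticks fully processed (their snapshots are kept)
}

function replayWithCancel(
  ticks: readonly number[],
  checkEveryN: number,
  isCancelRequested: () => boolean,
  processTick: (tick: number) => void,
): LoopResult {
  let processed = 0;
  for (const tick of ticks) {
    // Poll only every N ticks to keep status reads cheap.
    if (processed % checkEveryN === 0 && isCancelRequested()) {
      return { outcome: "cancelled", processed };
    }
    processTick(tick);
    processed++;
  }
  return { outcome: "complete", processed };
}
```

The trade-off is latency against load: with a check every N ticks, a cancel request can take up to N more ticks (and their model calls) to take effect.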

Cost controls

Sims can rack up real $ in model calls. Two mitigations:

Pre-run cost estimate. Before queuing, show the user estimated cost = ticks × avg_cost_per_decision. Require confirmation for sims estimated above $5.
Per-user daily budget. Enforced server-side. Free tier gets $5/day of sim spend; paid plans get more.
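
Both mitigations reduce to a small check at enqueue time. The sketch below is illustrative only: the names are hypothetical, the $5 confirmation threshold is the one stated above, and whether the budget check uses the estimate (as here) or actual spend as it accrues is an assumption.

```typescript
interface CostCheck {
  estimateUsd: number;
  needsConfirm: boolean; // estimate exceeds the confirmation threshold
  withinBudget: boolean; // today's spend plus this estimate fits the daily budget
}

function checkSimCost(
  ticks: number,
  avgCostPerDecisionUsd: number,
  spentTodayUsd: number,
  dailyBudgetUsd: number,
  confirmThresholdUsd: number = 5,
): CostCheck {
  // estimated cost = ticks × avg_cost_per_decision
  const estimateUsd = ticks * avgCostPerDecisionUsd;
  return {
    estimateUsd,
    needsConfirm: estimateUsd > confirmThresholdUsd,
    withinBudget: spentTodayUsd + estimateUsd <= dailyBudgetUsd,
  };
}
```

As an example with a made-up average of $0.02 per decision, a one-day 5m sim (288 ticks) is estimated at roughly $5.76: it needs confirmation and, on its own, would not fit a $5/day free-tier budget.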

Open questions

Replay news arrival latency. A news item with ts = T may not have been available to a real trader at exactly T — there's typically 30s–5min lag. Should sim model this? Currently: include news immediately; revisit in Phase 4.
Slippage model fidelity. Linear bps/million is coarse. Better: replay actual order book snapshots. That is cost-prohibitive for MVP; document and revisit.
Multi-symbol sims. Skill schema supports symbols: string[] but the simulator currently iterates ticks for one symbol. Multi-symbol = align timestamps across symbols. Tractable, not in MVP.
