Live Runtime

App: apps/live-runner
Hosted on: Fly.io Machines, one per active Deployment
Depends on: packages/agent-runtime, packages/execution-engine, packages/brokers/hyperliquid-*, packages/db

Why a separate long-running service

Live agents need:

Persistent websocket connections to Hyperliquid (market data, fills)
State that survives between ticks (open positions, in-flight orders)
Schedule adherence (a 5m-cron Skill must tick within seconds of every 5min mark)
Two-way communication with the deployer (commands in, state out)

None of these fit Vercel Functions cleanly. Fly Machines do: one tiny VM per Skill, fast cold start (~1s), $0/mo when stopped, scales per-skill independently.

One Skill = one machine

Every deployments row with status IN ('running', 'paused', 'halted') has exactly one Fly Machine. The machine is provisioned when the deployment is created and destroyed once the deployment stops.

This isolation gives us:

A crash in one Skill cannot affect another
Per-Skill memory limits + CPU sizing
Per-Skill log streams (Fly Logs scoped by machine)
Trivial horizontal scale — provision N machines for N Skills

Process anatomy

When the machine boots, the runner reads DEPLOYMENT_ID from env and starts three concurrent loops:

┌─────────────────────────────────────────────────────────────┐
│ Live Runner Process                                         │
│                                                             │
│  ┌──────────────────┐   ┌──────────────────┐                │
│  │ Control Loop     │   │ Tick Loop        │                │
│  │ (Postgres LISTEN)│   │ (schedule-driven)│                │
│  │                  │   │                  │                │
│  │ • stop           │   │ 1. assemble ctx  │                │
│  │ • pause          │   │ 2. runSkill()    │                │
│  │ • resume         │   │ 3. engine.process│                │
│  │ • flatten        │   │ 4. write snapshot│                │
│  │ • kill           │   │ 5. write state   │                │
│  │ • snapshot       │   │                  │                │
│  └────────┬─────────┘   └────────┬─────────┘                │
│           │                      │                          │
│           └──────┐         ┌─────┘                          │
│                  ▼         ▼                                │
│          ┌─────────────────────────┐                        │
│          │ Shared state            │                        │
│          │  • paused: boolean      │                        │
│          │  • running: boolean     │                        │
│          │  • broker handle        │                        │
│          │  • last tick metadata   │                        │
│          └─────────────────────────┘                        │
│                                                             │
│  ┌──────────────────────────────────┐                       │
│  │ WS Subscription                  │                       │
│  │ • Hyperliquid market data        │                       │
│  │ • Fill notifications             │                       │
│  │ • Reconnect with backoff         │                       │
│  └──────────────────────────────────┘                       │
└─────────────────────────────────────────────────────────────┘

The three loops

1. Control loop — listens for deployer commands

async function controlLoop(deploymentId: string, state: SharedState) {
  const client = await db.dedicatedConnection();
  // The channel name embeds a UUID with hyphens, so it must be a double-quoted identifier.
  await client.query(`LISTEN "agent_commands_${deploymentId}"`);

  client.on('notification', async ({ payload }) => {
    const cmd = await db.getCommand(payload); // payload = command id
    await handleCommand(cmd, state);
    await db.markCommandAcked(cmd.id);
  });
}

async function handleCommand(cmd, state) {
  switch (cmd.kind) {
    case 'stop':     state.running = false; break;        // tick loop exits, process drains
    case 'pause':    state.paused = true; break;
    case 'resume':   state.paused = false; break;
    case 'flatten':  await flattenAll(state.broker); state.paused = true; break;
    case 'kill':     process.exit(1); // skip drain, immediate
    case 'snapshot': await writeStateSnapshot(state); break;
  }
}

LISTEN/NOTIFY is Postgres-native, sub-second latency, no extra infra. Supabase exposes the same channel via its Realtime websocket for the web client if needed later.
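
For the sending side, here is a minimal sketch of how the web app might wake the runner, assuming it calls Postgres's built-in pg_notify function with bound parameters. The helper names commandChannel and buildNotifyQuery are hypothetical; only the agent_commands_<deploymentId> channel name and the command-id payload come from the loop above.

```typescript
// Hypothetical helpers: names are illustrative, not taken from the codebase.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Channel name the control loop LISTENs on.
function commandChannel(deploymentId: string): string {
  return `agent_commands_${deploymentId}`;
}

// pg_notify(channel, payload) receives the channel as a text value, so a
// UUID containing hyphens needs no identifier quoting on the sending side.
function buildNotifyQuery(deploymentId: string, commandId: string): ParameterizedQuery {
  return {
    text: "SELECT pg_notify($1, $2)",
    values: [commandChannel(deploymentId), commandId],
  };
}
```

The resulting object can be handed to any client that accepts a text-plus-values query. Passing the channel as a parameter also keeps the deployment id out of the SQL string.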

2. Tick loop — the trading loop

async function tickLoop(deploymentId: string, state: SharedState) {
  const skill = await db.getDeploymentSkill(deploymentId);
  const broker = await makeBroker(skill, deploymentId);
  state.broker = broker;

  while (state.running) {
    if (state.paused) { await sleep(1000); continue; }

    await waitForNextTick(skill.schedule);   // cron-aligned sleep
    if (!state.running || state.paused) continue; // a stop or pause may have arrived while sleeping

    const tickAt = new Date();
    try {
      const ctx = makeLiveContext({ deploymentId, skill, broker, asOf: tickAt });
      const portfolioBefore = await broker.snapshot();
      const decision = await runSkill({ skill, ctx });
      const result = await engine.process({
        skill, deploymentId,
        proposedAction: decision.proposedAction,
        broker, portfolioSnapshot: portfolioBefore,
      });

      await db.insertDecisionSnapshot({ deploymentId, tickAt, decision, result, ctx });
      await db.upsertAgentState({ deploymentId, ...await deriveState(broker, decision, result) });
    } catch (err) {
      await db.insertAgentLog({ deploymentId, level: 'error', event: 'tick_failed', err });
      // do not crash the loop on a single bad tick
    }
  }

  await drain(broker);  // wait for in-flight orders to settle
}
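
waitForNextTick is not shown on this page. As a rough sketch of cron-aligned sleeping, assuming the schedule reduces to a fixed interval in milliseconds (a real cron expression would need a parser), the delay to the next boundary could be computed like this. msUntilNextTick is a hypothetical name.

```typescript
// Milliseconds until the next wall-clock boundary of `intervalMs`.
// Assumes a fixed-interval schedule aligned to the Unix epoch (e.g. every 5 min).
function msUntilNextTick(nowMs: number, intervalMs: number): number {
  if (intervalMs <= 0) throw new Error("intervalMs must be positive");
  const sinceBoundary = nowMs % intervalMs;
  // Exactly on a boundary: wait a full interval so one mark never fires twice.
  return intervalMs - sinceBoundary;
}

const FIVE_MIN_MS = 5 * 60 * 1000;
const exampleNowMs = Date.UTC(2025, 0, 1, 12, 3, 30);           // 12:03:30 UTC
const exampleWaitMs = msUntilNextTick(exampleNowMs, FIVE_MIN_MS); // 90000 ms, until 12:05:00
```

Computing the delay from the wall clock on every iteration, rather than sleeping a fixed interval, keeps ticks from drifting when a tick takes a variable amount of time.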

3. WS subscription (Hyperliquid mainnet broker)

Owned by the HyperliquidMainnetBroker itself, not the tick loop. The broker maintains four subscriptions on the master account, all multiplexed over a single WebSocketTransport from @nktkas/hyperliquid:

Channel	Purpose	Drives
`webData2(master)`	Full account-state push (positions, equity, margin, open orders, exchange-computed `liquidationPx`)	In-memory state mirror that `snapshot()` returns
`userFills(master)`	Per-fill notifications	Resolves pending `placeOrder` waiters; per-fill `agent_logs` row (`event=order_filled`)
`userEvents(master)`	Liquidation + funding events	Sets broker's `halted=true` on liquidation; logs `funding_paid` events
`allMids`	Per-symbol mark prices	Drives the per-symbol mark-staleness gate (N1)

Defaults (from Slice-B N1–N6):

Mark-staleness gate (N1): per-symbol; if the last allMids update for a symbol is older than 30 s, placeOrder rejects new opens for that symbol with a synthetic error. Close orders always allowed (a stale close is still safer than letting the position ride).
Market-order semantics (N2): Hyperliquid has no native market order. The broker sends a marketable limit at mark ± 1% with tif: "Ioc", generates a 16-byte cloid for idempotency, and waits up to 2 s for the matching userFills event before returning. Hard cap on the full call is 5 s; if no fill arrives, placeOrder returns { orderId, fill: null } and the next tick's snapshot reflects the eventual fill once WS delivers.
WS outage handling (N3): the broker logs warnings and reconnects (transport handles backoff); it does NOT auto-halt the deployment. The mark-staleness gate is the counterweight; the deployer can pause if they want.
Per-fill audit (N4): every order_placed, order_filled, order_cancelled, funding_paid, liquidation is written to agent_logs via a logger callback the runner injects into the broker.
Liquidation handling (N5): on userEvents.liquidation, broker flips halted=true. The tick loop reads the flag through state.halted (set by the broker factory wiring) and refuses new opens. Manual clear_halt required to resume.
Periodic REST drift guard (N6): every 5 min (configurable), the broker re-pulls clearinghouseState, openOrders, allMids and overwrites the WS-derived state. Hyperliquid is the source of truth; if REST and WS disagree, REST wins and a state_drift warning is logged.
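
The N1 and N2 defaults can be sketched as two small pure functions. This is an illustration under stated assumptions, not the broker's code: the function names are hypothetical, a symbol that has never received an allMids update is treated as stale, and exchange price-precision rounding is left out.

```typescript
const MARK_STALE_MS = 30000; // N1: 30 s staleness window
const SLIPPAGE = 0.01;       // N2: mark ± 1%

type Side = "buy" | "sell";

// N1: new opens are rejected when the symbol's last allMids update is older
// than the window. Closes are always allowed. Treating "never seen" as stale
// is an assumption of this sketch.
function isOrderAllowed(
  lastMidAtMs: number | undefined,
  nowMs: number,
  isClose: boolean,
): boolean {
  if (isClose) return true;
  if (lastMidAtMs === undefined) return false;
  return nowMs - lastMidAtMs <= MARK_STALE_MS;
}

// N2: a "market" order is sent as an IOC limit priced through the mark,
// above it for buys and below it for sells.
function marketableLimitPrice(mark: number, side: Side): number {
  return side === "buy" ? mark * (1 + SLIPPAGE) : mark * (1 - SLIPPAGE);
}
```

Keeping the gate per symbol means a stalled feed for one market does not block opens on another.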

State writes

Two tables receive updates from the runner:

agent_state — one row per deployment, upserted after every tick:

agent_state (
  deployment_id   uuid pk,
  status          text,            -- running | paused | halted | stopping
  equity_usd      numeric,
  positions       jsonb,           -- array of { symbol, size, entry, unrealizedPnl }
  open_orders     jsonb,
  last_tick_at    timestamptz,
  last_action     jsonb null,
  last_reasoning  text null,
  day_start_equity numeric,        -- for daily loss halt
  updated_at      timestamptz
)

agent_logs — append-only event log:

agent_logs (
  id             bigserial pk,
  deployment_id  uuid,
  ts             timestamptz,
  level          text,             -- info | warn | error
  event          text,             -- 'tick_started' | 'tick_failed' | 'order_filled' | ...
  data_json      jsonb,
  message        text null
)

The deployment detail page in the web app reads these on demand. No streaming required for MVP.

Deployment lifecycle

deployments.status flow:

  provisioning  ──►  running  ◄──►  paused
       │              │ ▲
       ▼              ▼ │
     error          halted ──► (deployer must clear)
                      │
                      ▼
                   stopping ──► stopped (machine destroyed)

Status	Machine state	Tick loop
`provisioning`	being created	not started yet
`running`	running	active
`paused`	running	sleeping
`halted`	running	sleeping (engine-imposed)
`stopping`	running	draining
`stopped`	destroyed	n/a
`error`	running or destroyed	crashed

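Once a machine is up, the runner owns status transitions and the web app only reads them. The exceptions are the writes described elsewhere on this page: the web app inserts the initial provisioning row and marks error if machine creation fails, a redeploy flips the predecessor to stopping, and the heartbeat sweeper flips stale rows to error.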
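
One way to make the status flow checkable is a transition table. The sketch below is a reading of the diagram, not code from the runner: ALLOWED_TRANSITIONS and canTransition are hypothetical names, the running/paused to stopping edges are inferred from the stopping flow, and the running/paused to error edges are inferred from the heartbeat sweeper.

```typescript
type DeploymentStatus =
  | "provisioning" | "running" | "paused" | "halted"
  | "stopping" | "stopped" | "error";

// Edges as read from the status-flow diagram. Edges marked "inferred" are
// assumptions drawn from other sections of this page, not from the diagram.
const ALLOWED_TRANSITIONS: Record<DeploymentStatus, DeploymentStatus[]> = {
  provisioning: ["running", "error"],
  running: ["paused", "halted", "stopping", "error"], // stopping, error: inferred
  paused: ["running", "stopping", "error"],            // stopping, error: inferred
  halted: ["running", "stopping"],                     // running only after the deployer clears the halt
  stopping: ["stopped"],
  stopped: [],
  error: [],
};

function canTransition(from: DeploymentStatus, to: DeploymentStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```

A guard like this could sit in front of every status write so that an out-of-order update is rejected instead of silently applied.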

Wallet model (mainnet only)

Mainnet deployments need a signing identity. Per ADR-0016:

The user pairs their master wallet at /wallet by personal-signing a server-issued challenge. We store the address only; we never see the master's private key.
At deploy time the platform generates a fresh secp256k1 agent wallet, persists the private key in Supabase Vault, and asks the master to sign Hyperliquid's approveAgent EIP-712 payload. We submit the signed action to https://api.hyperliquid.xyz/exchange; on status: ok, the agent is authorized to trade on the master's account.
The agent address + its Vault key id are recorded in hyperliquid_agents. The matching deployments.hyperliquid_agent_id FK is enforced by a DB check constraint: paper deployments must have null, mainnet deployments must have non-null.
Approval is unbounded; revocation happens on-chain via removeAgent from the master. The agent row's revoked_at is set when we observe revocation; once set, the runner refuses to start.

The agent's private key is decrypted at runner boot via the get_hyperliquid_agent_secret(agent_id) SECURITY DEFINER RPC and held in process memory for the lifetime of the machine. Never written back, never logged.

Provisioning flow

User clicks Deploy on the skill detail page → /skills/[id]/deploy.
User picks paper or hyperliquid-mainnet (testnet removed — ADR-0015).
For mainnet only:
- Web checks DEPLOYMENTS_DISABLED and HYPERLIQUID_LIVE_ENABLED env flags; refuses if either gate is closed.
- Pairing precheck: user must have at least one paired master wallet.
- createAgentApprovalChallenge generates the agent keypair, stores the privkey in Vault, returns the EIP-712 typed data.
- Wagmi prompts the master wallet to sign; signature is submitted via submitAgentApproval, which POSTs to Hyperliquid /exchange and on success inserts hyperliquid_agents.
createDeployment inserts the deployments row with status='provisioning'.

Web calls Fly Machines API:

POST /v1/apps/agentic-live-runner/machines
{
  "name": "dep-<first 8 chars of deployment id>",
  "region": "nrt",                       // FLY_LIVE_RUNNER_REGION
  "config": {
    "image": "<FLY_LIVE_RUNNER_IMAGE>",  // SHA-pinned per release
    "env": {
      "DEPLOYMENT_ID": "<uuid>",
      "HYPERLIQUID_LIVE_ENABLED": "true" // only for mainnet machines
    },
    "guest": { "cpu_kind": "shared", "cpus": 1, "memory_mb": 512 },
    "restart": { "policy": "on-failure" },
    "auto_destroy": false
  }
}

App-level secrets (SUPABASE_*, OPENROUTER_API_KEY, etc.) live on the Fly app via fly secrets set and are injected automatically.
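
A sketch of how the web app might assemble that request body. buildMachineRequest and its option names are hypothetical; the field values mirror the example above, with the region and image passed in from FLY_LIVE_RUNNER_REGION and FLY_LIVE_RUNNER_IMAGE.

```typescript
type BrokerKind = "paper" | "hyperliquid-mainnet";

interface MachineCreateRequest {
  name: string;
  region: string;
  config: {
    image: string;
    env: Record<string, string>;
    guest: { cpu_kind: string; cpus: number; memory_mb: number };
    restart: { policy: string };
    auto_destroy: boolean;
  };
}

// Hypothetical builder: option names are illustrative.
function buildMachineRequest(opts: {
  deploymentId: string;
  brokerKind: BrokerKind;
  region: string;
  image: string;
}): MachineCreateRequest {
  const env: Record<string, string> = { DEPLOYMENT_ID: opts.deploymentId };
  // The live flag is set only on mainnet machines.
  if (opts.brokerKind === "hyperliquid-mainnet") {
    env.HYPERLIQUID_LIVE_ENABLED = "true";
  }
  return {
    name: `dep-${opts.deploymentId.slice(0, 8)}`,
    region: opts.region,
    config: {
      image: opts.image,
      env,
      guest: { cpu_kind: "shared", cpus: 1, memory_mb: 512 },
      restart: { policy: "on-failure" },
      auto_destroy: false,
    },
  };
}
```

Building the body in one pure function makes it straightforward to unit-test that paper machines never receive the live flag.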

Web updates the deployment row with fly_machine_id, fly_region, fly_image, started_at. The image SHA is captured at provisioning time so we can later detect when a deployment is running an older platform release — see "Image pinning & per-deployment upgrades" below.
Machine boots, runner reads DEPLOYMENT_ID, loads deployment + skill, instantiates the broker via makeBroker(deployment):
- Paper → in-memory PaperBroker.
- Mainnet → HyperliquidMainnetBroker(masterAddress, agentAddress, decryptedPrivKey). Refuses to construct if HYPERLIQUID_LIVE_ENABLED !== 'true'.
Runner marks deployments.status='running' and enters the tick loop.

Failure cases: Fly create throws → row is marked status='error' with error_text and the user sees it on the deployment detail page. Broker construction throws (mainnet kill switch flipped) → runner exits, machine restarts; if the flag stays off the deployment will sit at provisioning/error until cleared.

Which version deploys

Any saved version is deployable, not just the latest. createDeployment takes an optional version, validated against skill_versions; when omitted, it falls back to skill.latest_version. The skill detail page exposes a per-version Deploy button in the version history. The chat deploy card passes ?version= and ?broker= to the deploy page, which consumes both: the target version drives the backtest verdict, and the broker prefills the picker. The (skill_id, skill_version) FK guarantees the version exists.
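
The fallback rule is small enough to sketch directly. resolveDeployVersion is a hypothetical name, and version numbers are assumed to be integers since this page does not say how versions are typed.

```typescript
// Returns the version to deploy: the requested one if it is a saved version,
// otherwise the latest when none was requested. Integer versions are assumed.
function resolveDeployVersion(
  requested: number | undefined,
  savedVersions: number[],
  latestVersion: number,
): number {
  if (requested === undefined) return latestVersion;
  if (savedVersions.indexOf(requested) === -1) {
    throw new Error(`version ${requested} is not a saved version of this skill`);
  }
  return requested;
}
```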

Deployment uniqueness rules

Three rules, enforced in createDeployment and mirrored in the UI. "Active" = status in provisioning | running | paused | halted | stopping (ACTIVE_DEPLOYMENT_STATUSES in lib/deployments-query.ts).

One active deployment per (skill, broker). A redeploy replaces its predecessor — stopCollidingDeployments queues stop on the same (skill_id, broker_kind) before provisioning. So you can't run two paper deployments of one skill, nor two mainnet, but 1 paper + 1 mainnet of the same skill can run side by side (the collision is scoped to the broker).
At most ONE active mainnet deployment across all skills (MVP). When deploying mainnet, createDeployment refuses if a different skill already has a live mainnet (getActiveMainnetDeployment, compared by skill_id). Redeploying the same skill's mainnet is allowed — rule 1 replaces it. The deploy form locks the mainnet path (paper stays available) and shows "you can deploy only one mainnet deployment."
Paper has no global cap. Any number of paper deployments across different skills can run at once (subject to rule 1 per skill).

To change a live mainnet deployment, stop it first, then deploy. Keeping a live deployment on the current platform release is an in-place image upgrade (upgradeDeploymentImage), not a new deployment, so it's unaffected by these rules.
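
The three rules can be expressed as pure predicates over a user's existing deployments. The sketch below reuses the names mainnetCapBlocks and sameSkillBrokerCollisions that this page attributes to lib/deploy-rules.ts, but the signatures, the row shape, and the bodies are guesses for illustration only.

```typescript
// Sketch only: signatures, row shape, and bodies are assumptions.
type BrokerKind = "paper" | "hyperliquid-mainnet";

interface DeploymentRow {
  id: string;
  skillId: string;
  brokerKind: BrokerKind;
  status: string;
}

const ACTIVE_STATUSES = ["provisioning", "running", "paused", "halted", "stopping"];

function isActive(d: DeploymentRow): boolean {
  return ACTIVE_STATUSES.indexOf(d.status) !== -1;
}

// Rule 1: active deployments of the same (skill, broker) that a redeploy replaces.
function sameSkillBrokerCollisions(
  userDeployments: DeploymentRow[],
  skillId: string,
  brokerKind: BrokerKind,
): DeploymentRow[] {
  return userDeployments.filter(
    (d) => isActive(d) && d.skillId === skillId && d.brokerKind === brokerKind,
  );
}

// Rule 2: a mainnet deploy is blocked when a different skill holds the mainnet slot.
// Rule 3 needs no predicate: paper deploys are never blocked by the cap.
function mainnetCapBlocks(
  userDeployments: DeploymentRow[],
  skillId: string,
  brokerKind: BrokerKind,
): boolean {
  if (brokerKind !== "hyperliquid-mainnet") return false;
  return userDeployments.some(
    (d) => isActive(d) && d.brokerKind === "hyperliquid-mainnet" && d.skillId !== skillId,
  );
}
```

Keeping the predicates free of database access is what lets the server action and the deploy page share them.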

Atomic enforcement. The pure predicates live in lib/deploy-rules.ts (mainnetCapBlocks, sameSkillBrokerCollisions, unit-tested) and are shared by createDeployment and the deploy page. Because the server action is check-then-insert, two partial unique indexes (migration 20260611120000) are the atomic backstop so a race can't break the rules:

deployments_one_active_mainnet_per_user — unique (user_id) where broker_kind='hyperliquid-mainnet' and status in (occupying).
deployments_one_active_per_skill_broker — unique (user_id, skill_id, broker_kind) where status in (occupying).

"Occupying" excludes stopping: a redeploy flips its predecessor to stopping before inserting the replacement (stopCollidingDeployments), so the slot reads free and the index doesn't reject a legitimate redeploy. A 23505 from these indexes is translated back into the same rule message.

The single-mainnet cap is a deliberate MVP guardrail. Expanding to multiple concurrent mainnet deployments (per-skill margin isolation, aggregate exposure accounting across the shared Hyperliquid cross-margin pool, rate-limit budgeting) is future work — revisit before lifting it.
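
The 23505 translation mentioned under atomic enforcement might look like the following. This is a sketch under assumptions: the error shape (SQLSTATE in code, index name in constraint) depends on the database client, translateUniqueViolation is a hypothetical name, and the per-skill message text is a placeholder because this page only quotes the mainnet message.

```typescript
// Assumed error shape: SQLSTATE in `code`, violated index name in `constraint`.
interface DbErrorLike {
  code?: string;
  constraint?: string;
}

const UNIQUE_VIOLATION = "23505";

// Index names come from the migration described above. The second message is
// a placeholder, not the product's actual copy.
const RULE_MESSAGES: Record<string, string> = {
  deployments_one_active_mainnet_per_user: "you can deploy only one mainnet deployment.",
  deployments_one_active_per_skill_broker: "this skill already has an active deployment on this broker.",
};

// Returns the user-facing rule message, or null when the error is something else.
function translateUniqueViolation(err: DbErrorLike): string | null {
  if (err.code !== UNIQUE_VIOLATION || err.constraint === undefined) return null;
  if (!Object.prototype.hasOwnProperty.call(RULE_MESSAGES, err.constraint)) return null;
  return RULE_MESSAGES[err.constraint];
}
```

Returning null for anything unrecognized lets the caller rethrow unrelated database errors unchanged.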

Image pinning & per-deployment upgrades

Each Fly machine is pinned to a specific image SHA at creation time. Pushing a new platform release does not roll existing machines — by design, because rolling a runner mid-tick interrupts the trading loop and can leave in-flight Hyperliquid orders in an unknown state.

This means every running deployment can be on a different release. We track the SHA per-deployment so the UI can surface a divergence to the user.

What ships on a release

The .github/workflows/live-runner-deploy.yml workflow runs on every push to main that touches apps/live-runner/** or any package the runner depends on:

Build + push the live-runner image to Fly's registry via flyctl deploy --build-only --push. Capture the SHA-tagged image (e.g. registry.fly.io/agentic-live-runner:deployment-XXXX).
PATCH FLY_LIVE_RUNNER_IMAGE on the Vercel web project via the Vercel REST API. New deployments created from the dashboard after this point provision onto the latest image.
Existing machines are deliberately left alone.

The web app's currentLiveRunnerImage() helper (apps/web/lib/fly.ts) reads the same env var; the runner doesn't need it (the runner image already is the pinned image).

Surfacing "New agent version available"

On the deployment detail page (/deployments/[id]), server-side rendering compares deployments.fly_image to currentLiveRunnerImage(). If they differ AND the deployment is in an upgradable status (running / paused / halted / error), the page renders a VersionBanner with an "Upgrade now" button. The deployments list (/deployments) shows a small update pill on the same rows.

Rows with fly_image IS NULL (created before migration 0009) are treated as "unknown" and never badged — avoids false positives on legacy deployments.
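
The banner condition reduces to one predicate. shouldShowUpgradeBanner and the row shape are hypothetical; the status list and the NULL handling follow the two paragraphs above.

```typescript
const UPGRADABLE_STATUSES = ["running", "paused", "halted", "error"];

interface DeploymentImageInfo {
  status: string;
  flyImage: string | null; // null on legacy rows with no recorded image
}

function shouldShowUpgradeBanner(dep: DeploymentImageInfo, currentImage: string): boolean {
  if (dep.flyImage === null) return false; // unknown image: never badge
  if (UPGRADABLE_STATUSES.indexOf(dep.status) === -1) return false;
  return dep.flyImage !== currentImage;
}
```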

The upgrade action

upgradeDeploymentImage({ deploymentId }) server action does an in-place machine update:

Verify the deployment is owned by the caller and is in an upgradable status.
POST /v1/apps/<app>/machines/<id> with { config: { image: <new SHA> } }. Fly preserves env, guest, restart policy, and the machine id.
POST /v1/apps/<app>/machines/<id>/restart so the new image actually runs.
Persist the new SHA to deployments.fly_image so the banner clears immediately.

Preserved across the upgrade:

Fly machine id → all the per-deployment infrastructure (Postgres LISTEN channel name, agent_state row, decision_snapshots, agent_logs).
The Hyperliquid agent's Vault key id → no re-approval flow needed.
Open Hyperliquid positions and orders — they live on the exchange, not in our machine.

What the user sees: the tick loop pauses for ~10–30 s during the restart, the broker reconciles from the exchange on boot (open orders, positions, equity), and ticking resumes. No removeAgent/re-approve flow.

Why not auto-roll?

Two reasons:

Mid-trade interruption. A Hyperliquid order that lands mid-restart can fill on either side of the restart and the reconcile path has to be the one that catches up — fine when the user opts in at a quiet moment, risky as a platform-wide event.
Behavioral drift surprises users. A skill that's been profitably trading on release N suddenly behaving differently on N+1 is exactly the bug report we don't want. Upgrade is a deliberate act, not a Tuesday morning surprise.

The cost is that platform releases don't propagate until each user clicks Upgrade. That's the right trade for a trading product.

Stopping flow

User clicks Stop on the deployment detail page
Web server inserts agent_commands row with kind='stop', sends NOTIFY
Runner control loop receives notify, sets state.running = false
Tick loop exits after current tick, drains broker (waits for in-flight orders)
Runner marks deployments.status='stopped', exits process
Fly machine stops; a background sweep destroys stopped machines after 5 min idle

Crash and recovery

If the runner crashes mid-tick:

The crashed process exits; Fly restarts the machine
A new runner reads the same DEPLOYMENT_ID, reconnects WS, resumes ticking
The in-flight tick is not retried (decisions should not be replayed against stale context)
The next scheduled tick proceeds normally
The agent_logs table will show the crash + restart

If the runner is killed mid-order-submission:

The Hyperliquid order may or may not have landed (the client order ID, cloid, provides idempotency)
On restart, the runner reconciles: it fetches open orders from the exchange, compares them to the in-flight state recorded in the DB, and resolves any differences
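
The comparison step could be sketched as a three-way split keyed by cloid. Everything here is an assumption for illustration: the function name, the row shapes, and the bucket names are hypothetical, and the sketch only classifies orders without deciding what to do with each bucket.

```typescript
interface InFlightOrder { cloid: string; }
interface ExchangeOpenOrder { cloid: string; }

interface ReconcileResult {
  stillOpen: string[]; // recorded in the DB and still resting on the exchange
  resolved: string[];  // recorded in the DB but no longer open: filled, cancelled, or never landed
  untracked: string[]; // open on the exchange with no matching in-flight row
}

function reconcileByCloid(
  inFlight: InFlightOrder[],
  openOnExchange: ExchangeOpenOrder[],
): ReconcileResult {
  const openIds = openOnExchange.map((o) => o.cloid);
  const knownIds = inFlight.map((o) => o.cloid);
  return {
    stillOpen: knownIds.filter((id) => openIds.indexOf(id) !== -1),
    resolved: knownIds.filter((id) => openIds.indexOf(id) === -1),
    untracked: openIds.filter((id) => knownIds.indexOf(id) === -1),
  };
}
```

Orders in the resolved bucket would still need a fill lookup to tell a completed order from one that never reached the exchange.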

Resource sizing

Profile	CPUs	Memory	Use case
`shared-1x-512`	1	512MB	Default — most Skills
`shared-1x-1g`	1	1GB	Skills with large context (heavy news lookback)
`dedicated-1x-2g`	1	2GB	Phase 4: Skills with local model inference

Memory is the limit, not CPU. Most of the runner's time is waiting on I/O (model calls, WS, DB).

Cost

Fly Machines bill per-second when running. For a Skill ticking every 5 min:

~$0.01/hour for shared-1x-512 → ~$7/mo continuous
AI cost dominates: 12 ticks/hour × $0.01/tick (Sonnet, modest context) = ~$90/mo

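Authors can pause expensive Skills to cut model spend. A paused deployment keeps its machine running, so the machine cost continues until the deployment is stopped.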
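
The two figures above follow from simple arithmetic. The sketch below reproduces them, assuming an average month of 730 hours; estimateMonthlyCost is a hypothetical helper, and the rates are the ones quoted above, not live pricing.

```typescript
const HOURS_PER_MONTH = 730; // assumption: average month (8760 h / 12)

interface MonthlyCost {
  machineUsd: number;
  modelUsd: number;
  totalUsd: number;
}

function estimateMonthlyCost(
  machineUsdPerHour: number,
  ticksPerHour: number,
  modelUsdPerTick: number,
): MonthlyCost {
  const machineUsd = machineUsdPerHour * HOURS_PER_MONTH;
  const modelUsd = ticksPerHour * modelUsdPerTick * HOURS_PER_MONTH;
  return { machineUsd, modelUsd, totalUsd: machineUsd + modelUsd };
}

// 5-minute schedule: 12 ticks/hour at the rates quoted above.
// machineUsd comes out near 7.3 and modelUsd near 87.6.
const fiveMinEstimate = estimateMonthlyCost(0.01, 12, 0.01);
```

At these rates the model spend is roughly twelve times the machine cost, which is why the tick frequency and context size matter more than the VM profile.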

Observability

Per deployment, the web UI at /deployments/[id] shows:

Status pill + halt / error banners (from deployments)
Current portfolio: equity, day PnL, positions, open orders (from agent_state)
Actions panel: pause / resume / clear_halt / flatten / stop / kill / snapshot. Each button is gated on the current status and routed through sendDeploymentCommand, which inserts an agent_commands row and lets the runner's LISTEN/NOTIFY loop do the work. kill is the exception — it also destroys the Fly machine via the Machines API immediately, because by definition the runner isn't responding.
For mainnet: the agent address with a "Revoke agent (platform-side)" button. Platform-side revocation queues a stop and sets hyperliquid_agents.revoked_at; full on-chain revocation requires the user to remove the agent from Hyperliquid's UI (the SDK doesn't expose removeAgent). The UI is explicit about the two-step model — see ADR-0016.
Recent decisions (50, from decision_snapshots) — verdict badge + click-to-expand for final_text and engine_result
Recent logs (50, from agent_logs) — filterable by level
Recent commands + ack status (20, from agent_commands)

Updates are not streamed in MVP — the page revalidates after each command via router.refresh(). Supabase Realtime streaming is Slice D scope.

Phase 3 adds Sentry for unexpected errors and a cost dashboard.

Ops-hardening (Slice D)

Four pieces of post-deployment safety live outside the per-deployment runner:

Heartbeat sweeper

Vercel cron at /api/jobs/heartbeat-sweep runs every 5 minutes. For each deployment with status IN ('running', 'paused'), joins to agent_state.updated_at and checks against a per-deployment threshold of max(3 × tick_interval, 15 min). Over-threshold rows are flipped to status='error' with error_text='heartbeat lost: …' and an agent_logs row is appended.

We deliberately do NOT auto-destroy the Fly machine — too easy to misfire on a brief lag and orphan an in-flight order. The user kills the machine manually via the deployment detail page.
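
The threshold rule is max(3 × tick_interval, 15 min). A minimal sketch of the check, with hypothetical function names and all times in milliseconds:

```typescript
const HEARTBEAT_FLOOR_MS = 15 * 60 * 1000; // 15 min floor

// Per-deployment threshold: three missed ticks, but never less than the floor.
function heartbeatThresholdMs(tickIntervalMs: number): number {
  return Math.max(3 * tickIntervalMs, HEARTBEAT_FLOOR_MS);
}

// True when agent_state.updated_at is older than the deployment's threshold.
function isHeartbeatLost(updatedAtMs: number, nowMs: number, tickIntervalMs: number): boolean {
  return nowMs - updatedAtMs > heartbeatThresholdMs(tickIntervalMs);
}
```

For a 5-minute Skill the two terms coincide at 15 minutes; for an hourly Skill the threshold stretches to 3 hours, so slow schedules are not flagged between ticks.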

Per-deployment cost guard

After every tick, the runner computes cost_usd from token counts × the rate table in @repo/shared's MODEL_RATES, writes it onto the decision_snapshots row, and sums month-to-date cost_usd for the deployment. If the sum exceeds skill.risk.maxMonthlyCostUsd (default $200), the runner queues a pause command via agent_commands with a cost_guard_tripped reason. The UTC month boundary is the natural reset point — no per-month counter table.

The user can resume after raising the cap on a new Skill version (Skill versions are immutable, so the resume implicitly requires a new version with the higher cap).
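
The month-to-date check can be sketched without a counter table by filtering snapshot rows on the current UTC month. The row shape and function names are hypothetical; the $200 default and the "exceeds" comparison follow the description above.

```typescript
interface SnapshotCost {
  tickAt: Date;
  costUsd: number;
}

const DEFAULT_MAX_MONTHLY_COST_USD = 200;

// Sums cost_usd for rows in the same UTC calendar month as `now`.
function monthToDateCostUsd(rows: SnapshotCost[], now: Date): number {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  return rows
    .filter((r) => r.tickAt.getUTCFullYear() === year && r.tickAt.getUTCMonth() === month)
    .reduce((sum, r) => sum + r.costUsd, 0);
}

// The guard trips only when the sum exceeds the cap, not when it equals it.
function costGuardTripped(
  rows: SnapshotCost[],
  now: Date,
  maxMonthlyCostUsd: number = DEFAULT_MAX_MONTHLY_COST_USD,
): boolean {
  return monthToDateCostUsd(rows, now) > maxMonthlyCostUsd;
}
```

In production the sum would more likely be a SQL aggregate over decision_snapshots; the in-memory version here only illustrates the month-boundary logic.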

`mainnet_orders` audit table

For mainnet deployments, every order's lifecycle is recorded in mainnet_orders keyed by cloid:

placed → filled | partially_filled | cancelled | rejected

Wired via the broker's logger callback (Slice B). The broker emits order_placed / order_filled / order_cancelled / liquidation_fill events; the runner's routeOrderEvent in apps/live-runner/src/broker.ts upserts a row per event. Partial fills accumulate filled_size_base + recompute weighted avg_fill_price; full coverage flips status='filled'. Paper deployments don't touch this table.

This is the structured audit story when the user (or compliance) asks "where did this $X go on Hyperliquid?".
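
The partial-fill accumulation described above amounts to a size-weighted running average. A sketch, assuming a requested-size column (called sizeBase here, which is a guess) alongside the filled_size_base and avg_fill_price columns named above; applyFill is a hypothetical helper, not routeOrderEvent itself.

```typescript
type OrderStatus = "placed" | "partially_filled" | "filled" | "cancelled" | "rejected";

interface MainnetOrderRow {
  cloid: string;
  sizeBase: number;            // assumed column: requested size in base units
  filledSizeBase: number;      // filled_size_base
  avgFillPrice: number | null; // avg_fill_price, null until the first fill
  status: OrderStatus;
}

// Folds one fill into the row: accumulates size, recomputes the weighted
// average price, and flips to "filled" once the requested size is covered.
function applyFill(row: MainnetOrderRow, fillSize: number, fillPrice: number): MainnetOrderRow {
  if (fillSize <= 0) throw new Error("fillSize must be positive");
  const prevNotional = row.filledSizeBase * (row.avgFillPrice === null ? 0 : row.avgFillPrice);
  const filledSizeBase = row.filledSizeBase + fillSize;
  const avgFillPrice = (prevNotional + fillSize * fillPrice) / filledSizeBase;
  const EPS = 1e-9; // tolerance for floating-point accumulation
  const status: OrderStatus =
    filledSizeBase >= row.sizeBase - EPS ? "filled" : "partially_filled";
  return { cloid: row.cloid, sizeBase: row.sizeBase, filledSizeBase, avgFillPrice, status };
}
```

For example, an order for 2 units filled as 1 @ 100 and then 1 @ 110 ends with status filled and an average price of 105.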

Pending-agent staging GC

pending_hyperliquid_agents table stages the (vault_key_id, master_wallet_id, agent_address, agent_name, nonce_ms) tuple between createAgentApprovalChallenge and submitAgentApproval. Replaces the Slice-A in-memory Map so the wallet-pairing flow survives across Vercel Fluid instances.

A daily cron at /api/jobs/pending-agents-gc deletes rows older than 10 minutes. The corresponding Vault secret stays around as an orphan — accepted leak for MVP; future Vault GC sweep cleans those.

What lives outside this service

Web UI → apps/web
Chat agent → Vercel Function in apps/web/app/api/.../chat/route.ts
Simulation → workers (not Fly per-skill; on-demand)
Provisioning machines → control API in apps/web, not the runner itself
