App: apps/live-runner
Hosted on: Fly.io Machines, one per active Deployment
Depends on: packages/agent-runtime, packages/execution-engine, packages/brokers/hyperliquid-*, packages/db
Persistent websocket connections to Hyperliquid (market data, fills)
State that survives between ticks (open positions, in-flight orders)
Schedule adherence (a 5m-cron Skill must tick within seconds of every 5min mark)
Two-way communication with the deployer (commands in, state out)
None of these fit Vercel Functions cleanly. Fly Machines do: one tiny VM per Skill, fast cold start (~1s), $0/mo when stopped, scales per-skill independently.
For each deployments row with status IN ('running', 'paused', 'halted') there is exactly one Fly Machine. When the deployment stops, the machine is destroyed. When it's created, a machine is provisioned.
This isolation gives us:
A crash in one Skill cannot affect another
Per-Skill memory limits + CPU sizing
Per-Skill log streams (Fly Logs scoped by machine)
Trivial horizontal scale — provision N machines for N Skills
```typescript
async function controlLoop(deploymentId: string, state: SharedState) {
  const client = await db.dedicatedConnection();
  // The channel name embeds a UUID (hyphens), so it must be a quoted identifier.
  await client.query(`LISTEN "agent_commands_${deploymentId}"`);
  client.on('notification', async ({ payload }) => {
    const cmd = await db.getCommand(payload); // payload = command id
    await handleCommand(cmd, state);
    await db.markCommandAcked(cmd.id);
  });
}

async function handleCommand(cmd, state) {
  switch (cmd.kind) {
    case 'stop': state.running = false; break; // tick loop exits, process drains
    case 'pause': state.paused = true; break;
    case 'resume': state.paused = false; break;
    case 'flatten': await flattenAll(state.broker); state.paused = true; break;
    case 'kill': process.exit(1); // skip drain, immediate; never returns
    case 'snapshot': await writeStateSnapshot(state); break;
  }
}
```
LISTEN/NOTIFY is Postgres-native, sub-second latency, no extra infra. Supabase exposes the same channel via its Realtime websocket for the web client if needed later.
Owned by the HyperliquidMainnetBroker itself, not the tick loop. The broker maintains four subscriptions on the master account, all multiplexed over a single WebSocketTransport from @nktkas/hyperliquid:
| Channel | Purpose | Drives |
| --- | --- | --- |
| webData2(master) | Full account-state push (positions, equity, margin, open orders, exchange-computed liquidationPx) | Sets broker's halted=true on liquidation; logs funding_paid events |
| allMids | Per-symbol mark prices | Drives the per-symbol mark-staleness gate (N1) |
Defaults (from Slice-B N1–N6):
Mark-staleness gate (N1): per-symbol; if the last allMids update for a symbol is older than 30 s, placeOrder rejects new opens for that symbol with a synthetic error. Close orders always allowed (a stale close is still safer than letting the position ride).
Market-order semantics (N2): Hyperliquid has no native market order. The broker sends a marketable limit at mark ± 1% with tif: "Ioc", generates a 16-byte cloid for idempotency, and waits up to 2 s for the matching userFills event before returning. Hard cap on the full call is 5 s; if no fill arrives, placeOrder returns { orderId, fill: null } and the next tick's snapshot reflects the eventual fill once WS delivers.
WS outage handling (N3): the broker logs warnings and reconnects (transport handles backoff); it does NOT auto-halt the deployment. The mark-staleness gate is the counterweight; the deployer can pause if they want.
Per-fill audit (N4): every order_placed, order_filled, order_cancelled, funding_paid, liquidation is written to agent_logs via a logger callback the runner injects into the broker.
Liquidation handling (N5): on userEvents.liquidation, broker flips halted=true. The tick loop reads the flag through state.halted (set by the broker factory wiring) and refuses new opens. Manual clear_halt required to resume.
Periodic REST drift guard (N6): every 5 min (configurable), the broker re-pulls clearinghouseState, openOrders, allMids and overwrites the WS-derived state. Hyperliquid is the source of truth; if REST and WS disagree, REST wins and a state_drift warning is logged.
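Two of the defaults above (N1 and N2) reduce to small pure functions. The following is a minimal sketch, not the broker's actual code: the helper names, the `OrderIntent` shape, the use of `reduceOnly` to mark a close, and the choice to treat a never-seen symbol as stale are all assumptions made for illustration. It also ignores Hyperliquid's price rounding rules.

```typescript
// Hypothetical helpers; names and shapes are illustrative, not the broker's real API.
const MARK_STALE_MS = 30_000; // N1 default: 30 s
const MARKET_SLIPPAGE = 0.01; // N2 default: mark ± 1%

type Side = "buy" | "sell";

interface OrderIntent {
  symbol: string;
  side: Side;
  reduceOnly: boolean; // assumption: true models a close order
}

/** N1: block new opens when the symbol's last allMids update is older than 30 s. */
function isBlockedByStaleMark(
  intent: OrderIntent,
  lastMarkAtMs: Map<string, number>,
  nowMs: number,
): boolean {
  if (intent.reduceOnly) return false; // closes are always allowed
  const last = lastMarkAtMs.get(intent.symbol);
  if (last === undefined) return true; // assumption: no mark yet counts as stale
  return nowMs - last > MARK_STALE_MS;
}

/** N2: limit price for a marketable IOC order, 1% through the mark. */
function marketableLimitPrice(side: Side, mark: number): number {
  // A buy must be priced above the mark and a sell below it to cross the book.
  return side === "buy"
    ? mark * (1 + MARKET_SLIPPAGE)
    : mark * (1 - MARKET_SLIPPAGE);
}
```

Keeping these as pure functions of (intent, last-update map, clock) makes the 30 s boundary easy to unit-test without a live websocket.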
Mainnet deployments need a signing identity. Per ADR-0016:
The user pairs their master wallet at /wallet by personal-signing a server-issued challenge. We store the address only; we never see the master's private key.
At deploy time the platform generates a fresh secp256k1 agent wallet, persists the private key in Supabase Vault, and asks the master to sign Hyperliquid's approveAgent EIP-712 payload. We submit the signed action to https://api.hyperliquid.xyz/exchange; on status: ok, the agent is authorized to trade on the master's account.
The agent address + its Vault key id are recorded in hyperliquid_agents. The matching deployments.hyperliquid_agent_id FK is enforced by a DB check constraint: paper deployments must have null, mainnet deployments must have non-null.
Approval is unbounded; revocation happens on-chain via removeAgent from the master. The agent row's revoked_at is set when we observe revocation; once set, the runner refuses to start.
The agent's private key is decrypted at runner boot via the get_hyperliquid_agent_secret(agent_id) SECURITY DEFINER RPC and held in process memory for the lifetime of the machine. Never written back, never logged.
User clicks Deploy on the skill detail page → /skills/[id]/deploy.
User picks paper or hyperliquid-mainnet (testnet removed — ADR-0015).
For mainnet only:
Web checks the DEPLOYMENTS_DISABLED and HYPERLIQUID_LIVE_ENABLED env flags and refuses if either gate is closed.
Pairing precheck: user must have at least one paired master wallet.
createAgentApprovalChallenge generates the agent keypair, stores the privkey in Vault, returns the EIP-712 typed data.
Wagmi prompts the master wallet to sign; signature is submitted via submitAgentApproval, which POSTs to Hyperliquid /exchange and on success inserts hyperliquid_agents.
createDeployment inserts the deployments row with status='provisioning'.
Web calls Fly Machines API:
```
POST /v1/apps/agentic-live-runner/machines
{
  "name": "dep-<first 8 chars of deployment id>",
  "region": "nrt",                        // FLY_LIVE_RUNNER_REGION
  "config": {
    "image": "<FLY_LIVE_RUNNER_IMAGE>",   // SHA-pinned per release
    "env": {
      "DEPLOYMENT_ID": "<uuid>",
      "HYPERLIQUID_LIVE_ENABLED": "true"  // only for mainnet machines
    },
    "guest": { "cpu_kind": "shared", "cpus": 1, "memory_mb": 512 },
    "restart": { "policy": "on-failure" },
    "auto_destroy": false
  }
}
```
App-level secrets (SUPABASE_*, OPENROUTER_API_KEY, etc.) live on the Fly app via fly secrets set and are injected automatically.
Web updates the deployment row with fly_machine_id, fly_region, fly_image, started_at. The image SHA is captured at provisioning time so we can later detect when a deployment is running an older platform release — see "Image pinning & per-deployment upgrades" below.
Machine boots, runner reads DEPLOYMENT_ID, loads deployment + skill, instantiates the broker via makeBroker(deployment):
Paper → in-memory PaperBroker.
Mainnet → HyperliquidMainnetBroker(masterAddress, agentAddress, decryptedPrivKey). Refuses to construct if HYPERLIQUID_LIVE_ENABLED !== 'true'.
Runner marks deployments.status='running' and enters the tick loop.
Failure cases: Fly create throws → row is marked status='error' with error_text and the user sees it on the deployment detail page. Broker construction throws (mainnet kill switch flipped) → runner exits, machine restarts; if the flag stays off the deployment will sit at provisioning/error until cleared.
Any saved version is deployable, not just the latest. createDeployment takes an optional version (validated against skill_versions); omitted → skill.latest_version. The skill detail page exposes a per-version Deploy button in the version history, and the chat deploy card passes ?version= / ?broker= to the deploy page, which the page now consumes (target version drives the backtest verdict, broker prefills the picker). The (skill_id, skill_version) FK guarantees the version exists.
Three rules, enforced in createDeployment and mirrored in the UI. "Active" = status in provisioning | running | paused | halted | stopping (ACTIVE_DEPLOYMENT_STATUSES in lib/deployments-query.ts).
One active deployment per (skill, broker). A redeploy replaces its predecessor — stopCollidingDeployments queues stop on the same (skill_id, broker_kind) before provisioning. So you can't run two paper deployments of one skill, nor two mainnet, but 1 paper + 1 mainnet of the same skill can run side by side (the collision is scoped to the broker).
At most ONE active mainnet deployment across all skills (MVP). When deploying mainnet, createDeployment refuses if a different skill already has a live mainnet (getActiveMainnetDeployment, compared by skill_id). Redeploying the same skill's mainnet is allowed — rule 1 replaces it. The deploy form locks the mainnet path (paper stays available) and shows "you can deploy only one mainnet deployment."
Paper has no global cap. Any number of paper deployments across different skills can run at once (subject to rule 1 per skill).
To change a live mainnet deployment, stop it first, then deploy. Keeping a live deployment on the current platform release is an in-place image upgrade (upgradeDeploymentImage), not a new deployment, so it's unaffected by these rules.
Atomic enforcement. The pure predicates live in lib/deploy-rules.ts (mainnetCapBlocks, sameSkillBrokerCollisions, unit-tested) and are shared by createDeployment and the deploy page. Because the server action is check-then-insert, two partial unique indexes (migration 20260611120000) are the atomic backstop so a race can't break the rules:
deployments_one_active_mainnet_per_user — unique (user_id) where broker_kind='hyperliquid-mainnet' and status in (occupying).
deployments_one_active_per_skill_broker — unique (user_id, skill_id, broker_kind) where status in (occupying).
"Occupying" excludes stopping: a redeploy flips its predecessor to stopping before inserting the replacement (stopCollidingDeployments), so the slot reads free and the index doesn't reject a legitimate redeploy. A 23505 from these indexes is translated back into the same rule message.
The single-mainnet cap is a deliberate MVP guardrail. Expanding to multiple concurrent mainnet deployments (per-skill margin isolation, aggregate exposure accounting across the shared Hyperliquid cross-margin pool, rate-limit budgeting) is future work — revisit before lifting it.
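For illustration, the two pure predicates named above could look roughly like this. The function names come from this document, but the signatures, the `DeploymentRow` shape, and the convention that the caller passes in only rows it has already filtered to the relevant statuses are assumptions, not the contents of lib/deploy-rules.ts.

```typescript
// Sketch only: signatures and row shape are assumed, not copied from the repo.
type BrokerKind = "paper" | "hyperliquid-mainnet";

interface DeploymentRow {
  id: string;
  skillId: string;
  brokerKind: BrokerKind;
}

interface DeployTarget {
  skillId: string;
  brokerKind: BrokerKind;
}

/** Rule 2: a mainnet deploy is blocked if a different skill already holds mainnet. */
function mainnetCapBlocks(active: DeploymentRow[], target: DeployTarget): boolean {
  if (target.brokerKind !== "hyperliquid-mainnet") return false; // paper has no global cap
  return active.some(
    (d) => d.brokerKind === "hyperliquid-mainnet" && d.skillId !== target.skillId,
  );
}

/** Rule 1: predecessors on the same (skill, broker) that a redeploy must stop first. */
function sameSkillBrokerCollisions(
  active: DeploymentRow[],
  target: DeployTarget,
): DeploymentRow[] {
  return active.filter(
    (d) => d.skillId === target.skillId && d.brokerKind === target.brokerKind,
  );
}
```

Because both functions only read their arguments, the server action and the deploy page can share them, and the unique indexes remain the only place where atomicity matters.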
Each Fly machine is pinned to a specific image SHA at creation time. Pushing a new platform release does not roll existing machines — by design, because rolling a runner mid-tick interrupts the trading loop and can leave in-flight Hyperliquid orders in an unknown state.
This means every running deployment can be on a different release. We track the SHA per-deployment so the UI can surface a divergence to the user.
The .github/workflows/live-runner-deploy.yml workflow runs on every push to main that touches apps/live-runner/** or any package the runner depends on:
Build + push the live-runner image to Fly's registry via flyctl deploy --build-only --push. Capture the SHA-tagged image (e.g. registry.fly.io/agentic-live-runner:deployment-XXXX).
PATCH FLY_LIVE_RUNNER_IMAGE on the Vercel web project via the Vercel REST API. New deployments created from the dashboard after this point provision onto the latest image.
Existing machines are deliberately left alone.
The web app's currentLiveRunnerImage() helper (apps/web/lib/fly.ts) reads the same env var; the runner doesn't need it (the runner image already is the pinned image).
On the deployment detail page (/deployments/[id]), server-side rendering compares deployments.fly_image to currentLiveRunnerImage(). If they differ AND the deployment is in an active status (running / paused / halted / error), the page renders a VersionBanner with an "Upgrade now" button. The deployments list (/deployments) shows a small update pill on the same rows.
Rows with fly_image IS NULL (created before migration 0009) are treated as "unknown" and never badged — avoids false positives on legacy deployments.
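The badge decision described above is a three-way check. A minimal sketch follows, assuming a hypothetical `needsUpgradeBadge` helper and a camel-cased row shape; the status list is the one given for the banner, and the real page may organize this differently.

```typescript
// Hypothetical helper; the actual page logic may be structured differently.
const BADGED_STATUSES: ReadonlySet<string> = new Set([
  "running",
  "paused",
  "halted",
  "error",
]);

interface DeploymentImageInfo {
  status: string;
  flyImage: string | null; // null for rows that predate image tracking
}

function needsUpgradeBadge(row: DeploymentImageInfo, currentImage: string): boolean {
  if (row.flyImage === null) return false; // unknown image: never badge
  if (!BADGED_STATUSES.has(row.status)) return false; // only statuses listed for the banner
  return row.flyImage !== currentImage; // badge only on divergence
}
```

Checking for null before comparing is what prevents legacy rows from being flagged as out of date.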
upgradeDeploymentImage({ deploymentId }) server action does an in-place machine update:
Verify the deployment is owned by the caller and is in an upgradable status.
POST /apps/<app>/machines/<id> with { config: { image: <new SHA> } } — Fly preserves env, guest, restart policy, and the machine id.
POST /apps/<app>/machines/<id>/restart so the new image actually runs.
Persist the new SHA to deployments.fly_image so the banner clears immediately.
Preserved across the upgrade:
Fly machine id → all the per-deployment infrastructure (Postgres LISTEN channel name, agent_state row, decision_snapshots, agent_logs).
The Hyperliquid agent's Vault key id → no re-approval flow needed.
Open Hyperliquid positions and orders — they live on the exchange, not in our machine.
What the user sees: the tick loop pauses for ~10–30 s during the restart, the broker reconciles from the exchange on boot (open orders, positions, equity), and ticking resumes. No removeAgent/re-approve flow.
Mid-trade interruption. A Hyperliquid order that lands mid-restart can fill on either side of the restart and the reconcile path has to be the one that catches up — fine when the user opts in at a quiet moment, risky as a platform-wide event.
Behavioral drift surprises users. A skill that's been profitably trading on release N suddenly behaving differently on N+1 is exactly the bug report we don't want. Upgrade is a deliberate act, not a Tuesday morning surprise.
The cost is that platform releases don't propagate until each user clicks Upgrade. That's the right trade for a trading product.
Per deployment, the web UI at /deployments/[id] shows:
Status pill + halt / error banners (from deployments)
Current portfolio: equity, day PnL, positions, open orders (from agent_state)
Actions panel: pause / resume / clear_halt / flatten / stop / kill / snapshot. Each button is gated on the current status and routed through sendDeploymentCommand, which inserts an agent_commands row and lets the runner's LISTEN/NOTIFY loop do the work. kill is the exception — it also destroys the Fly machine via the Machines API immediately, because by definition the runner isn't responding.
For mainnet: the agent address with a "Revoke agent (platform-side)" button. Platform-side revocation queues a stop and sets hyperliquid_agents.revoked_at; full on-chain revocation requires the user to remove the agent from Hyperliquid's UI (the SDK doesn't expose removeAgent). The UI is explicit about the two-step model — see ADR-0016.
Recent decisions (50, from decision_snapshots) — verdict badge + click-to-expand for final_text and engine_result
Recent logs (50, from agent_logs) — filterable by level
Recent commands + ack status (20, from agent_commands)
Updates are not streamed in MVP — the page revalidates after each command via router.refresh(). Supabase Realtime streaming is Slice D scope.
Phase 3 adds Sentry for unexpected errors and a cost dashboard.
Vercel cron at /api/jobs/heartbeat-sweep runs every 5 minutes. For each deployment with status IN ('running', 'paused'), joins to agent_state.updated_at and checks against a per-deployment threshold of max(3 × tick_interval, 15 min). Over-threshold rows are flipped to status='error' with error_text='heartbeat lost: …' and an agent_logs row is appended.
We deliberately do NOT auto-destroy the Fly machine — too easy to misfire on a brief lag and orphan an in-flight order. The user kills the machine manually via the deployment detail page.
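The threshold rule is easy to get wrong at the boundaries, so here is a small sketch of it. The function names and the millisecond units are assumptions for illustration; only the max(3 × tick_interval, 15 min) formula comes from this document.

```typescript
// Sketch of the sweep's threshold check; helper names are hypothetical.
const MIN_THRESHOLD_MS = 15 * 60_000; // 15-minute floor

/** Threshold = max(3 × tick_interval, 15 min). */
function heartbeatThresholdMs(tickIntervalMs: number): number {
  return Math.max(3 * tickIntervalMs, MIN_THRESHOLD_MS);
}

/** True when agent_state has not been updated within the threshold. */
function isHeartbeatLost(
  lastUpdatedAtMs: number,
  nowMs: number,
  tickIntervalMs: number,
): boolean {
  return nowMs - lastUpdatedAtMs > heartbeatThresholdMs(tickIntervalMs);
}
```

With a 5-minute tick the two terms coincide at 15 minutes; with a 1-hour tick the threshold becomes 3 hours, so slow skills are not flagged by a sweep that runs far more often than they tick.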
After every tick, the runner computes cost_usd from token counts × the rate table in @repo/shared's MODEL_RATES, writes it onto the decision_snapshots row, and sums month-to-date cost_usd for the deployment. If the sum exceeds skill.risk.maxMonthlyCostUsd (default $200), the runner queues a pause command via agent_commands with a cost_guard_tripped reason. The UTC month boundary is the natural reset point — no per-month counter table.
The user can resume after raising the cap on a new Skill version (Skill versions are immutable, so the resume implicitly requires a new version with the higher cap).
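A rough sketch of the cost-guard arithmetic, under stated assumptions: the rate-table shape (USD per million input/output tokens), the example model name and rates, and all helper names are hypothetical and do not reflect the actual MODEL_RATES export. Only the $200 default, the month-to-date sum, and the UTC month boundary come from this document.

```typescript
// Assumed rate shape: USD per million tokens. Not the real MODEL_RATES layout.
interface Rate {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Placeholder rates for illustration only.
const EXAMPLE_RATES: Record<string, Rate> = {
  "example-model": { inputPerMTok: 3, outputPerMTok: 15 },
};

function tickCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
  rates: Record<string, Rate> = EXAMPLE_RATES,
): number {
  const rate = rates[model];
  if (!rate) throw new Error(`no rate for model ${model}`);
  return (inputTokens * rate.inputPerMTok + outputTokens * rate.outputPerMTok) / 1_000_000;
}

/** Start of the current UTC month; this is the natural reset point. */
function utcMonthStartMs(nowMs: number): number {
  const d = new Date(nowMs);
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1);
}

interface CostSample {
  createdAtMs: number;
  costUsd: number;
}

function monthToDateCostUsd(samples: CostSample[], nowMs: number): number {
  const start = utcMonthStartMs(nowMs);
  return samples
    .filter((s) => s.createdAtMs >= start && s.createdAtMs <= nowMs)
    .reduce((sum, s) => sum + s.costUsd, 0);
}

/** Trips only when the sum strictly exceeds the cap (default $200). */
function costGuardTripped(monthToDateUsd: number, capUsd = 200): boolean {
  return monthToDateUsd > capUsd;
}
```

Deriving the window from the clock on every tick is what removes the need for a per-month counter table: snapshots from the previous month simply fall outside the filter.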
Wired via the broker's logger callback (Slice B). The broker emits order_placed / order_filled / order_cancelled / liquidation_fill events; the runner's routeOrderEvent in apps/live-runner/src/broker.ts upserts a row per event. Partial fills accumulate filled_size_base + recompute weighted avg_fill_price; full coverage flips status='filled'. Paper deployments don't touch this table.
This is the structured audit story when the user (or compliance) asks "where did this $X go on Hyperliquid?".
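The partial-fill accumulation can be sketched as a pure update over an order row. The `filledSizeBase` and `avgFillPrice` fields mirror the column names mentioned above; the `sizeBase` field, the `"open"` status value, and the `applyFill` helper itself are assumptions for illustration, not the code in routeOrderEvent.

```typescript
// Illustrative row shape; only filled_size_base / avg_fill_price / 'filled' come from the doc.
interface OrderRow {
  sizeBase: number; // assumed: total ordered size in base units
  filledSizeBase: number;
  avgFillPrice: number | null; // null until the first fill
  status: "open" | "filled";
}

/** Fold one fill event into the row: accumulate size, recompute the weighted average. */
function applyFill(row: OrderRow, fillSizeBase: number, fillPrice: number): OrderRow {
  if (fillSizeBase <= 0) throw new Error("fill size must be positive");
  const filled = row.filledSizeBase + fillSizeBase;
  const prevNotional = (row.avgFillPrice ?? 0) * row.filledSizeBase;
  const avg = (prevNotional + fillPrice * fillSizeBase) / filled;
  return {
    ...row,
    filledSizeBase: filled,
    avgFillPrice: avg,
    // Full coverage flips the status; a real implementation may want a tolerance here.
    status: filled >= row.sizeBase ? "filled" : row.status,
  };
}
```

For example, an order for 2 units filled as 1 @ 100 and then 1 @ 110 ends with a weighted average price of 105 and status `filled`.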
pending_hyperliquid_agents table stages the (vault_key_id, master_wallet_id, agent_address, agent_name, nonce_ms) tuple between createAgentApprovalChallenge and submitAgentApproval. Replaces the Slice-A in-memory Map so the wallet-pairing flow survives across Vercel Fluid instances.
A daily cron at /api/jobs/pending-agents-gc deletes rows older than 10 minutes. The corresponding Vault secret stays around as an orphan — accepted leak for MVP; future Vault GC sweep cleans those.