Launch checklist

This doc is operational, not aspirational. Every step has a specific command, SQL query, or UI gesture. If you find yourself improvising, slow down — the platform is designed so the slow path is the safe path.

It covers the path from "code is in the repo" to "first real-money tick on Hyperliquid mainnet" with a tiny test position, in five phases. Each phase has verification steps that catch failures before they cost money.

Cross-refs:

architecture/live-runtime.md — runtime topology this checklist exercises
security/risk-controls.md — L1–L7 safety stack
ops/environments.md — env var inventory
decisions/0016-hyperliquid-agent-wallet-model.md — wallet model + revocation semantics

Phase 0 — Platform prep (one-time, before any user)

Platform-side prerequisites. None of them are obvious from the code; all of them will silently bite if you skip them.

0.1 — Apply all migrations to prod Supabase

# From the repo root, with supabase CLI linked to the prod project:
supabase link --project-ref <prod-ref>
supabase db push

Verify the wallet/audit tables landed:

select tablename from pg_tables
where tablename in (
  'hyperliquid_master_wallets',
  'hyperliquid_agents',
  'pending_hyperliquid_agents',
  'mainnet_orders'
)
order by tablename;
-- Expect 4 rows.

-- Confirm Vault is actually enabled:
select extname, extversion from pg_extension
where extname in ('pgsodium', 'supabase_vault');
-- Expect 2 rows.

If supabase_vault is missing, stop and contact Supabase support to enable it before going further — the agent-approval flow can't work without it. Vault availability is project-tier-dependent.

0.2 — Smoke-test the Vault RPCs end-to-end

This is the single most failure-prone piece. In the SQL editor:

-- Should return a UUID:
select public.create_hyperliquid_agent_secret(
  'test_secret_value',
  'smoke_test_' || gen_random_uuid()::text
);

-- Read it back via the agent path. Returns NULL because no hyperliquid_agents
-- row references this secret — that's expected and proves the join works:
select public.get_hyperliquid_agent_secret(gen_random_uuid());

-- Clean up the orphan vault entry. The underscores are escaped because an
-- unescaped _ in LIKE matches any single character:
delete from vault.secrets where name like 'smoke\_test\_%';

If create_hyperliquid_agent_secret errors with permission denied, the SECURITY DEFINER didn't take — check the function owner is postgres and that the vault.create_secret overload signature matches what migration 0006 declares.

0.3 — Create the Fly app

fly apps create agentic-live-runner --org <your-org>

# App-wide secrets, injected into every machine:
fly secrets set \
  SUPABASE_URL="https://<prod>.supabase.co" \
  SUPABASE_SERVICE_ROLE_KEY="<service-role-key>" \
  SUPABASE_DB_PASSWORD="<db-password>" \
  OPENROUTER_API_KEY="<key>" \
  HYPERLIQUID_LIVE_ENABLED="true" \
  -a agentic-live-runner

# Build + push the image; note the SHA in the output:
fly deploy \
  --config apps/live-runner/fly.toml \
  --dockerfile apps/live-runner/Dockerfile \
  --remote-only --build-only --push \
  -a agentic-live-runner

Note on SUPABASE_DB_PASSWORD: distinct from the service-role JWT. The runner uses raw pg for the LISTEN/NOTIFY command channel, which needs the database password from Supabase dashboard → Project Settings → Database. The runner also accepts POSTGRES_PASSWORD (the name the Supabase Vercel Marketplace integration uses) as a fallback.

Build context: the Dockerfile COPYs from the repo root, so fly deploy must run from there with --config + --dockerfile pointing at apps/live-runner/. Running from inside the app dir fails with "no such file".

Verify:

fly secrets list -a agentic-live-runner   # 5 secrets, all "Deployed"
fly machines list -a agentic-live-runner  # empty — machines are on-demand

0.4 — Set Vercel env vars

In Vercel dashboard → project → Settings → Environment Variables, production environment:

| Variable | Value |
| --- | --- |
| `FLY_API_TOKEN` | Output of `fly tokens create deploy -a agentic-live-runner` |
| `FLY_LIVE_RUNNER_APP` | `agentic-live-runner` |
| `FLY_LIVE_RUNNER_IMAGE` | SHA-pinned image string from 0.3 (`registry.fly.io/agentic-live-runner@sha256:…`) |
| `FLY_LIVE_RUNNER_REGION` | `nrt` |
| `HYPERLIQUID_LIVE_ENABLED` | `true` |
| `DEPLOYMENTS_DISABLED` | `false` |
| `CRON_SECRET` | Output of `openssl rand -hex 32` |
| `NEXT_PUBLIC_THIRDWEB_CLIENT_ID` | From thirdweb dashboard → Project → Settings → API Keys |
| `SUPABASE_URL` / `SUPABASE_SERVICE_ROLE_KEY` / `NEXT_PUBLIC_SUPABASE_*` | Auto-populated if Supabase Marketplace integration is wired |
| `SUPABASE_DB_PASSWORD` (and/or `POSTGRES_PASSWORD`) | Forwarded into Fly machines via web's `createDeployment`; needed for the live runner's LISTEN/NOTIFY connection. Marketplace integrations populate `POSTGRES_PASSWORD`; both names work. |
| `OPENROUTER_API_KEY` | Same as Fly app secret |

After deploy, verify the crons appear under Vercel → Crons. Both /api/jobs/heartbeat-sweep and /api/jobs/pending-agents-gc should be listed with a next-run time in the future.

Caveat — Vercel Hobby plan cron limit. Hobby plans cap cron frequency at daily. The shipped apps/web/vercel.json runs heartbeat-sweep daily at 03:00 UTC as a workaround. This means a dead Fly machine can sit unnoticed for up to 24 hours — acceptable for the first paper deployment, not acceptable before the first mainnet deployment. Before going live with real money, do one of:

Upgrade Vercel to Pro and change the schedule back to */5 * * * *, or

Move both jobs into a tight loop inside a dedicated always-on Fly machine. (Note: apps/sim-worker is no longer a viable host for this — it scales to zero between backtests per ADR-0023, so its machine isn't running most of the time.)

0.5 — Hit the cron endpoints manually once

curl -H "Authorization: Bearer $CRON_SECRET" \
  https://<your-domain>/api/jobs/heartbeat-sweep
# Expect: {"checked":0,"marked":0}

curl -H "Authorization: Bearer $CRON_SECRET" \
  https://<your-domain>/api/jobs/pending-agents-gc
# Expect: {"deleted":0}

401 → secret doesn't match what Vercel cron will send. 500 → check Vercel function logs.
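
The status-code triage above can be scripted so the same check runs identically for both jobs. This is a minimal sketch: `BASE_URL` is an assumed variable standing in for `https://<your-domain>`, both function names are hypothetical, and treating 200 as the success status is an assumption.

```shell
# Map an HTTP status from a cron endpoint to the next step in this checklist.
diagnose_cron_status() {
  case "$1" in
    200) echo "ok" ;;               # assumed success code; compare the JSON body next
    401) echo "secret mismatch" ;;  # CRON_SECRET differs from what Vercel cron sends
    500) echo "server error" ;;     # check Vercel function logs
    *)   echo "unexpected: $1" ;;
  esac
}

# Fetch only the status code for one job path.
# Needs CRON_SECRET and BASE_URL (assumed name) in the environment.
cron_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $CRON_SECRET" \
    "$BASE_URL/api/jobs/$1"
}
```

Usage would look like `diagnose_cron_status "$(cron_status heartbeat-sweep)"`, repeated for `pending-agents-gc`.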

Phase 1 — First paper deployment (smoke-test the platform itself)

Don't go straight to mainnet. The paper path exercises ~90% of the same code without exchange risk.

1.1 — Create a skill

Sign in via magic link.
Skills → New → fill in the strategy thesis fields.
Save. Confirm skills + skill_versions rows exist:

select s.name, s.slug, sv.version, sv.payload->'model' as model
from public.skills s
join public.skill_versions sv on sv.skill_id = s.id
order by s.created_at desc limit 5;

1.2 — Backtest first

Run a backtest from the skill page (also smoke-tests apps/sim-worker on Fly if you've deployed it). Confirm the run completes and the metrics make sense. If the backtest is degenerate (e.g., agent makes zero proposals across 24h), fix the skill before going further — it'll be degenerate in live too.

1.3 — Deploy paper

Skill page → Deploy → pick Paper → check the confirmation box → click Deploy.
The page redirects to /deployments/<id>. Status should be provisioning → running within ~30 seconds.

While provisioning, watch the Fly side:

fly machines list -a agentic-live-runner
# Note the new machine's ID.

fly logs -a agentic-live-runner -i <machine-id>
# Look for, in order:
#   "live-runner: booting deployment <uuid>"
#   "live-runner: loaded skill ... (broker=paper)"
#   "live-runner: status=running; entering tick loop"

If the machine fails to boot, the deployment row flips to status='error' with error_text. Common causes:

DEPLOYMENT_ID env not injected — check fly machines exec <id> -- env
Supabase service role key wrong — auth fails in loadDeploymentAndSkill
Migration not applied — hyperliquid_agent_id column missing → check constraint blows up the insert
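
The ~30 second provisioning window in 1.3 can be watched with a small polling helper instead of refreshing by hand. This is a sketch only: `wait_for_status` is a hypothetical helper, and the command it polls (whatever prints the current `deployments.status` for your deployment) is left to you.

```shell
# wait_for_status EXPECTED TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it prints EXPECTED (returns 0) or until
# TIMEOUT_SECONDS of sleeping has accumulated (returns 1).
wait_for_status() {
  expected=$1; timeout=$2; interval=$3; shift 3
  elapsed=0
  while :; do
    current=$("$@") || current=""
    [ "$current" = "$expected" ] && return 0
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
}
```

For example, `wait_for_status running 60 5 deployment_status "<uuid>"` would poll every 5 seconds for up to a minute, where `deployment_status` is a hypothetical function you supply.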

1.4 — Watch the first three ticks

Refresh the deployment detail page after the first interval (5 min by default). Expect:

agent_state.equity_usd = $10,000 (paper starting balance)
Decisions timeline has rows with verdict badges
Logs view has runner_started + reconciled + tick entries
Token cost appearing in the decision rows' tok column

-- Verify per-tick cost is being computed:
select tick_at, prompt_tokens, completion_tokens, cost_usd
from public.decision_snapshots
where deployment_id = '<uuid>'
order by tick_at desc limit 5;

If cost_usd is null or 0 despite non-zero token counts, the model isn't in MODEL_RATES — check that apps/web/components/skill-editor/model-catalog.ts matches packages/shared/src/model-rates.ts.

1.5 — Exercise every command

From the deployment detail page, click in order: Pause → Resume → Snapshot → Flatten → Stop. After each, verify:

New row in agent_commands with acked_at populated within ~1 second
Status pill updates after page revalidates
Fly machine actually exits after Stop (fly machines list no longer shows it after ~30s)

1.6 — Common boot failures + fixes

Captured live during the first Phase-1 smoke test. Each row is a real failure mode that surfaced; the Symptom cell is the diagnostic signature you'll see in Vercel logs (server) or agent_logs (runner).

| Symptom | Where it shows | Root cause | Fix |
| --- | --- | --- | --- |
| `Error: Attempted to call getDefaultConfig() from the server but getDefaultConfig is on the client` | Vercel `λ POST /skills/[id]/deploy 500` | RainbowKit's `getDefaultConfig` pulled into a server bundle because the file holding `wagmiConfig` had no `'use client'` directive | Split: keep `wagmiConfig` in a `'use client'` module (`lib/web3/config.ts`); move universal constants to a separate file (`lib/hyperliquid/constants.ts`) that server actions can import safely |
| `Error: A "use server" file can only export async functions, found object` | Vercel server-component render 500 | Const array exported from a `'use server'` module (e.g. `COMMAND_KINDS`) | Extract the const + its derived types to a non-server module (`lib/deployment-commands.ts`); re-import in both the server action and any client component |
| `live-runner: set SUPABASE_DB_URL, or SUPABASE_URL + (SUPABASE_DB_PASSWORD \| POSTGRES_PASSWORD)…` | `agent_logs.event=runner_fatal`, deployment `status=error` | Fly secret missing; runner has no way to open the pg LISTEN connection | `fly secrets set SUPABASE_DB_PASSWORD=… -a agentic-live-runner` then `fly machines restart <id>` |
| `password authentication failed for user "postgres"` | `agent_logs.event=runner_fatal` | Wrong project's password set on Fly. Usually because the local `.env.local` still references a deleted Supabase project from an earlier Marketplace integration | Reset the DB password in Supabase dashboard, re-set everywhere (Fly, Vercel, local), rewrite `.env.local` to point at the live project ref |
| `paper broker: no mark price set for <SYMBOL>; call setMarkPrice first` | `agent_logs.event=tick_failed` | Symbol-discovery skill picked a symbol that wasn't in `skill.context.symbols`, so `refreshMarkPrices` never seeded a mark for it | Already fixed in code (Phase-1 commits `79b391a` + `c5af3d7`): the runner wraps the bars client to push marks on fetch AND fetches Hyperliquid `allMids` at tick start. If this comes back, check that the Fly machine is on an image including those commits. |
| `Hyperliquid allMids <status> <statusText>` | `agent_logs.event=all_mids_fetch_failed` | Transient HL API blip or network egress issue from Fly machine | Runner auto-falls back to bar-close marks for the `skill.context.symbols` list. Persistent: check HL status page; check Fly region's outbound connectivity. |

1.7 — Secret rotation playbook

When you reset the Supabase database password (or rotate any Fly secret), three places need synchronized updates plus a deliberate machine restart. Skipping any step puts the runner in an "auth failed" loop the moment a tick fires.

# 1. Stage the new value on Fly (does NOT touch running machines):
fly secrets set SUPABASE_DB_PASSWORD="<new-value>" -a agentic-live-runner --stage

# 2. Mirror to Vercel (the web app forwards into createDeployment env):
cd apps/web
# printf avoids piping a trailing newline along with the value:
printf '%s' "<new-value>" | vercel env add SUPABASE_DB_PASSWORD production --force

# 3. Restart running Fly machines so they pick up the staged value.
#    `fly machines update --image` does NOT reliably re-pull staged
#    secrets — use `fly machines restart` (or destroy + redeploy):
for m in $(fly machines list -a agentic-live-runner --json | jq -r '.[].id'); do
  fly machines restart "$m" -a agentic-live-runner
done

# 4. Update local .env.local so `pnpm dev` works against the same DB:
#    POSTGRES_PASSWORD + the three POSTGRES_URL variants all need the
#    new value embedded.

Important: fly secrets deploy -a agentic-live-runner does not work here — that command targets apps deployed via fly deploy. Our app uses on-demand machine provisioning via the Machines API, so the staged-secret-to-machine handoff happens at machine create / restart time only.

Phase 2 — First mainnet (the tiny-balance smoke test)

Only proceed if Phase 1 was clean.

2.1 — Pre-flight: HL account preparation

In a fresh wallet (NOT your main wallet for the first test):

Fund it with $100 of USDC on Arbitrum.
Bridge to Hyperliquid via their UI.
Confirm $100 lands in your HL perp account.

2.2 — Pair the master wallet

Wallet → Connect → pick the test wallet → sign the pairing message.
Confirm:

select address, verified_at, pairing_sig_hash is not null as has_audit_sig
from public.hyperliquid_master_wallets
order by created_at desc limit 1;

2.3 — Create a "tiny test" skill

Author a new skill with deliberately conservative caps:

maxPositionPct:      2     (notional; first position is ~$2 on a $100 wallet at 1x)
maxTotalExposurePct: 2     (single tiny position only)
maxOrdersPerDay:     4     (runaway loop can only burn a few orders of fees)
maxLeverage:         1
minOrderUsd:         10    (HL's venue minimum)
allowedSymbols:      ['BTC']
maxMonthlyCostUsd:   10
dailyLossHaltPct:    5
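
The caps above interact, so it helps to work the arithmetic before deploying. The sketch below uses hypothetical helper names and whole-dollar integer math. It computes the per-position notional cap from equity and maxPositionPct, then compares it with minOrderUsd. How the engine resolves a cap that sits below the minimum order size is not stated in this checklist, so treat the result as something to confirm and not as a prediction of engine behavior.

```shell
# max_position_usd EQUITY_USD MAX_POSITION_PCT
# Whole-dollar notional cap for a single position.
max_position_usd() {
  echo $(( $1 * $2 / 100 ))
}

# cap_vs_minimum EQUITY_USD MAX_POSITION_PCT MIN_ORDER_USD
# Reports whether the per-position cap reaches the minimum order size.
cap_vs_minimum() {
  cap=$(max_position_usd "$1" "$2")
  if [ "$cap" -lt "$3" ]; then
    echo "cap \$$cap is below minimum order \$$3"
  else
    echo "cap \$$cap meets minimum order \$$3"
  fi
}
```

With the values in this section, `cap_vs_minimum 100 2 10` prints `cap $2 is below minimum order $10`. If the agent never places an order on mainnet, this interaction is one of the first things to check.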

Run a paper backtest of this skill (10-min window is fine) to confirm it actually proposes something.

2.4 — Deploy mainnet

Skill page → Deploy → Hyperliquid mainnet.
Verify the cross-margin warning is NOT showing (this is the first mainnet deployment on this wallet).
Click Authorize agent wallet — your wallet pops up with a typed-data signature. Read what you're signing: it should be HyperliquidTransaction:ApproveAgent with the platform-generated agent address visible.
Sign.

If submission succeeds:

select agent_address, approved_at, revoked_at, approval_tx_hash
from public.hyperliquid_agents
order by created_at desc limit 1;

-- Verify Vault has the encrypted secret:
select count(*) from public.hyperliquid_agents a
join vault.decrypted_secrets ds on ds.id = a.vault_key_id;

Check the confirmation box → Deploy live. Status → provisioning → running in ~30s.

2.5 — Watch the first mainnet tick like a hawk

fly logs -a agentic-live-runner -i <machine-id> \
  | grep -E "broker_started|reconciled|order_placed|order_filled|tick_failed"

Expect in this order:

broker_started
reconciled — with the actual HL account state (positions=0, equity≈$100)
First tick proceeds — likely noop or executed with a tiny BTC position

If you see tick_failed with NotYetImplemented, then the Slice-B build never made it to Fly — the broker is still the stub. Re-push the image.

After ~3 minutes (give WS marks time to populate), check Hyperliquid's UI — your account should now show the position the agent opened, OR still be flat if the agent chose noop. Either is fine — you've proved the path works.

2.6 — Verify the audit trail

-- One row per order the agent placed:
select cloid, symbol, side, status, filled_size_base, avg_fill_price,
       fee_usd, placed_at, settled_at
from public.mainnet_orders
where deployment_id = '<uuid>'
order by placed_at desc;

-- Cross-reference with agent_logs:
select ts, level, event, message, data
from public.agent_logs
where deployment_id = '<uuid>'
  and event in ('order_placed', 'order_filled', 'order_cancelled')
order by ts desc limit 20;

The cloid in mainnet_orders.cloid must match the cloid in the data field of the corresponding order_placed log row. If they don't match, the order-event routing in apps/live-runner/src/broker.ts is broken.
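
Once there are more than a handful of orders, comparing cloids by eye gets error-prone. A minimal sketch, assuming you have exported the cloids from each source into a plain text file with one cloid per line (the export step and the helper name `cloid_mismatches` are both assumptions):

```shell
# cloid_mismatches ORDERS_FILE LOGS_FILE
# Prints every cloid that appears in only one of the two files, labelled
# with the side it came from. Prints nothing when the two sets match.
cloid_mismatches() {
  a=$(mktemp) && b=$(mktemp) || return 1
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  comm -23 "$a" "$b" | sed 's/^/only in mainnet_orders: /'
  comm -13 "$a" "$b" | sed 's/^/only in agent_logs: /'
  rm -f "$a" "$b"
}
```

Empty output means every order row has a matching order_placed log entry and vice versa.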

Phase 3 — Operational dry-runs (prove the safety nets fire)

Each of these is a deliberate failure injection. Do them on the tiny-balance deployment from Phase 2, in a single sitting, before that deployment is upgraded to real size.

3.1 — Cost-guard trip

Easiest test: temporarily lower the skill's effective cap by editing the latest skill_versions.payload directly (throwaway — revert after):

-- Read current MTD:
select sum(cost_usd) as mtd_usd
from public.decision_snapshots
where deployment_id = '<uuid>'
  and tick_at >= date_trunc('month', now() at time zone 'utc');

-- Lower the cap below MTD (revert immediately after the test):
update public.skill_versions
set payload = jsonb_set(payload, '{risk,maxMonthlyCostUsd}', '0.01'::jsonb)
where skill_id = '<skill-uuid>' and version = <ver>;

Within one tick of the next decision, expect:

agent_commands row with kind='pause', payload->>'reason' starting with cost_guard:
agent_logs row with event='cost_guard_tripped'
Deployment status flips to paused

Revert the cap before continuing.

3.2 — Heartbeat sweeper

Cleanest test: stop the Fly machine and wait out the threshold.

fly machines stop <machine-id> -a agentic-live-runner

Wait at least max(3 * tick_interval, 15 min) past the last agent_state.updated_at, then:

curl -H "Authorization: Bearer $CRON_SECRET" \
  https://<your-domain>/api/jobs/heartbeat-sweep
# Expect: {"checked":N, "marked":1}

Verify:

select status, error_text from public.deployments where id = '<uuid>';
-- Expect: status='error', error_text='heartbeat lost: ...'
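
The wait in 3.2 follows from the max(3 * tick_interval, 15 min) rule. The sketch below works it in seconds so you can tell when the sweep is worth triggering. The helper names are hypothetical, and treating an age exactly equal to the threshold as stale is an assumption about the sweeper's comparison.

```shell
# stale_threshold_seconds TICK_INTERVAL_SECONDS
# max(3 * tick_interval, 15 min), expressed in seconds.
stale_threshold_seconds() {
  triple=$(( $1 * 3 ))
  floor=900
  if [ "$triple" -gt "$floor" ]; then echo "$triple"; else echo "$floor"; fi
}

# is_stale LAST_HEARTBEAT_EPOCH NOW_EPOCH TICK_INTERVAL_SECONDS
# Succeeds when the heartbeat age has reached the threshold.
is_stale() {
  age=$(( $2 - $1 ))
  [ "$age" -ge "$(stale_threshold_seconds "$3")" ]
}
```

At the default 5-minute interval both terms are 900 seconds, so the wait is 15 minutes. A 10-minute interval pushes it to 30 minutes.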

3.3 — Revoke flow

Create a separate small mainnet deployment for this test (don't revoke the one you're still verifying). Then:

Click Revoke on its detail page.
Confirm hyperliquid_agents.revoked_at is set.
Confirm a stop command was queued for the active deployment.
Deployment status flips to stopped within ~1 min.
Now actually go to Hyperliquid's UI and remove the agent from your account — the step the platform can't do (see ADR-0016).
Verify HL no longer shows the agent under approved agents.

Phase 4 — First 24 hours of real operation

After Phase 3, upgrade the test deployment to a real position size (raise maxPositionPct on a new skill version → redeploy). During the first 24h, watch these specific things.

Things that should happen normally

New decision_snapshots row every tick interval (5 min by default)
New agent_logs rows with event='reconciled' every 5 min (broker drift guard)
cost_usd per row stays consistent: ~$0.01–0.10 for Haiku, ~$0.05–0.30 for Sonnet
Cron job runs visible in Vercel dashboard, no failures

Things that should NOT happen (alarm bells)

tick_failed events more than ~1/hour (occasional failures from WS reconnects and transient HL API issues are fine; a higher rate is a real problem)
Any state_drift warning (REST reconcile disagreeing with WS) — investigate root cause
liquidation event — sweeper should have caught the drawdown approach, but liquidations are still possible in a flash crash
Cost-guard tripping unexpectedly — model is making longer calls than expected
Heartbeat sweeper marking any deployment dead
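
The ~1/hour ceiling on tick_failed can be checked mechanically against saved log output. This sketch assumes every log line starts with an ISO-8601 timestamp such as `2025-01-01T10:05:00Z`, so that its first 13 characters identify the hour. Verify that against your actual log format first; the helper name `busy_hours` is hypothetical.

```shell
# busy_hours EVENT MAX_PER_HOUR   (log lines on stdin)
# Prints "<hour> <count>" for each hour in which EVENT occurs more than
# MAX_PER_HOUR times. Assumes each line begins with an ISO-8601 timestamp.
busy_hours() {
  grep -F -- "$1" | cut -c1-13 | sort | uniq -c |
    awk -v max="$2" '$1 > max { print $2 " " $1 }'
}
```

For example, `busy_hours tick_failed 1 < saved.log` lists only the hours that exceeded the ceiling, and prints nothing when every hour is within it.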

Queries to bookmark

-- Active deployments + freshness:
select d.id, d.broker_kind, d.status,
       s.updated_at as last_heartbeat,
       extract(epoch from (now() - s.updated_at))::int as seconds_stale
from public.deployments d
left join public.agent_state s on s.deployment_id = d.id
where d.status in ('running', 'paused', 'halted')
order by seconds_stale desc nulls first;

-- Today's spend by deployment:
select d.id, sum(ds.cost_usd) as today_usd, count(*) as ticks
from public.deployments d
join public.decision_snapshots ds on ds.deployment_id = d.id
where ds.tick_at >= date_trunc('day', now() at time zone 'utc')
group by d.id
order by today_usd desc;

-- Recent mainnet orders + status:
select cloid, symbol, side, status, size_base, filled_size_base,
       avg_fill_price, fee_usd, placed_at, settled_at
from public.mainnet_orders
where placed_at > now() - interval '24 hours'
order by placed_at desc limit 50;

Incident response cheat sheet

| Symptom | First action | Root cause path |
| --- | --- | --- |
| Skill is firing rapid orders | Click Pause on deployment page | Check rate cap; if engine should have caught it, it's a bug |
| Position is wrong size vs intended | Click Pause, do NOT flatten yet | Compare `decision_snapshots.proposed_action.sizeUsd` vs broker's actual order in `mainnet_orders.size_base` |
| Big loss showing on HL | Click Flatten | Then investigate why daily-loss-halt didn't fire; likely `dailyLossHaltPct` was set high |
| WS keeps disconnecting | Watch `fly logs` for `webdata2_apply_failed` | HL API issue or network; usually self-recovers. Flip `HYPERLIQUID_LIVE_ENABLED=false` on Fly app if it persists |
| Cron secret leaked | Rotate `CRON_SECRET` in Vercel | Old jobs will start failing within minutes |
| Suspect agent key compromise | Hit Revoke immediately, then go to HL UI | Both steps required for full revocation (ADR-0016) |
| Need to stop everything platform-wide | `fly secrets set HYPERLIQUID_LIVE_ENABLED=false -a agentic-live-runner` AND `vercel env add DEPLOYMENTS_DISABLED production --force` (enter `true` at the prompt), then redeploy the web app so the new value takes effect | Mainnet broker construction refuses; new deployments refused |

What you can't test without paying

Three things you only learn from real load:

Hyperliquid order fill latency under stress. The 2-second userFills wait might be tight on slow markets. Plan to revise after a week of real data.
WS reconnect behavior across long-running sessions. The @nktkas/hyperliquid transport handles backoff but extended outages aren't well-documented; observe before relying on it past a few hours of disconnect.
Cost-guard query performance as decision_snapshots grows. Currently the runner scans all month-to-date rows for the deployment on every tick. Fine at small scale; at 8k ticks/month/deployment with 10+ active deployments, replace with a materialized view or per-month rollup table.
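
The 8k ticks/month figure in the last item follows directly from the tick interval. A short sketch of the arithmetic (hypothetical helper name, 30-day month assumed unless you pass a day count):

```shell
# ticks_per_month TICK_INTERVAL_SECONDS [DAYS]
# Number of ticks one deployment produces in a month of DAYS days (default 30).
ticks_per_month() {
  days=${2:-30}
  echo $(( days * 86400 / $1 ))
}
```

At the default 5-minute interval, `ticks_per_month 300` gives 8640, which is the upper bound on the month-to-date rows the cost guard scans per deployment on each tick near month end. Ten such deployments add roughly 86,400 rows to decision_snapshots per month.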

None of these block the first real trade. All of them belong on a Phase 5 follow-up list once volume justifies the work.
