Launch checklist

This doc is operational, not aspirational. Every step has a specific command, SQL query, or UI gesture. If you find yourself improvising, slow down — the platform is designed so the slow path is the safe path.

This checklist covers the path from "code is in the repo" to "first real-money tick on Hyperliquid mainnet" with a tiny test position. It has five phases (0 through 4), each with verification steps that catch failures before they cost money.


Phase 0 — Platform prep (one-time, before any user)

Platform-side prerequisites. None of them are obvious from the code; all of them will silently bite if you skip them.

0.1 — Apply all migrations to prod Supabase

# From the repo root, with supabase CLI linked to the prod project:
supabase link --project-ref <prod-ref>
supabase db push

Verify the wallet/audit tables landed:

select tablename from pg_tables
where tablename in (
  'hyperliquid_master_wallets',
  'hyperliquid_agents',
  'pending_hyperliquid_agents',
  'mainnet_orders'
)
order by tablename;
-- Expect 4 rows.

-- Confirm Vault is actually enabled:
select extname, extversion from pg_extension
where extname in ('pgsodium', 'supabase_vault');
-- Expect 2 rows.

If supabase_vault is missing, stop and contact Supabase support to enable it before going further — the agent-approval flow can't work without it. Vault availability is project-tier-dependent.

0.2 — Smoke-test the Vault RPCs end-to-end

This is the single most failure-prone piece. In the SQL editor:

-- Should return a UUID:
select public.create_hyperliquid_agent_secret(
  'test_secret_value',
  'smoke_test_' || gen_random_uuid()::text
);

-- Read it back via the agent path. Returns NULL because no hyperliquid_agents
-- row references this secret — that's expected and proves the join works:
select public.get_hyperliquid_agent_secret(gen_random_uuid());

-- Clean up the orphan vault entry:
delete from vault.secrets where name like 'smoke_test_%';

If create_hyperliquid_agent_secret errors with permission denied, the SECURITY DEFINER didn't take — check the function owner is postgres and that the vault.create_secret overload signature matches what migration 0006 declares.

0.3 — Create the Fly app

fly apps create agentic-live-runner --org <your-org>

# App-wide secrets, injected into every machine:
fly secrets set \
  SUPABASE_URL="https://<prod>.supabase.co" \
  SUPABASE_SERVICE_ROLE_KEY="<service-role-key>" \
  SUPABASE_DB_PASSWORD="<db-password>" \
  OPENROUTER_API_KEY="<key>" \
  HYPERLIQUID_LIVE_ENABLED="true" \
  -a agentic-live-runner

# Build + push the image; note the SHA in the output:
fly deploy \
  --config apps/live-runner/fly.toml \
  --dockerfile apps/live-runner/Dockerfile \
  --remote-only --build-only --push \
  -a agentic-live-runner

Note on SUPABASE_DB_PASSWORD: distinct from the service-role JWT. The runner uses raw pg for the LISTEN/NOTIFY command channel, which needs the database password from Supabase dashboard → Project Settings → Database. The runner also accepts POSTGRES_PASSWORD (the name the Supabase Vercel Marketplace integration uses) as a fallback.

Build context: the Dockerfile COPYs from the repo root, so fly deploy must run from there with --config + --dockerfile pointing at apps/live-runner/. Running from inside the app dir fails with "no such file".

Verify:

fly secrets list -a agentic-live-runner   # 5 secrets, all "Deployed"
fly machines list -a agentic-live-runner  # empty — machines are on-demand

0.4 — Set Vercel env vars

In Vercel dashboard → project → Settings → Environment Variables, production environment:

| Variable | Value |
| --- | --- |
| FLY_API_TOKEN | Output of fly tokens create deploy -a agentic-live-runner |
| FLY_LIVE_RUNNER_APP | agentic-live-runner |
| FLY_LIVE_RUNNER_IMAGE | SHA-pinned image string from 0.3 (registry.fly.io/agentic-live-runner@sha256:…) |
| FLY_LIVE_RUNNER_REGION | nrt |
| HYPERLIQUID_LIVE_ENABLED | true |
| DEPLOYMENTS_DISABLED | false |
| CRON_SECRET | openssl rand -hex 32 output |
| NEXT_PUBLIC_THIRDWEB_CLIENT_ID | From thirdweb dashboard → Project → Settings → API Keys |
| SUPABASE_URL / SUPABASE_SERVICE_ROLE_KEY / NEXT_PUBLIC_SUPABASE_* | Auto-populated if Supabase Marketplace integration is wired |
| SUPABASE_DB_PASSWORD (and/or POSTGRES_PASSWORD) | Forwarded into Fly machines via web's createDeployment; needed for the live runner's LISTEN/NOTIFY connection. Marketplace integrations populate POSTGRES_PASSWORD; both names work. |
| OPENROUTER_API_KEY | Same as Fly app secret |

After deploy, verify the crons appear under Vercel → Crons. Both /api/jobs/heartbeat-sweep and /api/jobs/pending-agents-gc should be listed with a next-run time in the future.

Caveat — Vercel Hobby plan cron limit. Hobby plans cap cron frequency at daily. The shipped apps/web/vercel.json runs heartbeat-sweep daily at 03:00 UTC as a workaround. This means a dead Fly machine can sit unnoticed for up to 24 hours — acceptable for the first paper deployment, not acceptable before the first mainnet deployment. Before going live with real money, do one of:

  • Upgrade Vercel to Pro and change the schedule back to */5 * * * *, or
  • Move both jobs into a tight loop inside a dedicated always-on Fly machine. (Note: apps/sim-worker is no longer a viable host for this — it scales to zero between backtests per ADR-0023, so its machine isn't running most of the time.)

0.5 — Hit the cron endpoints manually once

curl -H "Authorization: Bearer $CRON_SECRET" \
  https://<your-domain>/api/jobs/heartbeat-sweep
# Expect: {"checked":0,"marked":0}

curl -H "Authorization: Bearer $CRON_SECRET" \
  https://<your-domain>/api/jobs/pending-agents-gc
# Expect: {"deleted":0}

401 → secret doesn't match what Vercel cron will send. 500 → check Vercel function logs.


Phase 1 — First paper deployment (smoke-test the platform itself)

Don't go straight to mainnet. The paper path exercises ~90% of the same code without exchange risk.

1.1 — Sign up and author a skill

  1. Sign in via magic link.
  2. Skills → New → fill in the strategy thesis fields.
  3. Save. Confirm skills + skill_versions rows exist:
select s.name, s.slug, sv.version, sv.payload->'model' as model
from public.skills s
join public.skill_versions sv on sv.skill_id = s.id
order by s.created_at desc limit 5;

1.2 — Backtest first

Run a backtest from the skill page (also smoke-tests apps/sim-worker on Fly if you've deployed it). Confirm the run completes and the metrics make sense. If the backtest is degenerate (e.g., agent makes zero proposals across 24h), fix the skill before going further — it'll be degenerate in live too.

1.3 — Deploy paper

  1. Skill page → Deploy → pick Paper → check the confirmation box → click Deploy.
  2. The page redirects to /deployments/<id>. Status should be provisioning → running within ~30 seconds.

While provisioning, watch the Fly side:

fly machines list -a agentic-live-runner
# Note the new machine's ID.

fly logs -a agentic-live-runner -i <machine-id>
# Look for, in order:
#   "live-runner: booting deployment <uuid>"
#   "live-runner: loaded skill ... (broker=paper)"
#   "live-runner: status=running; entering tick loop"

If the machine fails to boot, the deployment row flips to status='error' with error_text. Common causes:

  • DEPLOYMENT_ID env not injected — check fly machines exec <id> -- env
  • Supabase service role key wrong — auth fails in loadDeploymentAndSkill
  • Migration not applied — hyperliquid_agent_id column missing → check constraint blows up the insert

1.4 — Watch the first three ticks

Refresh the deployment detail page after the first interval (5 min by default). Expect:

  • agent_state.equity_usd = $10,000 (paper starting balance)
  • Decisions timeline has rows with verdict badges
  • Logs view has runner_started + reconciled + tick entries
  • Token cost appearing in the decision rows' tok column
-- Verify per-tick cost is being computed:
select tick_at, prompt_tokens, completion_tokens, cost_usd
from public.decision_snapshots
where deployment_id = '<uuid>'
order by tick_at desc limit 5;

If cost_usd is null or 0 despite non-zero token counts, the model isn't in MODEL_RATES — check that apps/web/components/skill-editor/model-catalog.ts matches packages/shared/src/model-rates.ts.
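
To make the null-cost failure mode concrete, here is a minimal sketch of a per-tick cost lookup. The rate-table shape, the model ids, and the rates are all hypothetical; the real MODEL_RATES in packages/shared/src/model-rates.ts may be structured differently.

```typescript
// Hypothetical rate-table shape: USD per million tokens.
type ModelRate = { promptUsdPerMTok: number; completionUsdPerMTok: number };

// Placeholder model id and rates, for illustration only.
const EXAMPLE_RATES: Record<string, ModelRate> = {
  "example/model-a": { promptUsdPerMTok: 1, completionUsdPerMTok: 5 },
};

// Returns null when the model id has no rate entry. That is the situation
// that would leave cost_usd empty despite non-zero token counts.
function tickCostUsd(
  model: string,
  promptTokens: number,
  completionTokens: number,
  rates: Record<string, ModelRate> = EXAMPLE_RATES,
): number | null {
  const rate = rates[model];
  if (rate === undefined) return null;
  return (
    (promptTokens * rate.promptUsdPerMTok +
      completionTokens * rate.completionUsdPerMTok) /
    1_000_000
  );
}

const knownCost = tickCostUsd("example/model-a", 2_000, 400); // 0.004
const missingRate = tickCostUsd("example/model-b", 2_000, 400); // null
```

If the runner follows a pattern like this, a model that exists in the editor's catalog but not in the rate table would produce exactly the symptom described: tokens recorded, cost missing.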

1.5 — Exercise every command

From the deployment detail page, click in order: Pause → Resume → Snapshot → Flatten → Stop. After each, verify:

  • New row in agent_commands with acked_at populated within ~1 second
  • Status pill updates after page revalidates
  • Fly machine actually exits after Stop (fly machines list no longer shows it after ~30s)

Phase 1.6 — Common boot failures + fixes

Captured live during the first Phase-1 smoke test. Each row is a real failure mode that surfaced; the Symptom cell is the diagnostic signature you'll see in Vercel logs (server) or agent_logs (runner).

| Symptom | Where it shows | Root cause | Fix |
| --- | --- | --- | --- |
| Error: Attempted to call getDefaultConfig() from the server but getDefaultConfig is on the client | Vercel λ POST /skills/[id]/deploy 500 | RainbowKit's getDefaultConfig pulled into a server bundle because the file holding wagmiConfig had no 'use client' directive | Split: keep wagmiConfig in a 'use client' module (lib/web3/config.ts); move universal constants to a separate file (lib/hyperliquid/constants.ts) that server actions can import safely |
| Error: A "use server" file can only export async functions, found object | Vercel server-component render 500 | Const array exported from a 'use server' module (e.g. COMMAND_KINDS) | Extract the const + its derived types to a non-server module (lib/deployment-commands.ts); re-import in both the server action and any client component |
| live-runner: set SUPABASE_DB_URL, or SUPABASE_URL + (SUPABASE_DB_PASSWORD \| POSTGRES_PASSWORD)… | agent_logs.event=runner_fatal, deployment status=error | Fly secret missing: runner has no way to open the pg LISTEN connection | fly secrets set SUPABASE_DB_PASSWORD=… -a agentic-live-runner then fly machines restart <id> |
| password authentication failed for user "postgres" | agent_logs.event=runner_fatal | Wrong project's password set on Fly. Usually because the local .env.local still references a deleted Supabase project from an earlier Marketplace integration | Reset the DB password in Supabase dashboard, re-set everywhere (Fly, Vercel, local), rewrite .env.local to point at the live project ref |
| paper broker: no mark price set for <SYMBOL>; call setMarkPrice first | agent_logs.event=tick_failed | Symbol-discovery skill picked a symbol that wasn't in skill.context.symbols, so refreshMarkPrices never seeded a mark for it | Already fixed in code (Phase-1 commits 79b391a + c5af3d7): the runner wraps the bars client to push marks on fetch AND fetches Hyperliquid allMids at tick start. If this comes back, check that the Fly machine is on an image including those commits. |
| Hyperliquid allMids <status> <statusText> | agent_logs.event=all_mids_fetch_failed | Transient HL API blip or network egress issue from Fly machine | Runner auto-falls back to bar-close marks for the skill.context.symbols list. Persistent: check HL status page; check Fly region's outbound connectivity. |

Phase 1.7 — Secret rotation playbook

When you reset the Supabase database password (or rotate any Fly secret), three places need synchronized updates plus a deliberate machine restart. Skipping any step puts the runner in an "auth failed" loop the moment a tick fires.

# 1. Stage the new value on Fly (does NOT touch running machines):
fly secrets set SUPABASE_DB_PASSWORD="<new-value>" -a agentic-live-runner --stage

# 2. Mirror to Vercel (the web app forwards into createDeployment env):
cd apps/web
echo "<new-value>" | vercel env add SUPABASE_DB_PASSWORD production --force

# 3. Restart running Fly machines so they pick up the staged value.
#    `fly machines update --image` does NOT reliably re-pull staged
#    secrets — use `fly machines restart` (or destroy + redeploy):
for m in $(fly machines list -a agentic-live-runner --json | jq -r '.[].id'); do
  fly machines restart "$m" -a agentic-live-runner
done

# 4. Update local .env.local so `pnpm dev` works against the same DB:
#    POSTGRES_PASSWORD + the three POSTGRES_URL variants all need the
#    new value embedded.

Important: fly secrets deploy -a agentic-live-runner does not work here — that command targets apps deployed via fly deploy. Our app uses on-demand machine provisioning via the Machines API, so the staged-secret-to-machine handoff happens at machine create / restart time only.

Phase 2 — First mainnet (the tiny-balance smoke test)

Only proceed if Phase 1 was clean.

2.1 — Pre-flight: HL account preparation

In a fresh wallet (NOT your main wallet for the first test):

  1. Fund it with $100 of USDC on Arbitrum.
  2. Bridge to Hyperliquid via their UI.
  3. Confirm $100 lands in your HL perp account.

2.2 — Pair the master wallet

  1. Wallet → Connect → pick the test wallet → sign the pairing message.
  2. Confirm:
select address, verified_at, pairing_sig_hash is not null as has_audit_sig
from public.hyperliquid_master_wallets
order by created_at desc limit 1;

2.3 — Create a "tiny test" skill

Author a new skill with deliberately conservative caps:

maxPositionPct:      2     (notional; first position is ~$2 on a $100 wallet at 1x)
maxTotalExposurePct: 2     (single tiny position only)
maxOrdersPerDay:     4     (runaway loop can only burn a few orders of fees)
maxLeverage:         1
minOrderUsd:         10    (HL's venue minimum)
allowedSymbols:      ['BTC']
maxMonthlyCostUsd:   10
dailyLossHaltPct:    5

Run a paper backtest of this skill (10-min window is fine) to confirm it actually proposes something.
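
Before relying on these caps, it is worth checking that they leave room for at least one order. A minimal sketch of that arithmetic, assuming the engine sizes the cap as equity × maxPositionPct / 100 × maxLeverage (the sizing rule and the function names here are assumptions, not taken from the engine code):

```typescript
// Field names mirror the caps listed above; the sizing rule is assumed.
type RiskCaps = {
  maxPositionPct: number;
  maxLeverage: number;
  minOrderUsd: number;
};

// Largest position notional the caps would allow for a given equity.
function maxPositionUsd(equityUsd: number, caps: RiskCaps): number {
  return (equityUsd * caps.maxPositionPct * caps.maxLeverage) / 100;
}

// True when the cap is at least as large as the venue minimum order size.
function canPlaceAnyOrder(equityUsd: number, caps: RiskCaps): boolean {
  return maxPositionUsd(equityUsd, caps) >= caps.minOrderUsd;
}

const tinyCaps: RiskCaps = { maxPositionPct: 2, maxLeverage: 1, minOrderUsd: 10 };
const tinyCapUsd = maxPositionUsd(100, tinyCaps); // 2
const tinyPlaceable = canPlaceAnyOrder(100, tinyCaps); // false
```

Under that assumed rule, a $100 wallet yields a $2 cap against a $10 minimum, so no order would qualify. If the backtest or the first mainnet ticks never propose an order, confirm how the engine reconciles maxPositionPct with minOrderUsd before assuming the skill itself is degenerate.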

2.4 — Deploy mainnet

  1. Skill page → Deploy → Hyperliquid mainnet.
  2. Verify the cross-margin warning is NOT showing (this is the first mainnet deployment on this wallet).
  3. Click Authorize agent wallet — your wallet pops up with a typed-data signature. Read what you're signing: it should be HyperliquidTransaction:ApproveAgent with the platform-generated agent address visible.
  4. Sign.

If submission succeeds:

select agent_address, approved_at, revoked_at, approval_tx_hash
from public.hyperliquid_agents
order by created_at desc limit 1;

-- Verify Vault has the encrypted secret:
select count(*) from public.hyperliquid_agents a
join vault.decrypted_secrets ds on ds.id = a.vault_key_id;

  5. Check the confirmation box → Deploy live. Status → provisioning → running in ~30s.

2.5 — Watch the first mainnet tick like a hawk

fly logs -a agentic-live-runner -i <machine-id> \
  | grep -E "broker_started|reconciled|order_placed|order_filled|tick_failed"

Expect in this order:

  1. broker_started
  2. reconciled — with the actual HL account state (positions=0, equity≈$100)
  3. First tick proceeds — likely noop or executed with a tiny BTC position

If you see tick_failed with NotYetImplemented, then the Slice-B build never made it to Fly — the broker is still the stub. Re-push the image.

After ~3 minutes (give WS marks time to populate), check Hyperliquid's UI — your account should now show the position the agent opened, OR still be flat if the agent chose noop. Either is fine — you've proved the path works.

2.6 — Verify the audit trail

-- One row per order the agent placed:
select cloid, symbol, side, status, filled_size_base, avg_fill_price,
       fee_usd, placed_at, settled_at
from public.mainnet_orders
where deployment_id = '<uuid>'
order by placed_at desc;

-- Cross-reference with agent_logs:
select ts, level, event, message
from public.agent_logs
where deployment_id = '<uuid>'
  and event in ('order_placed', 'order_filled', 'order_cancelled')
order by ts desc limit 20;

The value in mainnet_orders.cloid must match the cloid in the corresponding order_placed log's data field. If they differ, the order-event routing in apps/live-runner/src/broker.ts is broken.
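
The cloid comparison can be done mechanically once both result sets are exported. A minimal sketch, assuming the order_placed log's data field is a JSON object with a cloid key (the exact shape of data is an assumption) and using hypothetical type and function names:

```typescript
type OrderRow = { cloid: string };
// Assumed shape of an agent_logs row; only the fields used here.
type LogRow = { event: string; data: { cloid?: string } };

// Returns cloids present in mainnet_orders with no matching order_placed log.
function cloidsMissingFromLogs(orders: OrderRow[], logs: LogRow[]): string[] {
  const logged = new Set(
    logs
      .filter((l) => l.event === "order_placed")
      .map((l) => l.data.cloid)
      .filter((c): c is string => c !== undefined),
  );
  return orders.map((o) => o.cloid).filter((c) => !logged.has(c));
}

const sampleOrders: OrderRow[] = [{ cloid: "0xaa" }, { cloid: "0xbb" }];
const sampleLogs: LogRow[] = [
  { event: "order_placed", data: { cloid: "0xaa" } },
  { event: "order_filled", data: { cloid: "0xbb" } }, // not an order_placed row
];
const unmatched = cloidsMissingFromLogs(sampleOrders, sampleLogs); // ["0xbb"]
```

An empty result means every order has a placement log; anything else is the list of cloids to investigate.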


Phase 3 — Operational dry-runs (prove the safety nets fire)

Each of these is a deliberate failure injection. Do them on the tiny-balance deployment from Phase 2, in a single sitting, before that deployment is upgraded to real size.

3.1 — Cost-guard trip

Easiest test: temporarily lower the skill's effective cap by editing the latest skill_versions.payload directly (throwaway — revert after):

-- Read current MTD:
select sum(cost_usd) as mtd_usd
from public.decision_snapshots
where deployment_id = '<uuid>'
  and tick_at >= date_trunc('month', now() at time zone 'utc');

-- Lower the cap below MTD (revert immediately after the test):
update public.skill_versions
set payload = jsonb_set(payload, '{risk,maxMonthlyCostUsd}', '0.01'::jsonb)
where skill_id = '<skill-uuid>' and version = <ver>;

On the next tick, expect:

  • agent_commands row with kind='pause', payload->>'reason' starting with cost_guard:
  • agent_logs row with event='cost_guard_tripped'
  • Deployment status flips to paused

Revert the cap before continuing.
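
For intuition about what this test exercises, here is a minimal sketch of a month-to-date cost check. The trip rule (pause once spend reaches the cap) and all names are assumptions; the runner's actual comparison may differ.

```typescript
type CostSnapshot = { tickAt: Date; costUsd: number | null };

// Start of the current calendar month in UTC, the same boundary as the
// date_trunc('month', now() at time zone 'utc') filter in the query above.
function monthStartUtc(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
}

// Sums cost for rows at or after the month boundary; null costs count as 0.
function monthToDateUsd(rows: CostSnapshot[], now: Date): number {
  const start = monthStartUtc(now).getTime();
  return rows
    .filter((r) => r.tickAt.getTime() >= start)
    .reduce((sum, r) => sum + (r.costUsd ?? 0), 0);
}

// Hypothetical trip rule: tripped once month-to-date spend reaches the cap.
function costGuardTripped(rows: CostSnapshot[], capUsd: number, now: Date): boolean {
  return monthToDateUsd(rows, now) >= capUsd;
}

const guardNow = new Date("2025-03-15T12:00:00Z");
const guardRows: CostSnapshot[] = [
  { tickAt: new Date("2025-02-28T23:55:00Z"), costUsd: 0.5 }, // previous month, ignored
  { tickAt: new Date("2025-03-01T00:05:00Z"), costUsd: 0.25 },
  { tickAt: new Date("2025-03-15T11:55:00Z"), costUsd: 0.125 },
];
const mtdUsd = monthToDateUsd(guardRows, guardNow); // 0.375
const trippedAtLoweredCap = costGuardTripped(guardRows, 0.01, guardNow); // true
const trippedAtNormalCap = costGuardTripped(guardRows, 10, guardNow); // false
```

This is why lowering the cap to 0.01 is enough to trigger the guard: any deployment that has ticked at all this month is almost certainly past one cent.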

3.2 — Heartbeat sweeper

Cleanest test: stop the Fly machine and wait out the threshold.

fly machines stop <machine-id> -a agentic-live-runner

Wait at least max(3 * tick_interval, 15 min) past the last agent_state.updated_at, then:

curl -H "Authorization: Bearer $CRON_SECRET" \
  https://<your-domain>/api/jobs/heartbeat-sweep
# Expect: {"checked":N, "marked":1}

Verify:

select status, error_text from public.deployments where id = '<uuid>';
-- Expect: status='error', error_text='heartbeat lost: ...'
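
The wait time in this test follows directly from the threshold formula. A minimal sketch of that calculation; the function names and the strict greater-than comparison are assumptions:

```typescript
const MINUTE_MS = 60_000;

// Threshold from the step above: max(3 * tick_interval, 15 min).
function heartbeatThresholdMs(tickIntervalMs: number): number {
  return Math.max(3 * tickIntervalMs, 15 * MINUTE_MS);
}

// True when the last agent_state update is older than the threshold.
function isHeartbeatLost(lastUpdatedAt: Date, tickIntervalMs: number, now: Date): boolean {
  return now.getTime() - lastUpdatedAt.getTime() > heartbeatThresholdMs(tickIntervalMs);
}

const thresholdDefault = heartbeatThresholdMs(5 * MINUTE_MS); // 900000 ms = 15 min
const thresholdHourly = heartbeatThresholdMs(60 * MINUTE_MS); // 10800000 ms = 180 min

const lastBeat = new Date("2025-03-15T12:00:00Z");
const lostAfter16Min = isHeartbeatLost(lastBeat, 5 * MINUTE_MS, new Date("2025-03-15T12:16:00Z")); // true
const lostAfter10Min = isHeartbeatLost(lastBeat, 5 * MINUTE_MS, new Date("2025-03-15T12:10:00Z")); // false
```

At the default 5-minute interval both terms equal 15 minutes, so wait a little longer than that before hitting the endpoint. Longer tick intervals extend the wait to three full intervals.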

3.3 — Revoke flow

Create a separate small mainnet deployment for this test (don't revoke the one you're still verifying). Then:

  1. Click Revoke on its detail page.
  2. Confirm hyperliquid_agents.revoked_at is set.
  3. Confirm a stop command was queued for the active deployment.
  4. Deployment status flips to stopped within ~1 min.
  5. Now actually go to Hyperliquid's UI and remove the agent from your account — the step the platform can't do (see ADR-0016).
  6. Verify HL no longer shows the agent under approved agents.

Phase 4 — First 24 hours of real operation

After Phase 3, upgrade the test deployment to a real position size (raise maxPositionPct on a new skill version → redeploy). During the first 24h, watch these specific things.

Things that should happen normally

  • New decision_snapshots row every tick interval (5 min by default)
  • New agent_logs rows with event='reconciled' every 5 min (broker drift guard)
  • cost_usd per row stays consistent: ~$0.01–0.10 for Haiku, ~$0.05–0.30 for Sonnet
  • Cron job runs visible in Vercel dashboard, no failures

Things that should NOT happen (alarm bells)

  • tick_failed events more than ~1/hour: occasional failures from WS reconnects and transient HL API issues are fine; a sustained higher rate indicates a real problem
  • Any state_drift warning (REST reconcile disagreeing with WS) — investigate root cause
  • liquidation event — sweeper should have caught the drawdown approach, but liquidations are still possible in a flash crash
  • Cost-guard tripping unexpectedly — model is making longer calls than expected
  • Heartbeat sweeper marking any deployment dead
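
The tick_failed guideline can be turned into a simple check over exported agent_logs timestamps. A minimal sketch with hypothetical function names; the default threshold of 1 mirrors the "~1/hour" guidance and should be treated as a starting point:

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Counts tick_failed timestamps in the trailing hour ending at `now`.
function failuresInTrailingHour(failureTimes: Date[], now: Date): number {
  const cutoff = now.getTime() - HOUR_MS;
  return failureTimes.filter(
    (t) => t.getTime() > cutoff && t.getTime() <= now.getTime(),
  ).length;
}

// Alarm when the trailing-hour count exceeds the allowed rate.
function tickFailureAlarm(failureTimes: Date[], now: Date, maxPerHour = 1): boolean {
  return failuresInTrailingHour(failureTimes, now) > maxPerHour;
}

const alarmNow = new Date("2025-03-15T12:00:00Z");
const noisyHour = [
  new Date("2025-03-15T10:30:00Z"), // outside the trailing hour
  new Date("2025-03-15T11:10:00Z"),
  new Date("2025-03-15T11:40:00Z"),
  new Date("2025-03-15T11:55:00Z"),
];
const noisyCount = failuresInTrailingHour(noisyHour, alarmNow); // 3
const noisyAlarm = tickFailureAlarm(noisyHour, alarmNow); // true
const quietAlarm = tickFailureAlarm([new Date("2025-03-15T11:40:00Z")], alarmNow); // false
```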

Queries to bookmark

-- Active deployments + freshness:
select d.id, d.broker_kind, d.status,
       s.updated_at as last_heartbeat,
       extract(epoch from (now() - s.updated_at))::int as seconds_stale
from public.deployments d
left join public.agent_state s on s.deployment_id = d.id
where d.status in ('running', 'paused', 'halted')
order by seconds_stale desc nulls first;

-- Today's spend by deployment:
select d.id, sum(ds.cost_usd) as today_usd, count(*) as ticks
from public.deployments d
join public.decision_snapshots ds on ds.deployment_id = d.id
where ds.tick_at >= date_trunc('day', now() at time zone 'utc')
group by d.id
order by today_usd desc;

-- Recent mainnet orders + status:
select cloid, symbol, side, status, size_base, filled_size_base,
       avg_fill_price, fee_usd, placed_at, settled_at
from public.mainnet_orders
where placed_at > now() - interval '24 hours'
order by placed_at desc limit 50;

Incident response cheat sheet

| Symptom | First action | Root cause path |
| --- | --- | --- |
| Skill is firing rapid orders | Click Pause on deployment page | Check rate cap; if engine should have caught it, it's a bug |
| Position is wrong size vs intended | Click Pause, do NOT flatten yet | Compare decision_snapshots.proposed_action.sizeUsd vs broker's actual order in mainnet_orders.size_base |
| Big loss showing on HL | Click Flatten | Then investigate why daily-loss-halt didn't fire; likely dailyLossHaltPct was set high |
| WS keeps disconnecting | Watch fly logs for webdata2_apply_failed | HL API issue or network; usually self-recovers. Flip HYPERLIQUID_LIVE_ENABLED=false on Fly app if it persists |
| Cron secret leaked | Rotate CRON_SECRET in Vercel | Old jobs will start failing within minutes |
| Suspect agent key compromise | Hit Revoke immediately, then go to HL UI | Both steps required for full revocation (ADR-0016) |
| Need to stop everything platform-wide | fly secrets set HYPERLIQUID_LIVE_ENABLED=false -a agentic-live-runner AND echo true \| vercel env add DEPLOYMENTS_DISABLED production --force | Mainnet broker construction refuses; new deployments refused |

What you can't test without paying

Three things you only learn from real load:

  1. Hyperliquid order fill latency under stress. The 2-second userFills wait might be tight on slow markets. Plan to revise after a week of real data.
  2. WS reconnect behavior across long-running sessions. The @nktkas/hyperliquid transport handles backoff but extended outages aren't well-documented; observe before relying on it past a few hours of disconnect.
  3. Cost-guard query performance as decision_snapshots grows. Currently the runner scans all month-to-date rows for the deployment on every tick. Fine at small scale; at 8k ticks/month/deployment with 10+ active deployments, replace with a materialized view or per-month rollup table.

None of these block the first real trade. All of them belong on a Phase 5 follow-up list once volume justifies the work.
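
As a starting point for the third item, here is a minimal sketch of the per-month rollup idea: keep a running total per deployment and UTC month so the guard reads one number instead of scanning rows. It is an in-memory illustration with hypothetical names; a real version would live in a rollup table or materialized view as suggested above. For scale, a 5-minute tick interval over a 30-day month is 8,640 ticks, which is where the 8k figure comes from.

```typescript
// Key format: "<deploymentId>:<YYYY-MM>" using the UTC month of the tick.
function rollupKey(deploymentId: string, tickAt: Date): string {
  const month = String(tickAt.getUTCMonth() + 1).padStart(2, "0");
  return `${deploymentId}:${tickAt.getUTCFullYear()}-${month}`;
}

class MonthlyCostRollup {
  private readonly totals = new Map<string, number>();

  // Adds one tick's cost and returns the new month-to-date total.
  add(deploymentId: string, tickAt: Date, costUsd: number): number {
    const key = rollupKey(deploymentId, tickAt);
    const next = (this.totals.get(key) ?? 0) + costUsd;
    this.totals.set(key, next);
    return next;
  }

  // Reads the running total for the month containing `at`.
  monthToDate(deploymentId: string, at: Date): number {
    return this.totals.get(rollupKey(deploymentId, at)) ?? 0;
  }
}

const rollup = new MonthlyCostRollup();
rollup.add("dep-1", new Date("2025-03-31T23:55:00Z"), 0.25);
rollup.add("dep-1", new Date("2025-04-01T00:00:00Z"), 0.5);
const marchTotal = rollup.monthToDate("dep-1", new Date("2025-03-15T00:00:00Z")); // 0.25
const aprilTotal = rollup.monthToDate("dep-1", new Date("2025-04-02T00:00:00Z")); // 0.5
```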
