ADR-0016: Hyperliquid agent wallet model for live deployments

Status: accepted
Date: 2026-06-03
Builds on: ADR-0001, ADR-0004, ADR-0015
Affects: packages/brokers/hyperliquid-mainnet/, apps/live-runner/src/broker.ts, apps/web/lib/{wallet-actions,deployment-actions,fly,hyperliquid/sign-actions}.ts, apps/web/app/(app)/{wallet,skills/[skillId]/deploy,deployments}/..., packages/db/supabase/migrations/20260603120000_live_deployment_wallets.sql

Context

ADR-0015 removed testnet and committed mainnet as the only live target. With that decision settled, the next question is how live trading authenticates: how the user delegates signing authority to a runtime we operate.

Hyperliquid's account model:

An account is identified by an EVM address; cross-margin trading happens at the account level.
The account owner (the master wallet) can authorize one or more agent wallets ("API wallets" in Hyperliquid's docs) via the on-exchange approveAgent action. An approved agent can sign orders/cancels on behalf of the master but cannot withdraw funds — the withdraw action still requires the master's signature.
Hyperliquid does not support sub-accounts with ring-fenced equity. All authorized agents act on the same shared margin pool.

The platform-side question has four dimensions:

Who holds the agent's private key? Trading happens via signed off-chain actions; the signer must be alive at request time. Options are master-only, per-user, per-skill, or per-deployment.
How long is the agent authorized? Hyperliquid supports unbounded approval or a validUntil window.
Where is the agent's private key stored at rest? DB encrypted, Fly secret only, or hybrid.
Which Fly region hosts the runner? Latency to Hyperliquid edges matters.

Two Slice-A scope-shaping questions follow:

One agent per deployment or per Skill?
How is shared account equity surfaced to multiple Skills on the same wallet?

Decision

The architecture below was reached in the Slice-A design discussion. Each item lists the decision and the rationale; alternatives considered are below.

Q1 — Wallet connect: RainbowKit + wagmi + viem

The user signs pairing challenges and the approveAgent action from their existing on-chain wallet (MetaMask / Rabby / Frame / hardware via WalletConnect). RainbowKit on top of wagmi is the standard React surface; viem produces the EIP-712 typed-data envelope. Privy was considered but its core value (email-onboarded embedded wallets) is irrelevant — the master wallet is by definition not embedded.
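As a rough illustration of the EIP-712 envelope the master wallet signs for approveAgent, the sketch below builds the typed-data object as plain data. The domain values, the primary type name, and the field list reflect our reading of Hyperliquid's public SDKs and are assumptions to verify against the current API docs; buildApproveAgentTypedData, ApproveAgentParams, and the 0xa4b1 signature chain are hypothetical names and values chosen for the example.

```typescript
// Sketch only: domain values and field names are assumptions drawn from
// Hyperliquid's public SDKs; verify against the current API docs before use.
type Hex = `0x${string}`;

interface ApproveAgentParams {
  agentAddress: Hex; // address of the freshly generated agent key
  agentName: string; // hypothetical label, e.g. one per deployment
  nonce: number; // millisecond timestamp used as the action nonce
  signatureChainId: Hex; // chain the user's wallet is on when signing
}

function buildApproveAgentTypedData(p: ApproveAgentParams) {
  return {
    domain: {
      name: "HyperliquidSignTransaction",
      version: "1",
      // parseInt accepts the 0x prefix when the radix is 16.
      chainId: Number.parseInt(p.signatureChainId, 16),
      verifyingContract: "0x0000000000000000000000000000000000000000" as Hex,
    },
    types: {
      "HyperliquidTransaction:ApproveAgent": [
        { name: "hyperliquidChain", type: "string" },
        { name: "agentAddress", type: "address" },
        { name: "agentName", type: "string" },
        { name: "nonce", type: "uint64" },
      ],
    },
    primaryType: "HyperliquidTransaction:ApproveAgent" as const,
    message: {
      hyperliquidChain: "Mainnet", // ADR-0015: mainnet is the only live target
      agentAddress: p.agentAddress,
      agentName: p.agentName,
      nonce: p.nonce, // convert to bigint if the signer requires it for uint64
    },
  };
}
```

The returned object is library-agnostic: it can be handed to whichever signTypedData call the wallet stack exposes, and the resulting signature is what gets forwarded to Hyperliquid.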

Q2 — `approveAgent` validity: unbounded

The agent is approved without a validUntil; revocation requires the master to call removeAgent. We surface the agent address in the deployment detail page so the user can audit and revoke at any time. The 90-day time-box alternative was rejected because the user wants no friction once authorized; the security tradeoff is documented (see Consequences).

Q3 — Agent private-key storage: Supabase Vault (pgsodium) at rest; service-role decrypt at runner boot

The agent's hex-encoded secp256k1 private key is stored in Supabase Vault via the migration's create_hyperliquid_agent_secret(secret, name) SECURITY DEFINER RPC. At Fly machine boot, the live runner calls get_hyperliquid_agent_secret(agent_id) (also SECURITY DEFINER) via the service-role client; the function joins the agents row with vault.decrypted_secrets and returns the cleartext only if the agent isn't revoked. Cleartext is never written to a non-Vault column; never logged; never returned to the user.
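The boot-time read described above could look roughly like the following. This is a minimal sketch, not the runner's actual code: RpcClient is a hypothetical structural type standing in for the service-role client, loadAgentKey is an illustrative name, and the sketch assumes the SQL parameter is literally named agent_id and that a revoked agent yields no cleartext.

```typescript
// Minimal structural type for the one call we need; the real service-role
// client is assumed to be compatible with this shape.
interface RpcClient {
  rpc(
    fn: string,
    args: Record<string, unknown>,
  ): PromiseLike<{ data: unknown; error: { message: string } | null }>;
}

// 32-byte secp256k1 key, hex-encoded, with or without a 0x prefix.
const PRIVATE_KEY_RE = /^(0x)?[0-9a-fA-F]{64}$/;

// Fetches the agent key once at boot. Error messages never include key material.
async function loadAgentKey(client: RpcClient, agentId: string): Promise<string> {
  const { data, error } = await client.rpc("get_hyperliquid_agent_secret", {
    agent_id: agentId,
  });
  if (error) {
    throw new Error(`vault read failed for agent ${agentId}: ${error.message}`);
  }
  if (typeof data !== "string" || !PRIVATE_KEY_RE.test(data)) {
    // Covers a revoked agent (no cleartext returned) and malformed secrets.
    throw new Error(`no usable key for agent ${agentId}`);
  }
  // Normalize to a 0x-prefixed string and keep it in process memory only.
  return data.startsWith("0x") ? data : `0x${data}`;
}
```

Failing closed on anything that is not a well-formed key means a revoked or missing agent stops the runner at boot rather than at the first order.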

The original Q3 menu also included a "Fly-only secret with no DB persistence" option (B) and a hybrid (C). The user picked A (Vault at rest, as described above) because it lets a deployment survive a Fly machine destroy/recreate without re-approval; that operational win was judged worth the residual blast-radius cost of keeping the encrypted bytes in the DB.

Q4 — Fly primary region: `nrt` (Tokyo) for live runners

Latency to Hyperliquid's API/WS edges is lowest from Tokyo. apps/live-runner/fly.toml still names iad because that file is used only for image releases (the actual machines are provisioned via the Machines API with an explicit region). Provisioning code reads FLY_LIVE_RUNNER_REGION and defaults to nrt. Sims (apps/sim-worker) stay in iad.
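A minimal sketch of the region default, assuming provisioning receives the environment as a plain record. resolveLiveRunnerRegion and buildMachineRequest are hypothetical names, treating an empty value as unset is an assumption, and the guest defaults are placeholders rather than values taken from this ADR. The body shape (region, config.image, config.guest) follows our understanding of the Fly Machines API create call.

```typescript
type Env = Record<string, string | undefined>;

// Region for live runners: explicit env override, else Tokyo.
function resolveLiveRunnerRegion(env: Env): string {
  const region = env.FLY_LIVE_RUNNER_REGION?.trim();
  return region ? region : "nrt";
}

// Body for a Machines API create call. The guest defaults are placeholders,
// not values taken from this ADR.
function buildMachineRequest(env: Env) {
  const image = env.FLY_LIVE_RUNNER_IMAGE;
  if (!image) throw new Error("FLY_LIVE_RUNNER_IMAGE is required");
  return {
    region: resolveLiveRunnerRegion(env),
    config: {
      image,
      guest: {
        cpu_kind: env.FLY_LIVE_RUNNER_CPU_KIND ?? "shared",
        cpus: Number(env.FLY_LIVE_RUNNER_CPUS ?? "1"),
        memory_mb: Number(env.FLY_LIVE_RUNNER_MEMORY_MB ?? "256"),
      },
    },
  };
}
```

Because the region travels in the request body, the iad value in fly.toml never reaches a live machine.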

Q5 — One agent per deployment

Each deployments row owns its own hyperliquid_agents row (FK with on delete restrict). A check constraint forces (broker_kind = 'hyperliquid-mainnet') ⇔ (hyperliquid_agent_id IS NOT NULL). Per-Skill agents were rejected because Skills can have multiple deployments (paper concurrent with mainnet, or two mainnet variants on different master wallets); sharing one agent across deployments would conflict with the per-deployment vault key model and with future revocation UX.
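The biconditional in the check constraint can be mirrored in application code, for example to validate a row before insert. This is a sketch under assumptions: DeploymentRow and satisfiesAgentConstraint are hypothetical names, and 'paper' as the other broker_kind value is inferred from the text, not confirmed by the migration.

```typescript
// 'paper' is an assumed name for the non-live broker kind.
type BrokerKind = "paper" | "hyperliquid-mainnet";

interface DeploymentRow {
  broker_kind: BrokerKind;
  hyperliquid_agent_id: string | null;
}

// True exactly when (broker_kind = 'hyperliquid-mainnet') and
// (hyperliquid_agent_id IS NOT NULL) are both true or both false.
function satisfiesAgentConstraint(row: DeploymentRow): boolean {
  const isMainnet = row.broker_kind === "hyperliquid-mainnet";
  const hasAgent = row.hyperliquid_agent_id !== null;
  return isMainnet === hasAgent;
}
```

Both failure modes are rejected: a mainnet deployment with no agent, and a paper deployment that carries one.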

Q6 — Allocated equity: A — agent reads full master equity; UI warns on 2nd+ mainnet deployment

Hyperliquid has no concept of ring-fenced sub-account equity; both the engine's maxPositionPct and the broker's cross-margin checks operate against the master account's total equity. We run the same engine and broker code in sim and live, and we surface an explicit warning on the Deploy-Live page when the user is about to run a 2nd+ mainnet deployment on the same master wallet, explaining that risk caps apply to total account equity, not to a per-Skill allocation. A virtual-allocation system was rejected as Phase-3 scope: it adds tracking without adding actual safety.
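To make the shared-equity consequence concrete, the sketch below shows the warning trigger and the arithmetic behind it. All names here (ExistingDeployment, shouldWarnSharedMargin, maxPositionNotional) are hypothetical, the active flag is an assumed field, and maxPositionPct is assumed to be a percentage in the range 0 to 100.

```typescript
interface ExistingDeployment {
  brokerKind: string;
  masterWalletId: string | null;
  active: boolean; // assumed field: whether the deployment is currently live
}

// Warn when the wallet already backs at least one active mainnet deployment,
// i.e. the one being created would be the 2nd or later.
function shouldWarnSharedMargin(
  existing: ExistingDeployment[],
  masterWalletId: string,
): boolean {
  return existing.some(
    (d) =>
      d.active &&
      d.brokerKind === "hyperliquid-mainnet" &&
      d.masterWalletId === masterWalletId,
  );
}

// Each Skill's cap is computed against total account equity, not a slice of it.
function maxPositionNotional(totalEquityUsd: number, maxPositionPct: number): number {
  return (totalEquityUsd * maxPositionPct) / 100;
}
```

With 10,000 USD of account equity and maxPositionPct set to 20 on two Skills, each Skill may open up to 2,000 USD, so the wallet's combined exposure can reach 4,000 USD, or 40% of equity. That gap between per-Skill intent and account-level effect is what the warning copy has to convey.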

What this changes

Surface	Change
DB	`hyperliquid_master_wallets`, `hyperliquid_agents` tables; `deployments.hyperliquid_agent_id` column + check constraint; `deployments.broker_kind` constraint loses `'hyperliquid-testnet'` (cleanup from ADR-0015); two SECURITY DEFINER RPCs over Vault
Web	`/wallet` pairing surface; `/skills/[id]/deploy` form; `/deployments/[id]` minimal status surface; `lib/wallet-actions.ts`, `lib/deployment-actions.ts`, `lib/fly.ts`, `lib/hyperliquid/sign-actions.ts`; RainbowKit + wagmi + viem deps
Live runner	`src/broker.ts` factory branches on `broker_kind`; the paper-only hard reject is removed
Brokers	New `packages/brokers/hyperliquid-mainnet/` — Slice A is a stub that holds the agent key and throws `NotYetImplemented` on every BrokerAdapter method; Slice B replaces with `@nktkas/hyperliquid`
Env	`HYPERLIQUID_LIVE_ENABLED`, `DEPLOYMENTS_DISABLED` (L7 kill switches); `FLY_API_TOKEN`, `FLY_LIVE_RUNNER_{APP,IMAGE,REGION,MEMORY_MB,CPU_KIND,CPUS}`; `NEXT_PUBLIC_THIRDWEB_CLIENT_ID` (wallet connect/sign — see Update 2026-06-10 below)

Alternatives considered

Alt A — User-provided API key per user, single agent shared by all their deployments

Simpler UX (one approval covers many Skills), but a single rogue Skill can touch positions opened by another Skill. The engine's risk caps prevent breach in practice, but the physical boundary is missing and the audit story is muddier ("which Skill placed this order?" needs reasoning over decision_snapshots rather than a direct address match). Not picked; the per-deployment alternative is defended in Q5.

Alt B — Per-user wallet client-side, no platform-held key

User keeps the key, the live runner calls back to the user's wallet for every signature. Impossible — the runner is a long-running headless process and there's no user session to reach. Not picked.

Alt C — `validUntil`-bounded approvals with auto-renewal

Smaller blast radius if a Fly machine is ever exfiltrated. Friction-heavy UX (re-approval every 90 days, with the runner self-pausing 3 days before expiry to nudge). The user explicitly preferred no expiry; we document the tradeoff and add the revoke flow as a counterweight. Not picked. We leave the door open to revisit in Phase 3 if there is a security incident, or behind a per-user setting if some users want the tighter window.

Alt D — Fly-only secret (no DB persistence)

Strongest blast-radius story — even a DB compromise gives the attacker no usable signing material. But every machine destroy/recreate (Fly restart policy, region migration, manual stop/resume) forces the user back to approveAgent from their hardware wallet. Brutal UX. Not picked. See Q3.

Consequences

Positive

Per-deployment isolation. Each deployment's signing authority is independently auditable and revokable.
Operationally simple. Vault handles encryption at rest; the runner reads the secret once at boot, holds it in process memory, and never writes it back to DB.
Honest UX. Mainnet picker shows the warning about cross-margin; the user knows what they're signing up for.
One Fly app, many machines. No app-per-deployment fragmentation; secrets are managed once at the app level.

Negative / trade-offs

Unbounded approval ⇒ revocation is on the user. If a Fly machine were compromised, the platform-held key continues to be usable until the user invokes removeAgent from their master wallet. We mitigate by surfacing the agent address + a "revoke" pointer in the deployment detail page, but the door is open. Keeping the key only in Vault (Q3) makes this scenario less likely than the wording above suggests, but the risk is real.
Vault read path goes through service-role. Anything that compromises SUPABASE_SERVICE_ROLE_KEY on the Fly machine also unlocks Vault. The kill-switch env (HYPERLIQUID_LIVE_ENABLED) is a cheap L7 counterweight (L7 of the risk-controls stack, not the network layer): ops can turn off all mainnet trading platform-wide by setting the flag to anything other than 'true' on the Fly app, without touching DB.
Shared margin pool is genuinely shared. Running two mainnet Skills on the same wallet is honest, but it's a footgun if the user assumes per-Skill ring-fencing. The Q6 warning copy is the mitigation; if users misread it we'll need to reconsider in Phase 3 (a "deny 2nd+ mainnet deployment on same wallet" hard cap is the obvious next move).
Pending-agent staging is in-process memory. The Slice-A wallet-actions code holds a Map of pending agents while the user signs the approval. Not safe across multiple server instances. Replace with a pending_hyperliquid_agents DB table or a stateless HMAC-signed token before scaling out.
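The kill-switch semantics noted in the trade-offs above (enabled only when the flag is exactly 'true') can be sketched as a fail-closed check. isLiveTradingEnabled and assertLiveTradingEnabled are hypothetical names, and where the check is called from is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Fail closed: unset, empty, 'TRUE', '1', and any other value all disable live trading.
function isLiveTradingEnabled(env: Env): boolean {
  return env.HYPERLIQUID_LIVE_ENABLED === "true";
}

// Guard intended to run before any mainnet broker is constructed.
function assertLiveTradingEnabled(env: Env): void {
  if (!isLiveTradingEnabled(env)) {
    throw new Error("mainnet trading is disabled (HYPERLIQUID_LIVE_ENABLED is not 'true')");
  }
}
```

An exact string comparison keeps a typo or a missing variable on the safe side: the platform stays off until someone deliberately sets the flag.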

Things we'll need to revisit

Auto-renew / time-box on a per-user setting. Users with cold-storage masters may want a longer approval window; users with hot masters may want a shorter one. Defer until we have a real signal.
Vault key rotation. Vault uses a single project-wide key. Rotating it is invasive; we accept the static key for MVP and revisit when SOC2 or similar lands on the roadmap.
Sub-account model on Hyperliquid. If Hyperliquid ever adds per-agent sub-accounts with ring-fenced equity, switching Q6 to "B" becomes the obvious move and lets us drop the cross-margin warning.

Update 2026-06-10 — wallet connection library: WalletConnect/RainbowKit → thirdweb

The agent-wallet model in this ADR is unchanged. Only the browser wallet-connection library changed: we replaced wagmi + RainbowKit + WalletConnect with thirdweb (thirdweb/react), consolidating onto the one stack we already adopted for USDC top-ups (ADR-0022). Motivation: one wallet stack and one env var instead of two, and removing RainbowKit's getDefaultConfig build-time projectId footgun (a missing NEXT_PUBLIC_WALLETCONNECT_PROJECT_ID had 500'd every authed route twice in prod).

What did not change — and why this was low-risk:

The two signed payloads are identical: the pairing personal_sign challenge and the Hyperliquid approveAgent EIP-712 typed data. thirdweb's account.signMessage / account.signTypedData take the same shapes wagmi's hooks did and emit standard signatures.
Server-side verification is untouched: viem verifyMessage for pairing, and forwarding the approveAgent signature to Hyperliquid's /exchange. Both are signer-library-agnostic.
lib/wallet-actions.ts (server) and lib/hyperliquid/* are unchanged; viem stays (server keygen/verify + the live-runner).

Mechanics: useAccount → useActiveAccount, useSignTypedData/useSignMessage → account.sign*, useSwitchChain → useSwitchActiveWalletChain, RainbowKit <ConnectButton> → thirdweb <ConnectButton client={…}>. lib/web3/config.ts deleted; provider in components/web3-providers.tsx now mounts ThirdwebProvider. Env: NEXT_PUBLIC_WALLETCONNECT_PROJECT_ID → NEXT_PUBLIC_THIRDWEB_CLIENT_ID.

References

ADR-0001 — Hyperliquid as MVP exchange
ADR-0004 — one Fly machine per deployment
ADR-0014 — broker-authoritative state contract that the Hyperliquid mainnet adapter inherits
ADR-0015 — testnet removed; mainnet is the only live target
security/risk-controls.md — the L1–L7 stack the env-flag kill switch extends
architecture/live-runtime.md — the runtime topology this ADR's wallet model plugs into
