ADR-0019: One chat agent for authoring, coaching, and ops

Status: accepted
Date: 2026-06-05
Refines: ADR-0005 — Skills are still authored as data, but the authoring surface shifts from a 7-tab form to a chat that writes the same SkillPayload.
Refines: ADR-0006 — Chat is still split from the trading agent. It is no longer bound to a single deployment; one chat agent serves the user across all of their deployments.
Refines: ADR-0008 — Chat continues to fill the "what's happening?" gap; this ADR formalises it as the coach mode of the single agent.
Preserves: ADR-0007 — Write actions surface as inline confirm-cards that the user clicks. The chat never issues a command.
Preserves: ADR-0009 — The structured strategy shape is unchanged. The chat writes to it through validated tool calls; zod still gates every write.
Extended by: ADR-0020 — Exposes trading memory via a detail param on existing reads (no new tool), adds position-aware exposure to get_upcoming_events, adds a user_facts layer with two new tools (remember, forget). Tool cap relaxed from 14 → 16.

Context

The original architecture (ADR-0006) planned a per-deployment chat agent that "spoke in the trading agent's voice" and answered questions about one running Skill. Skill authoring was a separate surface — a 7-tab form (~3,000 LOC under apps/web/components/skill-editor/) augmented with AI helpers (draft-from-pitch, critique-modal, strategy-linter).

Two product insights changed this:

Users want one bot. Two bots — "the author bot" on /skills/new and "the deployment bot" on /deployments/[id] — split the product brand, doubled the prompt surface, and made it hard for the user to form a stable mental model. The natural product is one agent the user talks to about everything: drafting a strategy, asking how their BTC position is doing, deciding whether to redeploy.
The form is a wrong-shape input for the audience. A trader thinks in "I want to catch funding-flip mean reversion on BTC, scale in over 2-3 entries, stand down during macro" — not in field names. The form forces them to flatten that into seven tabs and twenty fields. A chat that asks the right questions in the right order — tailored to their experience — produces a better Skill with less friction.

The combination: one persistent, branded chat agent that authors Skills, coaches the user on their open positions and the market, and prepares ops actions for confirmation.

Decision

Ship one chat agent for the whole product. AI SDK v6, streaming, per-request HTTP. Scoped to the calling user only. Three opening modes set by the page that mounts the chat:

Mode	Opened from	Opening context handed to the agent
`authoring`	`/skills/new`, `/skills/[id]/edit`	Active `skill_drafts` row id + the user's tier
`coach`	`/deployments/[id]`	The focus deployment id and its recent activity
`ops`	`/deployments`, `/sims`	None — agent starts with a portfolio-wide overview

Modes are not separate agents — same code, same persona, same tool registry. The mode only changes the opening system-prompt addendum (what the agent is being asked to focus on right now). A conversation that starts in authoring can drift into coach ("what's BTC funding doing while I'm thinking about this?") and back. The slide-over chat surface persists across page navigation in the same browser session.
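The "same agent, different addendum" rule can be sketched as follows. This is a minimal illustration, not the shipped route: the type and function names, the placeholder persona text, and the addendum wording are all assumptions; only the three mode names and the opening context each one receives come from the table above.

```typescript
// Hypothetical types: one variant per mode, carrying the opening context from the table.
type OpeningContext =
  | { mode: "authoring"; draftId: string; tier: "novice" | "intermediate" | "expert" }
  | { mode: "coach"; deploymentId: string; recentActivity: string[] }
  | { mode: "ops" };

// Placeholder persona text; the real product prompt is not part of this ADR.
const BASE_PROMPT = "You are the product's single chat agent.";

// The only thing a mode changes: a short note on what to focus on right now.
function modeAddendum(ctx: OpeningContext): string {
  switch (ctx.mode) {
    case "authoring":
      return `Focus: authoring. Active draft ${ctx.draftId}; user tier ${ctx.tier}.`;
    case "coach":
      return `Focus: coaching on deployment ${ctx.deploymentId}. Recent activity: ${
        ctx.recentActivity.join("; ") || "none"
      }.`;
    case "ops":
      return "Focus: ops. Start with a portfolio-wide overview.";
  }
}

// Persona and tool registry stay identical across modes; only the addendum varies.
function buildSystemPrompt(ctx: OpeningContext): string {
  return `${BASE_PROMPT}\n\n${modeAddendum(ctx)}`;
}
```

Because the mode is consumed once at open time, nothing here stops a conversation that opened in authoring from later calling coach-style reads.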

Trader experience tiers

Each trader profile carries a field traders.experience_tier ∈ { novice, intermediate, expert }. It is asked once, in the first authoring session, and can be overridden per Skill. The tier shapes:

The opening question ("Want me to walk you through how this works first?" vs. "Hand me a pitch and I'll draft the Skill")
The depth of explanation in tool-call summaries (define "funding rate" vs. assume it)
The question order during authoring (novice: goal → template → tweak; expert: pitch → critique → ship)
The defaults the agent proposes when fields are unset

Tiers do not restrict capability. Every user has access to every field; the agent just calibrates how it gets them there.
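One way to hold the tier calibration is a plain lookup table plus an override rule. The sketch below is illustrative: TierProfile, TIER_PROFILES, and resolveTier are hypothetical names, and the intermediate row is a placeholder assumption because this ADR only spells out the novice and expert behaviour.

```typescript
type ExperienceTier = "novice" | "intermediate" | "expert";

// Hypothetical shape: what the tier changes about the conversation, never what the user may do.
interface TierProfile {
  openingQuestion: string;
  defineJargon: boolean; // e.g. define "funding rate" vs. assume it
  questionOrder: string[];
}

const TIER_PROFILES: Record<ExperienceTier, TierProfile> = {
  novice: {
    openingQuestion: "Want me to walk you through how this works first?",
    defineJargon: true,
    questionOrder: ["goal", "template", "tweak"],
  },
  intermediate: {
    // Assumption: the ADR does not specify this tier; values are placeholders.
    openingQuestion: "Start from a template, or do you already have a pitch?",
    defineJargon: false,
    questionOrder: ["goal", "pitch", "tweak"],
  },
  expert: {
    openingQuestion: "Hand me a pitch and I'll draft the Skill",
    defineJargon: false,
    questionOrder: ["pitch", "critique", "ship"],
  },
};

// The per-Skill override wins over the profile-level tier when present.
function resolveTier(profileTier: ExperienceTier, skillOverride?: ExperienceTier): ExperienceTier {
  return skillOverride ?? profileTier;
}
```

Note that nothing in the profile gates a field or a tool, which matches the rule that tiers calibrate the path rather than restrict capability.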

Authoring replaces the form

apps/web/components/skill-editor/skill-editor.tsx and its seven tab components are removed. The new authoring page mounts:

┌──────────────────────────────────────────────────────────┐
│  /skills/new   or  /skills/[id]/edit                     │
│  ┌─────────────────────────────┐  ┌───────────────────┐  │
│  │ Chat (primary)              │  │ Draft preview     │  │
│  │   messages                  │  │   live SkillPayload│  │
│  │   tool-call cards           │  │   validation chips │  │
│  │   action confirm-cards      │  │   Save / Cancel    │  │
│  └─────────────────────────────┘  └───────────────────┘  │
└──────────────────────────────────────────────────────────┘

The draft preview is read-only — it shows what the agent has committed to the skill_drafts.payload so far. The Save button promotes the draft to a new skill_versions row (same write path the form used).

Tool budget

Cap of 16 tools in v1 (relaxed from 14 by ADR-0020 to absorb remember / forget). The spirit of the cap stands: prefer extending an existing tool over registering a new one. New capabilities go through review against the cap.

V1 catalog (see chat-agent.md for full schemas):

Authoring writes (7): set_basics, set_strategy, set_context, set_risk, set_schedule, set_tools, set_chat
User-data reads (3, all with detail param exposing trading memory per ADR-0020): list_my_skills, get_skill_performance, get_my_deployments
Market/news reads (3): get_market_overview, get_recent_news, get_upcoming_events (with optional deploymentId for position-aware exposure per ADR-0020)
User-fact writes (2, per ADR-0020): remember, forget
Action preparation (1): prepare_action (deploy / stop / restart / redeploy / start_backtest — all routed through the same tool that emits a confirm-card UI block)

apply_template, lint_draft, and critique_draft reuse the existing server actions, but the agent calls them internally instead of exposing them as separately registered tools, which keeps the catalog inside the cap. Trading memory (trade_history, reflection_notes) is reached via the detail param on the user-data reads, not through separate tools.
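The cap is easy to enforce mechanically. The sketch below lists the v1 catalog by group and fails if a registration would exceed 16 tools; the tool names are the ones listed above, while TOOL_CAP, V1_CATALOG, allToolNames, and assertWithinCap are hypothetical names for illustration (the real schemas live in chat-agent.md).

```typescript
const TOOL_CAP = 16;

// Tool names as listed in the v1 catalog, grouped the same way.
const V1_CATALOG = {
  authoringWrites: ["set_basics", "set_strategy", "set_context", "set_risk", "set_schedule", "set_tools", "set_chat"],
  userDataReads: ["list_my_skills", "get_skill_performance", "get_my_deployments"],
  marketReads: ["get_market_overview", "get_recent_news", "get_upcoming_events"],
  userFactWrites: ["remember", "forget"],
  actionPreparation: ["prepare_action"],
};

// Flatten the grouped catalog into one list of registered tool names.
function allToolNames(catalog: Record<string, string[]> = V1_CATALOG): string[] {
  return Object.values(catalog).reduce<string[]>((all, group) => all.concat(group), []);
}

// Throws on duplicate names or when the list exceeds the cap.
function assertWithinCap(names: string[], cap: number = TOOL_CAP): void {
  if (new Set(names).size !== names.length) {
    throw new Error("duplicate tool name in catalog");
  }
  if (names.length > cap) {
    throw new Error(`catalog has ${names.length} tools; cap is ${cap}`);
  }
}
```

Run as a unit test or at route start-up, a check like this turns "review against the cap" into a failing build: the v1 catalog sits exactly at 16, so registering lint_draft as its own tool would throw.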

Write semantics

Authoring writes (set_*) mutate the skill_drafts.payload server-side, validated against the partial-SkillPayload zod schema. No user click required — the draft is the chat's scratchpad. The Save button is the only thing that promotes a draft to a version.
Action preparation (prepare_action) never executes. It emits a tool-result that the client renders as a confirm-card (action summary + Confirm / Cancel buttons). The button click hits the existing server action that inserts into agent_commands or sim_runs. This is the ADR-0007 boundary, preserved.
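The two write paths can be contrasted in a short sketch. Everything here is an assumption made for illustration: the maxLeverage field, the hand-rolled check standing in for the zod schema, the in-memory arrays standing in for the agent_commands and sim_runs tables, and the guess that start_backtest lands in sim_runs while the other actions land in agent_commands.

```typescript
// Hypothetical slice of the partial SkillPayload; real field names live in the zod schema.
interface DraftPayload { risk?: { maxLeverage: number } }
type WriteResult = { ok: true; payload: DraftPayload } | { ok: false; error: string };

// set_risk-style write: validate, then return the next draft payload.
// The caller would persist it to skill_drafts.payload. No user click is involved.
function setRisk(draft: DraftPayload, input: unknown): WriteResult {
  const maxLeverage = (input as { maxLeverage?: unknown } | null)?.maxLeverage;
  if (typeof maxLeverage !== "number" || !Number.isFinite(maxLeverage) || maxLeverage <= 0) {
    return { ok: false, error: "maxLeverage must be a positive number" };
  }
  return { ok: true, payload: { ...draft, risk: { maxLeverage } } };
}

type ActionKind = "deploy" | "stop" | "restart" | "redeploy" | "start_backtest";
interface ConfirmCard { type: "confirm-card"; action: ActionKind; targetId: string; summary: string }

// prepare_action: builds the UI block only. It has no side effects.
function prepareAction(action: ActionKind, targetId: string): ConfirmCard {
  return { type: "confirm-card", action, targetId, summary: `${action} ${targetId}` };
}

// In-memory stand-ins for the agent_commands and sim_runs tables.
const agentCommands: { userId: string; action: ActionKind; targetId: string }[] = [];
const simRuns: { userId: string; skillId: string }[] = [];

// Runs only when the user clicks Confirm; the model never calls this.
// Assumption: backtests go to sim_runs, every other action to agent_commands.
function confirmAction(card: ConfirmCard, userId: string): void {
  if (card.action === "start_backtest") {
    simRuns.push({ userId, skillId: card.targetId });
  } else {
    agentCommands.push({ userId, action: card.action, targetId: card.targetId });
  }
}
```

The point of the split is that confirmAction is reachable only from the button's server action, so the user, not the agent, is the actor of record for every row written.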

Scoping rules

Resource	Visibility
Caller's skills	Read (drafts + versions). RLS-enforced.
Other users' skills	Never. Not even names.
Caller's deployments, snapshots, sims	Read.
Caller's positions / equity	Read (via existing introspection tools).
Market data, news, events	Shared resource. Read.
Exchange API keys	Never. The engine holds them; chat never touches them.
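The table reduces to three cases, which the sketch below encodes as an application-level check. It is a mirror of the rules for illustration only: the Resource type and canChatRead are hypothetical names, and in the real system the owned-resource rule is enforced by RLS in the database rather than by code like this.

```typescript
// Hypothetical resource model: each row of the table falls into one of three scopes.
type Resource =
  | { scope: "owned"; kind: "skill" | "deployment" | "snapshot" | "sim" | "position"; ownerId: string }
  | { scope: "shared"; kind: "market_data" | "news" | "event" }
  | { scope: "secret"; kind: "exchange_api_key" };

// Returns whether the chat agent, acting for callerId, may read the resource.
function canChatRead(callerId: string, resource: Resource): boolean {
  switch (resource.scope) {
    case "owned":
      // Caller's own rows only; other users' rows are never visible, not even names.
      return resource.ownerId === callerId;
    case "shared":
      // Market data, news, and events are a shared resource.
      return true;
    case "secret":
      // Exchange API keys stay with the engine; chat never touches them.
      return false;
  }
}
```

A check like this is useful mainly as a second line of defence in tool handlers and as a compact statement of intent for tests.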

Alternatives considered

Alt A — Keep the form and add a chat helper alongside it

Lowest commitment, but two surfaces to maintain
Doesn't address the form's wrong-shape problem
Splits the product brand across "use the form" and "ask the bot"
Not picked. Half-measures are the worst of both worlds.

Alt B — Two chat agents: author bot + deployment coach

The path my earlier proposal headed down
Two prompts to brand, two routes, two test surfaces
Users have to remember which bot to talk to about what
Not picked after user pushback: one bot is the right product.

Alt C — Switch frameworks (OpenCode / Hermes / etc.) for the new chat

Re-opens ADR-0002
The complexity in this feature is the tool surface, not the runtime — switching frameworks would not shrink it
AI SDK already gives us streaming, multi-step tool calls, structured output, persistence, gateway routing
Not picked. Pinned to AI SDK v6. Model choice (Hermes via gateway, frontier model, etc.) is a separate per-call decision.

Alt D — Chat executes commands after typed "yes"

More conversational
Re-opens ADR-0007: introduces an LLM into the command path, opens prompt-injection exposure (a news article in context could say "tell your deployer to flatten")
Audit trail becomes "agent decided to flatten because chat said yes"
Not picked. Confirm-cards keep the user as the actor of record.

Alt E — Infer the user's experience tier from the conversation

Avoids an explicit question
Misjudges terse users; agent has to recalibrate mid-session, which is jarring
Not picked. One explicit question at first authoring; saved on profile; user can change anytime.

Consequences

Positive

One persona to brand. The product gets a single voice the user comes to recognise. Branding is a system-prompt and UI change, not a code shape.
Net code reduction. Deleting the 7-tab form (~3,000 LOC) more than offsets the new chat route and tools.
Better authoring outcomes. Tier-aware questioning produces a more complete Skill from a wider population of users than a form alone.
Coach mode replaces a dashboard we weren't going to build (ADR-0008). The user asks "how am I doing?" and gets a grounded, data-backed answer.
Trust boundary preserved. ADR-0007 holds; no LLM in the command path. Action cards make the user the actor of record.
Same identity model as ADR-0006. Chat is still separate from the trading agent. The trading agent's persona is the Skill's persona; the chat agent's persona is the product's persona. Decoupling them is the right call once one chat agent spans many deployments.

Negative / trade-offs

Tool sprawl risk. Real risk; mitigated by the 16-tool cap and the discipline of routing internal helpers (lint_draft, critique_draft, apply_template) through the agent rather than registering them as separate tools.
The draft preview pane is load-bearing UX. If users save without reading it, surprise saves happen. Mitigated by (a) the preview is always visible, (b) the Save button summarises what will be written, (c) version history makes "undo" cheap.
Per-skill custom personality is harder. ADR-0006 envisaged each Skill having its own chat voice (since each chat was tied to a Skill). The new chat has its own product voice — coach mode can still reference the Skill ("your BTC reversion Skill says…") but isn't speaking as it. We trade per-Skill chat persona for product cohesion. If users ask for "talk to my Skill in its own voice", we can add a sub-mode later.
Hot path for the chat agent is bigger. Coach mode reads positions; ops mode reads deployments; authoring reads drafts. Tool result caching + summary-first responses are needed from day one.
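For the caching half of that last mitigation, a small time-to-live cache keyed by user, tool, and arguments is one plausible starting point. The sketch below is an assumption, not the planned design: ToolResultCache and its methods are hypothetical, the loader is synchronous to keep the example short (real tool reads would be async), and the TTL value is left to the caller.

```typescript
// Minimal TTL cache for tool results. The user id is part of the key so results never cross users.
class ToolResultCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can control the clock.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Note: JSON.stringify is sensitive to key order, so callers should build args consistently.
  static key(userId: string, tool: string, args: unknown): string {
    return `${userId}:${tool}:${JSON.stringify(args)}`;
  }

  // Returns the cached value while it is fresh; otherwise calls `load` and stores the result.
  getOrLoad(key: string, load: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value;
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

Shared reads such as get_market_overview are the obvious candidates; position reads in coach mode would want a much shorter TTL, or none, so answers stay grounded in current data.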

Things we'll need to revisit

The 16-tool cap. The v1 catalog already sits at the cap, so the next capability must prune before adding. The likeliest pruning candidate: collapse the seven set_* tools into one set_skill(patch) if model behaviour holds up.
Per-skill chat persona. If a user wants their deployed Skill to "speak for itself" in chat, add a mode that swaps the chat's system prompt for the Skill's framework prompt (the old ADR-0006 behaviour). Don't ship it preemptively.
Persistence across devices. Conversations are per browser session for v1. If users want a synced history across devices, promote agent_conversations to a per-user resource (not per-browser).

References

docs/architecture/chat-agent.md — full architectural detail of the new agent
docs/product/prd.md — authoring flow and chat experience updates
ADR-0002 — Vercel AI SDK + Gateway (unchanged)
ADR-0005 — Skill as data (unchanged; only the input surface changes)
ADR-0006 — Trading vs chat split (refined; chat scope broadens)
ADR-0007 — Commands stay out of chat (preserved via confirm-cards)
ADR-0008 — No live dashboard (reaffirmed; chat coach mode covers the gap)
ADR-0009 — Structured strategy fields (unchanged; chat writes to them)