Memory layer
Package: packages/agent-runtime/src/memory.ts, packages/agent-runtime/src/materializer.ts, packages/agent-runtime/src/reflection.ts
Companion ADR: ADR-0017
DB: public.trade_history, public.reflection_notes
Responsibility
Give the trading agent a deterministic, bounded view of its own track record so it can stop re-making the same mistakes — without growing the per-tick prompt as history accumulates and without adding any new authority over the broker.
The layer is pre-loaded, not queried. assembleContext() injects history into the user message the same way it injects bars/news/portfolio. The trading agent has no query_trade_history tool; the chat agent does (read-only). Rationale lives in ADR-0017.
Two artifacts, three phases
| Artifact | What it is | Refresh cadence | Per-tick cost |
|---|---|---|---|
| Rolling-window risk-state slot | Deterministic aggregate of in-session activity: rejections by rule (last 1h), executed orders today (UTC), intraday realized PnL%, consecutive losing closes, recent-rejection-codes tail | Computed every tick from decision_snapshots + agent_state + trade_history (live) or the in-memory engine-tick ring (sim) | ~80 tok when populated |
| trade_history | Structured ledger; one row per round-trip trade with entry/exit/PnL/MFE/MAE/reasons | Materialized every tick from position deltas | ~30 tok/row × K |
| reflection_notes | ≤2000-char "lessons" string distilled from recent trades, structured into five named slots (entry timing, sizing, symbol selection, exit timing, regime fit) | Per-N-trades or daily; one active row per deployment | ~500 tok when present |
| (Phase 3, deferred) embeddings | pgvector index over (context summary, outcome) for k-NN recall | Per-tick on insert | ~500 tok when present |
The phases are layered for time horizon, not features. Phase 1 = "what did I just do?" Phase 2 = "what have I learned over weeks?" Phase 3 = "have I seen this specific situation before?" Each holds a fixed token budget regardless of how big the underlying tables get.
Data model
trade_history
One row per round-trip trade. A trade is opened the first tick the agent holds a non-zero size in a symbol; closed the tick that size returns to zero (or flips sign — see "flips" below). PnL, MFE/MAE, and holding time are filled at close. Rows are linked back to the producing decision_snapshots rows so the firehose stays authoritative.
id uuid PK
deployment_id uuid → deployments.id (cascade)
skill_id, version → skill_versions (cascade by skill_id)
symbol text
side 'long' | 'short'
status 'open' | 'closed'
-- entry
entry_tick_at timestamptz
entry_price numeric -- mark at the entering tick
entry_size_usd numeric -- notional at entry
entry_leverage numeric | null
entry_reason text -- propose_order.reason
entry_snapshot_id uuid → decision_snapshots.id (set null on delete)
entry_lessons_hash text | null -- A/B telemetry; joins to reflection_notes.content_hash
entry_regime_tag text | null -- classifyRegime() output at entry; null = "not classified"
-- exit (null while status='open')
exit_tick_at timestamptz | null
exit_price numeric | null
exit_reason text | null
exit_snapshot_id uuid → decision_snapshots.id (set null on delete)
-- outcome
holding_minutes numeric | null
realized_pnl_usd numeric | null
fees_usd numeric -- default 0; live broker fills this from broker logger
mfe_usd, mae_usd numeric | null -- $ best/worst tick mark vs entry, until close
-- bookkeeping
created_at, updated_at

Indices: (deployment_id, entry_tick_at desc) and (skill_id, skill_version, entry_tick_at desc). RLS: owner read; service-role write.
reflection_notes
id uuid PK
skill_id, version → skill_versions
deployment_id uuid | null → deployments (null = skill-scoped, manually promoted)
status 'active' | 'superseded'
generated_at timestamptz
window_start, window_end timestamptz -- the trade range distilled
trades_considered int
lessons_text text (≤2000 chars)
model_used text
input_tokens, output_tokens, cost_usd
content_hash text | null -- hashLessonsText(lessons_text); joins to trade_history.entry_lessons_hash

A partial unique index on deployment_id where status = 'active' enforces at most one active note per deployment. Older notes flip to superseded for audit/history.
Materializer state machine
Lives in packages/agent-runtime/src/materializer.ts as a pure function reconcilePortfolioDelta(prev, next, ctx) → MaterializerOp[]. Same function consumed from apps/live-runner/src/tick.ts post-tick and packages/simulator/src/backtest.ts post-tick.
Inputs: previous portfolio snapshot, next portfolio snapshot (after broker fills for this tick), the tick's proposed_action.reason (for entry/exit reason imprinting), and the decision_snapshots.id of the producing tick.
Outputs: zero or more typed ops the caller applies to trade_history:
OpenTrade { symbol, side, entry_price, entry_size_usd, entry_leverage, entry_reason, entry_snapshot_id }
UpdateMfeMae { symbol, mfe_usd, mae_usd } -- runs every tick while a position is open
CloseTrade { symbol, exit_price, exit_reason, exit_snapshot_id, realized_pnl_usd, holding_minutes }
FlipTrade { CloseTrade-of-prior-side, OpenTrade-of-new-side }

Transitions, by (prev, next) per symbol:
| prev | next | op(s) |
|---|---|---|
| absent | open (size > 0) | OpenTrade |
| open | absent (size = 0) | CloseTrade |
| open | open same side, same size | UpdateMfeMae only |
| open | open same side, different size | UpdateMfeMae; we DO NOT split into two trade rows on add-to. The trade row tracks the position's lifetime; partial closes/adds are recorded as a single round-trip with the entry that started it. (Simpler ledger; matches how traders think about "the trade.") |
| open long | open short (flip) | CloseTrade (long) + OpenTrade (short). Treated as two trades because the agent's reason changed and the PnL boundary is unambiguous. |
| open short | open long (flip) | mirror of above |
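
The transition table above can be sketched as a pure function. This is a minimal illustration, not the shipped reconcilePortfolioDelta: the `Position`, `Snapshot`, `Ctx` and `Op` shapes, the `marks` lookup, and the name `reconcileSketch` are all assumptions made for the example, and `UpdateMfeMae` here only carries the mark instead of the computed extremes.

```typescript
// Hypothetical shapes for illustration; the real snapshot and op types may differ.
type Side = "long" | "short";
type Position = { side: Side; sizeUsd: number };
type Snapshot = Record<string, Position | undefined>;
type Ctx = { reason: string; snapshotId: string; marks: Record<string, number> };

type Op =
  | { kind: "OpenTrade"; symbol: string; side: Side; entry_price: number; entry_size_usd: number; entry_reason: string; entry_snapshot_id: string }
  | { kind: "UpdateMfeMae"; symbol: string; mark: number }
  | { kind: "CloseTrade"; symbol: string; exit_price: number; exit_reason: string; exit_snapshot_id: string };

// A position counts as open only when it exists and has non-zero size.
function openOrNull(p: Position | undefined): Position | null {
  return p !== undefined && p.sizeUsd > 0 ? p : null;
}

function reconcileSketch(prev: Snapshot, next: Snapshot, ctx: Ctx): Op[] {
  const ops: Op[] = [];
  // Sorted symbol order keeps the op list deterministic across runs.
  const symbols = Array.from(new Set(Object.keys(prev).concat(Object.keys(next)))).sort();
  for (const symbol of symbols) {
    const before = openOrNull(prev[symbol]);
    const after = openOrNull(next[symbol]);
    const mark = ctx.marks[symbol] ?? NaN;
    const open = (p: Position): Op => ({
      kind: "OpenTrade", symbol, side: p.side, entry_price: mark,
      entry_size_usd: p.sizeUsd, entry_reason: ctx.reason, entry_snapshot_id: ctx.snapshotId,
    });
    const close = (): Op => ({
      kind: "CloseTrade", symbol, exit_price: mark,
      exit_reason: ctx.reason, exit_snapshot_id: ctx.snapshotId,
    });

    if (before === null && after !== null) {
      ops.push(open(after)); // absent -> open
    } else if (before !== null && after === null) {
      ops.push(close()); // open -> absent
    } else if (before !== null && after !== null) {
      if (before.side === after.side) {
        ops.push({ kind: "UpdateMfeMae", symbol, mark }); // same side, any size: one trade row
      } else {
        ops.push(close(), open(after)); // flip: two trades
      }
    }
  }
  return ops;
}
```

Because the function only compares two snapshots, the same sketch would serve sim, paper and live callers unchanged, which is the property the section relies on.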
MFE/MAE. First implementation walks the difference between current mark and entry price each tick the trade is open:
- long: mfe_usd = max(prev_mfe, size_base × (markPrice − entry_price)), mae_usd = min(prev_mae, …) (signed; we store positive MFE and negative MAE, naming is a hint not a constraint).
- short: signs reversed.
We approximate using tick marks, not sub-bar highs/lows. If sub-bar whipsaw becomes a recurring complaint we upgrade to per-bar high/low (paper broker has the data; mainnet adapter needs a fills-window pull). Documented in the ADR's "things to revisit" section.
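
A minimal sketch of the per-tick extreme update described above. It assumes the elided MAE term uses the same signed unrealized PnL as the MFE term, and that both extremes start at 0 when the trade opens; `updateExtremes` and `Extremes` are illustrative names, not identifiers from the package.

```typescript
type TradeSide = "long" | "short";
type Extremes = { mfeUsd: number; maeUsd: number };

// Assumption: MFE and MAE track the max and min of the same signed unrealized PnL.
function updateExtremes(
  prev: Extremes,
  side: TradeSide,
  sizeBase: number,
  entryPrice: number,
  markPrice: number,
): Extremes {
  const direction = side === "long" ? 1 : -1; // short: signs reversed
  const unrealizedUsd = direction * sizeBase * (markPrice - entryPrice);
  return {
    mfeUsd: Math.max(prev.mfeUsd, unrealizedUsd),
    maeUsd: Math.min(prev.maeUsd, unrealizedUsd),
  };
}

// Walk a long of 2 units entered at 100 through three tick marks.
let extremes: Extremes = { mfeUsd: 0, maeUsd: 0 };
for (const mark of [110, 95, 105]) {
  extremes = updateExtremes(extremes, "long", 2, 100, mark);
}
console.log(extremes); // { mfeUsd: 20, maeUsd: -10 }
```

Since the update is a max/min fold, replaying a tick with the same mark leaves the extremes unchanged.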
Liquidation. When a position disappears from next without the agent having proposed close_position or adjust_position, the materializer still writes a CloseTrade — the exit reason is set to "liquidated" to make the post-mortem obvious. Detection: the engine result for this tick is not executed/close_position-driven AND the prior position's size > 0. The broker is the source of truth that the position is gone; the materializer just labels it correctly.
Idempotency. The materializer is pure; the caller writes ops via service-role client INSERT/UPDATE. To survive a runner crash mid-write:
- OpenTrade upserts on (deployment_id, symbol, status='open'): at most one open row per (deployment, symbol).
- UpdateMfeMae updates that row.
- CloseTrade flips it to status='closed'.
- FlipTrade runs CloseTrade then OpenTrade in one transaction.
A re-run of the same tick is a no-op (or, for MFE/MAE, idempotent extreme math).
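
To make the re-run claim concrete, here is a sketch of an in-memory stand-in for the writes. It is an assumption-laden illustration: `LedgerSketch`, its method names and the `Row` shape are hypothetical, and the "upsert" is modelled as insert-if-absent keyed on (deployment, symbol).

```typescript
// Hypothetical in-memory stand-in for trade_history writes; field names are illustrative.
type Row = { deploymentId: string; symbol: string; status: "open" | "closed"; mfeUsd: number; maeUsd: number };

class LedgerSketch {
  private openRows = new Map<string, Row>();
  readonly closedRows: Row[] = [];

  private key(deploymentId: string, symbol: string): string {
    return deploymentId + ":" + symbol;
  }

  // "Upsert": create the open row only if this (deployment, symbol) has none yet.
  openTrade(deploymentId: string, symbol: string): Row {
    const k = this.key(deploymentId, symbol);
    let row = this.openRows.get(k);
    if (row === undefined) {
      row = { deploymentId, symbol, status: "open", mfeUsd: 0, maeUsd: 0 };
      this.openRows.set(k, row);
    }
    return row;
  }

  // Extreme math: replaying the same observation cannot move either bound.
  updateMfeMae(deploymentId: string, symbol: string, mfeUsd: number, maeUsd: number): void {
    const row = this.openRows.get(this.key(deploymentId, symbol));
    if (row === undefined) return;
    row.mfeUsd = Math.max(row.mfeUsd, mfeUsd);
    row.maeUsd = Math.min(row.maeUsd, maeUsd);
  }

  // Closing a row that is no longer open is a no-op, so a replayed CloseTrade is harmless.
  closeTrade(deploymentId: string, symbol: string): void {
    const k = this.key(deploymentId, symbol);
    const row = this.openRows.get(k);
    if (row === undefined) return;
    row.status = "closed";
    this.openRows.delete(k);
    this.closedRows.push(row);
  }

  openCount(): number {
    return this.openRows.size;
  }
}
```

Applying every op twice against this ledger leaves it in the same state as applying it once, which is the behaviour a crash-and-retry of the same tick needs.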
Why position-delta, not order-fill, materialization
Two reasons:
- Broker-agnostic. Paper and Hyperliquid mainnet diverge on fill semantics (paper fills synchronously; mainnet has resting orders, partial fills, exchange fees). The BrokerAdapter.snapshot() contract is the same for both: positions and equity after the broker's view of fills has settled. Materializing from position deltas means the same code works in sim, paper, and live without per-broker conditionals.
- ADR-0014. Broker-authoritative state. The materializer reads the broker's word on positions and computes what changed; it does not introspect order flow.
fees_usd is the one exception — when the live broker logger pushes fee events through recordOrderFilled, the runner increments trade_history.fees_usd on the open trade row for the same symbol. Paper broker's fees come through the engine result placeOrderResponse.fill.feesUsd.
Regime tagging and A/B telemetry
Every trade_history row stamps two columns at OPEN time:
- entry_regime_tag: a six-bucket classification of the entry-tick bar window, from classifyRegime(bars) in packages/agent-runtime/src/regime.ts. Pure function over a single bar window; no I/O. Two axes:
  - Trend. last_close / first_close − 1 exceeding ±0.5% ⇒ trend_up / trend_down; else chop. Threshold configurable per call, but the defaults are tuned for the 5m / 100-bar default skill window.
  - Vol. Std-dev of bar-to-bar log returns ≥ 0.4% ⇒ hivol; else lowvol. Same default tuning.
  - Result is the underscore-joined tag, e.g. trend_up_lowvol, chop_hivol. unknown when the bar window is too small (< 12) or the prices are non-finite.

  The runner / sim only classifies symbols that transition open this tick; symbols already on the books keep the tag they were stamped with at their original entry. Classification failures land as null (honest under-reporting); the tag does not get fabricated.

  Surfaced to the agent inline in the ## Recent trades section (regime=trend_up_lowvol) and the reflection prompt's user-message trade summary, with the ### Regime fit slot explicitly inviting per-regime conditioning.
- entry_lessons_hash: hashLessonsText(activeLessons) at the entry tick. 16-char SHA-256 prefix; whitespace-normalised so live + sim writers produce identical hashes for the same content. Pairs with reflection_notes.content_hash (populated at insert) to form a join key:

      -- Was lessons revision A better than B?
      select rn.content_hash, avg(th.realized_pnl_usd) as avg_pnl, count(*) as n
      from reflection_notes rn
      join trade_history th
        on th.entry_lessons_hash = rn.content_hash
       and th.deployment_id = rn.deployment_id
      where rn.skill_id = $1 and th.status = 'closed'
      group by rn.content_hash;

  Without this, "memory helps" is a faith claim. With it, every closed trade is implicitly labelled with the lessons revision it was made under, with no extra instrumentation at decision time.
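
The two-axis regime classification can be sketched as follows. This is an illustration under stated assumptions, not the code in regime.ts: it takes a plain array of closes rather than bar objects, uses the population standard deviation, and additionally treats non-positive prices as unclassifiable because their log return is undefined. The name `classifyRegimeSketch` and its parameters are hypothetical.

```typescript
function classifyRegimeSketch(
  closes: number[],
  trendThreshold = 0.005, // ±0.5% drift over the window
  volThreshold = 0.004, // 0.4% std-dev of log returns
): string {
  // Too few bars or unusable prices: refuse to classify.
  // Assumption: non-positive prices are rejected alongside non-finite ones.
  if (closes.length < 12 || closes.some((c) => !Number.isFinite(c) || c <= 0)) {
    return "unknown";
  }

  // Trend axis: total drift from first close to last close.
  const drift = closes[closes.length - 1] / closes[0] - 1;
  const trend = drift > trendThreshold ? "trend_up" : drift < -trendThreshold ? "trend_down" : "chop";

  // Vol axis: std-dev of bar-to-bar log returns (population form, an assumption).
  const logReturns = closes.slice(1).map((c, i) => Math.log(c / closes[i]));
  const mean = logReturns.reduce((acc, r) => acc + r, 0) / logReturns.length;
  const variance = logReturns.reduce((acc, r) => acc + (r - mean) ** 2, 0) / logReturns.length;
  const vol = Math.sqrt(variance) >= volThreshold ? "hivol" : "lowvol";

  return `${trend}_${vol}`;
}

// A steady 1.9% climb with tiny bar-to-bar moves.
const steadyClimb = Array.from({ length: 20 }, (_, i) => 100 + 0.1 * i);
console.log(classifyRegimeSketch(steadyClimb)); // trend_up_lowvol
```

Three trend outcomes times two vol outcomes gives the six buckets; `unknown` sits outside them.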
Rolling-window risk-state slot
A deterministic aggregate of this session's execution and risk-engine activity, rendered as ## Recent activity (this session) between ## Last decision and the closed-trade ledger. Distinct from lastDecision (which is one row) and from the closed-trade ledger (which is multi-day): this is the in-session pattern view.
## Recent activity (this session)
- engine rejections (last 1h): R3_POSITION_CAP=4, R6_RATE_LIMIT=1
- recent rejection sequence: R3_POSITION_CAP → R3_POSITION_CAP → R3_POSITION_CAP → R3_POSITION_CAP ← same code in the last 3+ rejections; do not repeat the same proposal shape
- executed orders today (UTC): 2
- realized PnL today (UTC, vs day-start equity): -1.23%
- consecutive losing closes: 3 (consider whether your read of the regime still holds)

Counts and lines that are zero or absent are omitted entirely; if everything is zero (first tick of a new deployment), the whole section is suppressed. Bounded under all settings: ~80 tokens fully populated.
The repetition-flag heuristic is intentional: when the last three rejection codes are the same, we emit an explicit "do not repeat the same proposal shape" nudge. This targets the most common failure mode the slot exists to interrupt — the model proposing variations of the same oversized order over and over because each tick it only sees the most recent single rejection. Three is the smallest number that's clearly not noise; the formatter never accuses on two.
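
A sketch of the omission rules and the three-in-a-row flag, assuming a hypothetical `RecentActivity` input shape. The line wording follows the example above; the field names, the function name, and the choice to drop the regime hint on the losing-streak line are assumptions of this illustration.

```typescript
// Hypothetical aggregate shape; the real slot input may differ.
type RecentActivity = {
  rejectionsLastHour: Record<string, number>;
  recentRejectionCodes: string[]; // oldest first
  executedToday: number;
  realizedPnlPctToday: number | null;
  consecutiveLosingCloses: number;
};

function renderRecentActivity(a: RecentActivity): string | null {
  const lines: string[] = [];

  const counts = Object.keys(a.rejectionsLastHour)
    .filter((code) => a.rejectionsLastHour[code] > 0)
    .map((code) => `${code}=${a.rejectionsLastHour[code]}`);
  if (counts.length > 0) lines.push(`- engine rejections (last 1h): ${counts.join(", ")}`);

  if (a.recentRejectionCodes.length > 0) {
    // Flag only when the last three codes are identical; two in a row is treated as noise.
    const tail = a.recentRejectionCodes.slice(-3);
    const repeated = tail.length === 3 && tail.every((code) => code === tail[0]);
    const flag = repeated ? " ← same code in the last 3+ rejections; do not repeat the same proposal shape" : "";
    lines.push(`- recent rejection sequence: ${a.recentRejectionCodes.join(" → ")}${flag}`);
  }

  if (a.executedToday > 0) lines.push(`- executed orders today (UTC): ${a.executedToday}`);
  if (a.realizedPnlPctToday !== null && a.realizedPnlPctToday !== 0) {
    lines.push(`- realized PnL today (UTC, vs day-start equity): ${a.realizedPnlPctToday.toFixed(2)}%`);
  }
  if (a.consecutiveLosingCloses > 0) lines.push(`- consecutive losing closes: ${a.consecutiveLosingCloses}`);

  // Nothing to report: suppress the whole section, heading included.
  if (lines.length === 0) return null;
  return ["## Recent activity (this session)"].concat(lines).join("\n");
}
```

Returning `null` for the empty case lets the caller skip the heading entirely instead of rendering an empty section.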
Data sources:
- Live (apps/live-runner/src/memory-client.ts): five parallel reads (decision_snapshots for hourly rejections + today's executed count, trade_history for today's realized PnL + losing streak, agent_state for day_start_equity). Each section's query failure logs and returns its empty default; the layer never takes a tick down.
- Sim (packages/simulator/src/memory.ts): an in-memory engineTail ring of the last 5,000 tick outcomes plus dayStartEquity captured on UTC midnight crossings. The backtest's tick loop calls ledger.recordTickOutcome({ engineKind, engineRule, equityUsd }) after each engine result so the next tick's slot reflects this one.
Context injection
assembleContext() (packages/agent-runtime/src/context.ts) gains two new sections when skill.context.memory.enabled === true:
## Recent trades on this skill (closed)
- 2026-06-04T10:00 → 12:30 BTC long $250 65,200 → 65,940 +$28.40 (+1.1%) 120m "breakout above prior swing high"
- 2026-06-04T08:15 → 09:00 ETH short $180 3,420 → 3,455 -$9.70 (-0.5%) 45m "funding extreme, mean-revert"
...
## Open positions (memory view)
- ETH long $200 @ 3,410 mark=3,438 MFE=+$22 / MAE=-$8 held 30m "BB lower-band bounce"

The format is deterministic: closed trades are sorted entry-DESC, capped at memory.recentTradesK. Each row is ~30 tokens. Open positions duplicate fields already in the ## Portfolio section but add memory-only context (MFE/MAE, holding time, original entry reason). The portfolio section keeps its terse format for risk decisions; this section is for learning context.
Memory loading is a single batched read on trade_history per tick, joined to no other table — it costs a few milliseconds. The runner provides a MemoryClient on ToolContext (see API below); the simulator provides the same interface backed by an in-memory ledger that the post-tick materializer writes to.
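
The deterministic ordering and the K cap might look like the sketch below. The `ClosedTradeRow` shape and both function names are hypothetical, and the rendered row is deliberately simplified (no entry/exit prices or percentage) compared with the format shown above.

```typescript
// Hypothetical row shape; the real ClosedTradeMemoryRow may carry more fields.
type ClosedTradeRow = {
  entryTickAt: string; // ISO-8601 UTC, so string order equals time order
  symbol: string;
  side: "long" | "short";
  entrySizeUsd: number;
  realizedPnlUsd: number;
  holdingMinutes: number;
  entryReason: string;
};

// Newest entry first, then cap at K so prompt size stays flat as the ledger grows.
function selectRecentClosed(rows: ClosedTradeRow[], k: number): ClosedTradeRow[] {
  const sorted = rows.slice().sort((a, b) =>
    a.entryTickAt < b.entryTickAt ? 1 : a.entryTickAt > b.entryTickAt ? -1 : 0,
  );
  return sorted.slice(0, Math.max(0, k));
}

function renderRecentTrades(rows: ClosedTradeRow[], k: number): string {
  const lines = selectRecentClosed(rows, k).map((r) => {
    const sign = r.realizedPnlUsd >= 0 ? "+" : "-";
    const pnl = Math.abs(r.realizedPnlUsd).toFixed(2);
    return `- ${r.entryTickAt} ${r.symbol} ${r.side} $${r.entrySizeUsd} ${sign}$${pnl} ${r.holdingMinutes}m "${r.entryReason}"`;
  });
  return ["## Recent trades on this skill (closed)"].concat(lines).join("\n");
}
```

Sorting a copy rather than the input keeps the helper free of side effects, so sim and live callers can share the same row arrays safely.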
Prompt segment
composeSystemPromptSegments() (packages/prompt-compose/src/compose.ts) gains a fifth segment:
export type SystemPromptSegments = {
header: string;
leash: string;
strategy: string;
lessons: string | null; // NEW — present only when an active reflection note exists
footer: string;
};

Rendered between strategy and footer:
Lessons from your recent trades on this skill (auto-generated; signal, not strategy):
- <bullet>
- <bullet>
...

The "signal, not strategy" framing is load-bearing. Hard avoid rules remain authoritative; lessons are reference. This mirrors ADR-0012's leash discipline: the trader's avoid text is the only ground truth for "never do X."
The lessons text is passed through composeSystemPrompt() at runtime by the live runner / sim worker via a new lessons?: string argument. The editor preview pane shows a placeholder lessons block by default so the "what the agent sees" panel still demonstrates the slot.
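
As a rough sketch of how an optional lessons segment could slot between strategy and footer: the segment type mirrors the one above, while `renderLessonsSegment`, `joinSegments`, and the blank-line separator are assumptions for illustration rather than the actual composeSystemPrompt() behaviour.

```typescript
type PromptSegments = {
  header: string;
  leash: string;
  strategy: string;
  lessons: string | null; // null when no active reflection note exists
  footer: string;
};

const LESSONS_PREAMBLE =
  "Lessons from your recent trades on this skill (auto-generated; signal, not strategy):";

// Wrap raw lessons text in the framing line; empty or missing text yields no segment.
function renderLessonsSegment(lessons?: string): string | null {
  const trimmed = lessons === undefined ? "" : lessons.trim();
  return trimmed.length > 0 ? `${LESSONS_PREAMBLE}\n${trimmed}` : null;
}

// Assumption: segments are joined with a blank line; a null lessons slot is skipped.
function joinSegments(s: PromptSegments): string {
  return [s.header, s.leash, s.strategy, s.lessons, s.footer]
    .filter((part): part is string => part !== null)
    .join("\n\n");
}
```

Keeping the preamble in one constant means the "signal, not strategy" wording cannot drift between the live runner, the sim worker and the editor preview.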
Reflection job
packages/agent-runtime/src/reflection.ts exports runReflection({ skill, trades, modelOverride? }) → Promise<ReflectionResult>. Pure logic, no DB writes. The caller persists the row.
Trigger (shouldRunReflection(state, skill)):
- cadence === 'off' → never
- cadence === 'per_n_trades' → run when closedTradesSinceLastReflection % everyNTrades === 0 and the latest closed trade just transitioned this tick
- cadence === 'daily' → run when the current UTC date differs from the last active note's generated_at::date
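
The trigger rules can be sketched as a pure predicate. The state and config shapes below are hypothetical, `now` is passed in explicitly to keep the function deterministic, and two behaviours are assumptions of this sketch: a zero trade count never fires, and the daily cadence fires when no note exists yet.

```typescript
type Cadence = "off" | "per_n_trades" | "daily";

// Hypothetical inputs; the real shouldRunReflection(state, skill) shapes may differ.
type ReflectionState = {
  closedTradesSinceLastReflection: number;
  tradeClosedThisTick: boolean;
  lastNoteGeneratedAt: Date | null;
  now: Date;
};
type ReflectionConfig = { cadence: Cadence; everyNTrades: number };

// UTC calendar date as YYYY-MM-DD.
function utcDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

function shouldRunReflectionSketch(state: ReflectionState, cfg: ReflectionConfig): boolean {
  if (cfg.cadence === "per_n_trades") {
    return (
      cfg.everyNTrades > 0 &&
      state.tradeClosedThisTick && // only on the tick a trade actually closed
      state.closedTradesSinceLastReflection > 0 && // assumption: 0 % N === 0 must not fire
      state.closedTradesSinceLastReflection % cfg.everyNTrades === 0
    );
  }
  if (cfg.cadence === "daily") {
    // Assumption: with no prior note, the first eligible tick triggers a run.
    return state.lastNoteGeneratedAt === null || utcDate(state.now) !== utcDate(state.lastNoteGeneratedAt);
  }
  return false; // cadence === "off"
}
```

The `tradeClosedThisTick` guard matters for the modulo rule: without it, every idle tick after the N-th close would re-trigger until the next trade closes.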
Inside apps/live-runner/src/tick.ts, the trigger runs after persistTick has written any new trade_history rows for this tick. Reflection runs synchronously on the runner machine — Haiku at the default takes ~1s. We accept the latency hit on the rare ticks that trigger; the alternative (a separate worker) introduces ordering hazards with no real win.
Prompt — structured slots (reflection.ts:buildReflectionSystemPrompt):
The reflection output is forced into five named slots, each with at most 4 bullets:
- Entry timing — when entries are working or failing vs. the strategy's setup criteria.
- Sizing — whether typical $ size and leverage match the realized edge.
- Symbol selection — which symbols are contributing PnL vs. dragging.
- Exit timing — captured-MFE ratio; cutting too early vs. leaving PnL on the table.
- Regime fit — whether the strategy's assumptions held in the period observed.
Why slots instead of freeform bullets:
- Forces multi-axis attention. Without slots the model homes in on the one most visible failure and ignores the rest. With slots an empty bucket is explicit, written as (no clear pattern yet), instead of accidentally silent.
- Ablatable. A future pass can A/B suppress individual slots and measure which actually shifts behaviour. With freeform text you can't.
- Slows accretion. The 4-bullet-per-slot cap is a structural ceiling on the "lessons grow forever" failure mode.
The prompt explicitly tells the model that empty slots are valid: "A slot with no clear pattern in the data: write exactly (no clear pattern yet) on its own line under the heading. Do NOT invent a lesson to fill the slot." Hard constraints (no contradicting avoid rules, no invented rules, every bullet grounded in numbers) are restated alongside the format.
The user message wraps a structured trade summary + the active strategy snippet + the existing active lessons (so reflection is incremental, not from-scratch each time).
Output validation:
- Trim to ≤2000 chars.
- Reject (and skip the run, logging a warning) if the output is empty, exceeds 4000 chars (model went off-rails), or contains literal text matching the trader's avoid rules verbatim (a paranoid linter; cheap to add, easy to disable if it ever bites).
- Persist as a new reflection_notes row with status='active'; in the same transaction, flip the prior active row (if any) to status='superseded'.
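
A sketch of the validation step, assuming the reject checks run against the raw model output before the 2000-char trim (otherwise the 4000-char check could never fire). The function name, the result shape, and the case-sensitive substring match for "verbatim" are assumptions of this illustration.

```typescript
type LessonsValidation =
  | { ok: true; lessonsText: string }
  | { ok: false; reason: "empty" | "too_long" | "echoes_avoid_rule" };

const MAX_STORED_CHARS = 2000; // persisted cap
const MAX_RAW_CHARS = 4000; // beyond this the output is treated as off-rails

function validateLessons(raw: string, avoidRules: string[]): LessonsValidation {
  const text = raw.trim();
  if (text.length === 0) return { ok: false, reason: "empty" };
  if (text.length > MAX_RAW_CHARS) return { ok: false, reason: "too_long" };

  // Paranoid linter: reject if any non-empty avoid rule appears verbatim in the output.
  const echoed = avoidRules
    .map((rule) => rule.trim())
    .some((rule) => rule.length > 0 && text.indexOf(rule) !== -1);
  if (echoed) return { ok: false, reason: "echoes_avoid_rule" };

  // Pre-truncate at the application layer so the DB CHECK is never the one to complain.
  return { ok: true, lessonsText: text.slice(0, MAX_STORED_CHARS) };
}
```

On a rejected result the caller would log a warning and leave the prior active note in place, matching the recovery path described for reflection failures.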
ToolContext additions
// packages/tools/src/types.ts
export type MemoryClient = {
/** Closed-trade rows newest-first, capped at limit. */
recentClosed(args: { limit: number }): Promise<ClosedTradeMemoryRow[]>;
/** Open-trade rows with MFE/MAE attached. */
openWithMfeMae(): Promise<OpenTradeMemoryRow[]>;
/** The active lessons text, or null. */
activeLessons(): Promise<string | null>;
};
export type ToolContext = {
// …existing fields…
memory: MemoryClient;
};

Provided by:
- Live runner: buildMemoryClient(deploymentId, supabase) → reads trade_history and reflection_notes via service-role.
- Simulator: in-process InMemoryMemoryClient backed by the same materializer ops the post-tick step generates.
- Chat agent: passes the same shape but builds its tool surface from it (gives the agent query_trade_history directly, since interactive latency is fine and there's no sim/live parity contract; see ADR-0006).
Sim / live parity
The five things that keep sim a faithful preview of live, given memory:
- Materializer is shared code. Same function, same inputs, same ops.
- Memory reads are deterministic. recentClosed({ limit: K }) returns the same rows in sim and live for the same trade sequence.
- Reflection in sim is opt-in. Backtests honor memory.reflection.cadence, but the default sim CLI sets it to 'off' to keep deterministic runs cheap. The trader can enable it explicitly to test "what does the agent learn over this 7-day backtest?" and gets a non-deterministic run (Haiku output varies from run to run).
- No clock drift in MFE/MAE. Both runtimes pass the broker's mark price at the post-tick reconcile moment. The materializer does not call Date.now().
- Lessons text is content-addressed. Each reflection_notes row carries (window_start, window_end, trades_considered); replaying the same backtest produces the same set of generated rows in the same order.
Failure modes
| Failure | Detection | Recovery |
|---|---|---|
| Materializer skips a tick (runner crash mid-write) | Next tick reconciles prev=broker(now-1) vs next=broker(now); the missed trade transition is detected on the next reconcile | Idempotent ops + upsert on (deployment_id, symbol, status='open') |
| Position appears/disappears outside the agent's proposal (oracle delisting, manual flatten command) | decision_snapshots for this tick shows engine.kind != 'executed' but position changed | Materializer labels exit_reason as "external_flatten" if the close was driven by a flatten command, "liquidated" otherwise. Logged at info. |
| Reflection job rate-limited or model errors | Caller catches the runReflection rejection | Log at warn; the prior active note stays active; retry on next trigger event |
| Lessons text exceeds 2000-char DB CHECK | Caller pre-truncates before insert | Hard cap at the application layer; never round-trip a too-long lessons through the DB |
| Stale prior active note (skill deleted, deployment stopped) | RLS still scopes reads to owner; cascade deletes drop notes when deployment goes away | No special handling needed |
Token budget
At default settings (recentTradesK=10, reflection enabled with an active note):
| Section | Approx tokens |
|---|---|
| System header | ~600 |
| Leash | ~100 |
| Strategy (thesis mode, ~3 paragraphs) | ~600 |
| Lessons (when active) | ~500 |
| Footer | ~70 |
| User: time + bars (100 × 5m) | ~1,600 |
| User: news + funding + OI | ~450 |
| User: portfolio + risk caps | ~350 |
| User: last decision | ~100 |
| User: recent activity (this session) | ~80 |
| User: recent trades (K=10) | ~300 |
| User: open positions memory view | ~150 |
| Total typical tick | ~4,900 |
| Total without memory | ~3,870 |
Memory adds ~27% at defaults. The "what the agent sees" preview shows a live token estimate so the editor can warn at high settings.
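
The totals and the ~27% figure follow directly from the table; the short check below reproduces the arithmetic. The variable names are illustrative and the token counts are the approximate values listed above.

```typescript
type BudgetLine = [string, number];

// Sections present with or without the memory layer.
const baselineLines: BudgetLine[] = [
  ["System header", 600],
  ["Leash", 100],
  ["Strategy", 600],
  ["Footer", 70],
  ["User: time + bars", 1600],
  ["User: news + funding + OI", 450],
  ["User: portfolio + risk caps", 350],
  ["User: last decision", 100],
];

// Sections the memory layer adds at default settings (K=10, active note).
const memoryLines: BudgetLine[] = [
  ["Lessons", 500],
  ["User: recent activity", 80],
  ["User: recent trades", 300],
  ["User: open positions memory view", 150],
];

const sumTokens = (lines: BudgetLine[]): number => lines.reduce((acc, line) => acc + line[1], 0);

const withoutMemory = sumTokens(baselineLines); // 3870
const memoryTokens = sumTokens(memoryLines); // 1030
const typicalTick = withoutMemory + memoryTokens; // 4900
const overheadPct = Math.round((memoryTokens / withoutMemory) * 100); // 27

console.log({ withoutMemory, memoryTokens, typicalTick, overheadPct });
```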
Bounded under all skill configurations:
- recentTradesK capped at 30 in the schema → worst-case ~900 tok memory ledger (30 rows × ~30 tok/row).
- lessons_text capped at 2000 chars → worst-case ~500 tok lessons.

So even with knobs cranked: ~1,400 tok of ledger and lessons per tick. The DB grows linearly with trade count; the prompt does not.
What's not in this layer
- Cross-deployment memory. A deployment-scoped active note covers most cases. Skill-scoped notes (deployment_id = null) are allowed by the schema but only get written by an explicit trader-curated promotion, which is out of MVP scope.
- Semantic recall (Phase 3). Deferred. Substrate (vector extension) is on; no embedding pipeline shipped here.
- Agent-editable memory. The trading agent reads memory; it cannot write or correct it. The chat agent (ADR-0006) is read-only too. Any user-driven curation lives in the editor UI as a future pass.
- Backfill from existing decision snapshots. When this layer ships, deployments running before the migration get an empty ledger; their next round-trip trades populate it forward. A backfill script over decision_snapshots + mainnet_orders is left for ops.
References
- ADR-0017: decision record this doc realizes
- ADR-0006: trading vs. chat split; chat owns the memory query tools
- ADR-0012: segmented system prompt this layer extends with lessons
- ADR-0014: broker-authoritative state contract the materializer reads from
- packages/agent-runtime/src/context.ts: where the new sections render
- packages/agent-runtime/src/materializer.ts: the pure materializer
- packages/agent-runtime/src/reflection.ts: pure reflection runner
- packages/prompt-compose/src/compose.ts: composed lessons segment