Memory layer

Package: packages/agent-runtime/src/memory.ts, packages/agent-runtime/src/materializer.ts, packages/agent-runtime/src/reflection.ts
Companion ADR: ADR-0017
DB: public.trade_history, public.reflection_notes

Responsibility

Give the trading agent a deterministic, bounded view of its own track record so it can stop re-making the same mistakes — without growing the per-tick prompt as history accumulates and without adding any new authority over the broker.

The layer is pre-loaded, not queried. assembleContext() injects history into the user message the same way it injects bars/news/portfolio. The agent has no query_trade_history tool; the chat agent does (read-only). Rationale lives in ADR-0017.

Two artifacts, three phases

Artifact	What it is	Refresh cadence	Per-tick cost
Rolling-window risk-state slot	Deterministic aggregate of in-session activity: rejections by rule (last 1h), executed orders today (UTC), intraday realized PnL%, consecutive losing closes, recent-rejection-codes tail	Computed every tick from `decision_snapshots` + `agent_state` + `trade_history` (live) or the in-memory engine-tick ring (sim)	~80 tok when populated
`trade_history`	Structured ledger; one row per round-trip trade with entry/exit/PnL/MFE/MAE/reasons	Materialized every tick from position deltas	~30 tok/row × K
`reflection_notes`	≤2000-char "lessons" string distilled from recent trades, structured into five named slots (entry timing, sizing, symbol selection, exit timing, regime fit)	Per-N-trades or daily; one `active` row per deployment	~500 tok when present
(Phase 3, deferred) embeddings	pgvector index over (context summary, outcome) for k-NN recall	Per-tick on insert	~500 tok when present

The phases are layered for time horizon, not features. Phase 1 = "what did I just do?" Phase 2 = "what have I learned over weeks?" Phase 3 = "have I seen this specific situation before?" Each holds a fixed token budget regardless of how big the underlying tables get.

Data model

`trade_history`

One row per round-trip trade. A trade is opened the first tick the agent holds a non-zero size in a symbol; closed the tick that size returns to zero (or flips sign — see "flips" below). PnL, MFE/MAE, and holding time are filled at close. Rows are linked back to the producing decision_snapshots rows so the firehose stays authoritative.

id                  uuid PK
deployment_id       uuid → deployments.id   (cascade)
skill_id, version   → skill_versions  (cascade by skill_id)
symbol              text
side                'long' | 'short'
status              'open' | 'closed'
-- entry
entry_tick_at       timestamptz
entry_price         numeric           -- mark at the entering tick
entry_size_usd      numeric           -- notional at entry
entry_leverage      numeric | null
entry_reason        text              -- propose_order.reason
entry_snapshot_id   uuid → decision_snapshots.id  (set null on delete)
entry_lessons_hash  text | null       -- A/B telemetry; joins to reflection_notes.content_hash
entry_regime_tag    text | null       -- classifyRegime() output at entry; null = "not classified"
-- exit (null while status='open')
exit_tick_at        timestamptz | null
exit_price          numeric | null
exit_reason         text | null
exit_snapshot_id    uuid → decision_snapshots.id (set null on delete)
-- outcome
holding_minutes     numeric | null
realized_pnl_usd    numeric | null
fees_usd            numeric           -- default 0; live broker fills this from broker logger
mfe_usd, mae_usd    numeric | null    -- $ best/worst tick mark vs entry, until close
-- bookkeeping
created_at, updated_at

Indices: (deployment_id, entry_tick_at desc) and (skill_id, skill_version, entry_tick_at desc). RLS: owner read; service-role write.

`reflection_notes`

id                  uuid PK
skill_id, version   → skill_versions
deployment_id       uuid | null → deployments  (null = skill-scoped, manually promoted)
status              'active' | 'superseded'
generated_at        timestamptz
window_start, end   timestamptz       -- the trade range distilled
trades_considered   int
lessons_text        text (≤2000 chars)
model_used          text
input_tokens, output_tokens, cost_usd
content_hash        text | null       -- hashLessonsText(lessons_text); joins to trade_history.entry_lessons_hash

A partial unique index on deployment_id where status = 'active' enforces at most one active note per deployment. Older notes flip to superseded for audit/history.

Materializer state machine

Lives in packages/agent-runtime/src/materializer.ts as a pure function reconcilePortfolioDelta(prev, next, ctx) → MaterializerOp[]. The same function is called post-tick from both apps/live-runner/src/tick.ts and packages/simulator/src/backtest.ts.

Inputs: previous portfolio snapshot, next portfolio snapshot (after broker fills for this tick), the tick's proposed_action.reason (for entry/exit reason imprinting), and the decision_snapshots.id of the producing tick.

Outputs: zero or more typed ops the caller applies to trade_history:

OpenTrade        { symbol, side, entry_price, entry_size_usd, entry_leverage, entry_reason, entry_snapshot_id }
UpdateMfeMae     { symbol, mfe_usd, mae_usd }      -- runs every tick while a position is open
CloseTrade       { symbol, exit_price, exit_reason, exit_snapshot_id, realized_pnl_usd, holding_minutes }
FlipTrade        { CloseTrade-of-prior-side, OpenTrade-of-new-side }

Transitions, by (prev, next) per symbol:

prev	next	op(s)
absent	open (size > 0)	`OpenTrade`
open	absent (size = 0)	`CloseTrade`
open	open same side, same size	`UpdateMfeMae` only
open	open same side, different size	`UpdateMfeMae`; we DO NOT split into two trade rows on add-to. The trade row tracks the position's lifetime; partial closes/adds are recorded as a single round-trip with the entry that started it. (Simpler ledger; matches how traders think about "the trade.")
open long	open short (flip)	`CloseTrade` (long) + `OpenTrade` (short). Treated as two trades because the agent's reason changed and the PnL boundary is unambiguous.
open short	open long (flip)	mirror of above
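
The transition table above can be sketched as a small pure function. This is a minimal illustration, not the code in materializer.ts: the `PositionView`, `PortfolioView`, and `SketchOp` shapes and the name `reconcileSketch` are hypothetical, prices and reasons are left out, and sizes are assumed to be positive notionals with the side carried separately.

```typescript
// Hypothetical, simplified shapes; the real types in materializer.ts may differ.
type TradeSide = "long" | "short";
type PositionView = { side: TradeSide; sizeUsd: number };
type PortfolioView = Record<string, PositionView | undefined>;

type SketchOp =
  | { kind: "OpenTrade"; symbol: string; side: TradeSide }
  | { kind: "UpdateMfeMae"; symbol: string }
  | { kind: "CloseTrade"; symbol: string; side: TradeSide };

// Treat a missing entry and a zero-size entry the same way: no open position.
function openPosition(p: PositionView | undefined): PositionView | undefined {
  return p !== undefined && p.sizeUsd > 0 ? p : undefined;
}

function reconcileSketch(prev: PortfolioView, next: PortfolioView): SketchOp[] {
  const ops: SketchOp[] = [];
  // Sorted symbol order keeps the op list deterministic across runs.
  const symbols = Array.from(
    new Set(Object.keys(prev).concat(Object.keys(next))),
  ).sort();

  for (const symbol of symbols) {
    const before = openPosition(prev[symbol]);
    const after = openPosition(next[symbol]);

    if (!before && after) {
      ops.push({ kind: "OpenTrade", symbol, side: after.side });
    } else if (before && !after) {
      ops.push({ kind: "CloseTrade", symbol, side: before.side });
    } else if (before && after) {
      if (before.side === after.side) {
        // Same side, any size: the row tracks the position's whole lifetime.
        ops.push({ kind: "UpdateMfeMae", symbol });
      } else {
        // Flip: close the prior side, then open the new one.
        ops.push({ kind: "CloseTrade", symbol, side: before.side });
        ops.push({ kind: "OpenTrade", symbol, side: after.side });
      }
    }
  }
  return ops;
}
```

Because the function only compares two snapshots, the same call works for sim, paper, and live inputs; anything broker-specific stays outside it.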

MFE/MAE. First implementation walks the difference between current mark and entry price each tick the trade is open:

long: mfe_usd = max(prev_mfe, size_base × (markPrice − entry_price)), mae_usd = min(prev_mae, …) (signed; we store positive MFE and negative MAE, naming is a hint not a constraint).
short: signs reversed.
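
The running-extreme update above can be sketched as follows. The names `Excursion` and `updateExcursion` are hypothetical, and the sketch assumes a freshly opened trade starts at zero for both values, which is what keeps MFE non-negative and MAE non-positive.

```typescript
type TradeSide = "long" | "short";
type Excursion = { mfeUsd: number; maeUsd: number };

// Assumption: a freshly opened trade starts at { mfeUsd: 0, maeUsd: 0 }.
function updateExcursion(
  prev: Excursion,
  side: TradeSide,
  sizeBase: number,
  entryPrice: number,
  markPrice: number,
): Excursion {
  // Shorts profit when the mark falls, so the sign of the move is reversed.
  const direction = side === "long" ? 1 : -1;
  const unrealizedUsd = direction * sizeBase * (markPrice - entryPrice);
  return {
    mfeUsd: Math.max(prev.mfeUsd, unrealizedUsd), // best mark seen so far
    maeUsd: Math.min(prev.maeUsd, unrealizedUsd), // worst mark seen so far
  };
}
```

For example, a long of 0.5 units entered at 100 and marked at 104, 98, and 101 on successive ticks ends with mfeUsd = 2 and maeUsd = -1. Re-applying the last mark leaves both values unchanged, which is the "idempotent extreme math" property the materializer relies on.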

We approximate using tick marks, not sub-bar highs/lows. If sub-bar whipsaw becomes a recurring complaint we upgrade to per-bar high/low (paper broker has the data; mainnet adapter needs a fills-window pull). Documented in the ADR's "things to revisit" section.

Liquidation. When a position disappears from next without the agent having proposed close_position or adjust_position, the materializer still writes a CloseTrade; the exit reason is set to "liquidated" (or "external_flatten" when a flatten command drove the close, see Failure modes) to make the post-mortem obvious. Detection: the engine result for this tick is not executed/close_position-driven AND the prior position's size > 0. The broker is the source of truth that the position is gone; the materializer just labels it correctly.

Idempotency. The materializer is pure; the caller writes ops via service-role client INSERT/UPDATE. To survive a runner crash mid-write:

OpenTrade upserts on (deployment_id, symbol, status='open') — at most one open row per (deployment, symbol).
UpdateMfeMae updates that row.
CloseTrade flips it to status='closed'.
FlipTrade runs CloseTrade then OpenTrade in one transaction.

A re-run of the same tick is a no-op (or, for MFE/MAE, idempotent extreme math).
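
To illustrate why replaying a tick is safe, here is a minimal in-memory stand-in for the write path. `LedgerSketch` and its method names are hypothetical; the real caller writes to trade_history through the service-role client, and this sketch only mirrors the "one open row per (deployment, symbol)" rule.

```typescript
// Hypothetical in-memory stand-in for trade_history writes.
type LedgerRowSketch = {
  deploymentId: string;
  symbol: string;
  status: "open" | "closed";
  entrySnapshotId: string;
  exitSnapshotId: string | null;
};

class LedgerSketch {
  // Keyed by (deploymentId, symbol): at most one open row per key.
  private openRows = new Map<string, LedgerRowSketch>();
  private closedRows: LedgerRowSketch[] = [];

  private key(deploymentId: string, symbol: string): string {
    return `${deploymentId}:${symbol}`;
  }

  openTrade(deploymentId: string, symbol: string, entrySnapshotId: string): void {
    // Upsert: replaying the same tick overwrites the row with identical values.
    this.openRows.set(this.key(deploymentId, symbol), {
      deploymentId,
      symbol,
      status: "open",
      entrySnapshotId,
      exitSnapshotId: null,
    });
  }

  closeTrade(deploymentId: string, symbol: string, exitSnapshotId: string): void {
    const k = this.key(deploymentId, symbol);
    const row = this.openRows.get(k);
    if (row === undefined) return; // nothing open: the replayed close is a no-op
    this.openRows.delete(k);
    this.closedRows.push({ ...row, status: "closed", exitSnapshotId });
  }

  counts(): { open: number; closed: number } {
    return { open: this.openRows.size, closed: this.closedRows.length };
  }
}
```

Applying OpenTrade twice leaves one open row, and applying CloseTrade twice leaves one closed row, so a runner that crashes mid-write can simply re-run the tick.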

Why position-delta, not order-fill, materialization

Two reasons:

Broker-agnostic. Paper and Hyperliquid mainnet diverge on fill semantics (paper fills synchronously; mainnet has resting orders, partial fills, exchange fees). The BrokerAdapter.snapshot() contract is the same for both: positions and equity after the broker's view of fills has settled. Materializing from position deltas means the same code works in sim, paper, and live without per-broker conditionals.
ADR-0014. Broker-authoritative state. The materializer reads the broker's word on positions and computes what changed; it does not introspect order flow.

fees_usd is the one exception — when the live broker logger pushes fee events through recordOrderFilled, the runner increments trade_history.fees_usd on the open trade row for the same symbol. Paper broker's fees come through the engine result placeOrderResponse.fill.feesUsd.

Regime tagging and A/B telemetry

Every trade_history row stamps two columns at OPEN time:

entry_regime_tag — a six-bucket classification of the entry-tick bar window, from classifyRegime(bars) in packages/agent-runtime/src/regime.ts. Pure function over a single bar window; no I/O. Two axes:
- Trend. last_close / first_close − 1 exceeding ±0.5% ⇒ trend_up / trend_down; else chop. Threshold configurable per call but defaults are tuned for the 5m / 100-bar default skill window.
- Vol. Std-dev of bar-to-bar log returns ≥ 0.4% ⇒ hivol; else lowvol. Same default tuning.
- Result is the underscore-joined tag, e.g. trend_up_lowvol, chop_hivol. unknown when the bar window is too small (< 12 bars) or the prices are non-finite.
The runner / sim only classifies symbols that transition open this tick — symbols already on the books keep the tag they were stamped with at their original entry. Classification failures land as null (honest under-reporting); the tag does not get fabricated.

Surfaced to the agent inline in the ## Recent trades section (regime=trend_up_lowvol) and the reflection prompt's user-message trade summary, with the ### Regime fit slot explicitly inviting per-regime conditioning.
entry_lessons_hash — hashLessonsText(activeLessons) at the entry tick. 16-char SHA-256 prefix; whitespace-normalised so live + sim writers produce identical hashes for the same content. Pairs with reflection_notes.content_hash (populated at insert) to form a join key:
```
-- Was lessons revision A better than B?
select rn.content_hash,
       avg(th.realized_pnl_usd) as avg_pnl,
       count(*) as n
from reflection_notes rn
join trade_history th
  on th.entry_lessons_hash = rn.content_hash
 and th.deployment_id      = rn.deployment_id
where rn.skill_id = $1
  and th.status   = 'closed'
group by rn.content_hash;
```
Without this, "memory helps" is a faith claim. With it, every closed trade is implicitly labelled with the lessons revision it was made under — no extra instrumentation at decision time.
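
The two open-time stamps described in this section can be sketched as below. Both functions are illustrative stand-ins, not the code in regime.ts or the real hashLessonsText: the sketch takes closing prices instead of full bars, uses a population standard deviation, treats non-positive prices as unknown, and assumes "whitespace-normalised" means collapsing whitespace runs to a single space and trimming. Any of these details may differ in the actual implementation.

```typescript
import { createHash } from "crypto";

// Assumption: the real classifyRegime takes bars; this sketch takes closes only.
function classifyRegimeSketch(
  closes: number[],
  trendThreshold = 0.005, // ±0.5% first-to-last change
  volThreshold = 0.004,   // 0.4% std-dev of log returns
): string {
  if (closes.length < 12) return "unknown";
  if (!closes.every((c) => Number.isFinite(c) && c > 0)) return "unknown";

  const change = closes[closes.length - 1] / closes[0] - 1;
  const trend =
    change > trendThreshold ? "trend_up" : change < -trendThreshold ? "trend_down" : "chop";

  // Bar-to-bar log returns, then their (population) standard deviation.
  const returns: number[] = [];
  for (let i = 1; i < closes.length; i++) {
    returns.push(Math.log(closes[i] / closes[i - 1]));
  }
  const mean = returns.reduce((a, b) => a + b, 0) / returns.length;
  const variance = returns.reduce((a, r) => a + (r - mean) ** 2, 0) / returns.length;
  const vol = Math.sqrt(variance) >= volThreshold ? "hivol" : "lowvol";

  return `${trend}_${vol}`;
}

// Assumed normalisation: collapse whitespace runs, trim, then hash.
function hashLessonsSketch(text: string): string {
  const normalised = text.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalised, "utf8").digest("hex").slice(0, 16);
}
```

Normalising before hashing is what lets the live and sim writers agree on a hash even if one of them stores the lessons text with different line breaks.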

Rolling-window risk-state slot

A deterministic aggregate of this session's execution and risk-engine activity, rendered as ## Recent activity (this session) between ## Last decision and the closed-trade ledger. Distinct from lastDecision (which is one row) and from the closed-trade ledger (which is multi-day): this is the in-session pattern view.

## Recent activity (this session)
- engine rejections (last 1h): R3_POSITION_CAP=4, R6_RATE_LIMIT=1
- recent rejection sequence: R3_POSITION_CAP → R3_POSITION_CAP → R3_POSITION_CAP → R3_POSITION_CAP  ← same code in the last 3+ rejections; do not repeat the same proposal shape
- executed orders today (UTC): 2
- realized PnL today (UTC, vs day-start equity): -1.23%
- consecutive losing closes: 3 (consider whether your read of the regime still holds)

Counts and lines that are zero or absent are omitted entirely; if everything is zero (first tick of a new deployment), the whole section is suppressed. Bounded under all settings: ~80 tokens fully populated.

The repetition-flag heuristic is intentional: when the last three rejection codes are the same, we emit an explicit "do not repeat the same proposal shape" nudge. This targets the most common failure mode the slot exists to interrupt — the model proposing variations of the same oversized order over and over because each tick it only sees the most recent single rejection. Three is the smallest number that's clearly not noise; the formatter never accuses on two.
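
A minimal sketch of the formatter rules described above: omit zero lines, suppress the section when everything is zero, and flag only when the last three rejection codes match. The `RiskStateSketch` shape and `renderRecentActivity` name are hypothetical, and the advisory hint on the losing-streak line is left out because its threshold is not specified here.

```typescript
type RiskStateSketch = {
  rejectionsLastHour: Record<string, number>; // rule code -> count
  recentRejectionCodes: string[];             // oldest first, newest last
  executedToday: number;
  realizedPnlPctToday: number | null;
  consecutiveLosingCloses: number;
};

function renderRecentActivity(s: RiskStateSketch): string | null {
  const lines: string[] = [];

  const counts = Object.keys(s.rejectionsLastHour)
    .filter((code) => s.rejectionsLastHour[code] > 0)
    .map((code) => `${code}=${s.rejectionsLastHour[code]}`);
  if (counts.length > 0) {
    lines.push(`- engine rejections (last 1h): ${counts.join(", ")}`);
  }

  if (s.recentRejectionCodes.length > 0) {
    // Flag only on three identical codes in a row; two is treated as noise.
    const lastThree = s.recentRejectionCodes.slice(-3);
    const repeated = lastThree.length === 3 && lastThree.every((c) => c === lastThree[0]);
    const flag = repeated
      ? "  ← same code in the last 3+ rejections; do not repeat the same proposal shape"
      : "";
    lines.push(`- recent rejection sequence: ${s.recentRejectionCodes.join(" → ")}${flag}`);
  }

  if (s.executedToday > 0) {
    lines.push(`- executed orders today (UTC): ${s.executedToday}`);
  }
  if (s.realizedPnlPctToday !== null && s.realizedPnlPctToday !== 0) {
    lines.push(
      `- realized PnL today (UTC, vs day-start equity): ${s.realizedPnlPctToday.toFixed(2)}%`,
    );
  }
  if (s.consecutiveLosingCloses > 0) {
    lines.push(`- consecutive losing closes: ${s.consecutiveLosingCloses}`);
  }

  // First tick of a new deployment: nothing to report, so no section at all.
  if (lines.length === 0) return null;
  return ["## Recent activity (this session)"].concat(lines).join("\n");
}
```

Returning null instead of an empty heading keeps the prompt free of a section that carries no signal.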

Data sources:

Live (apps/live-runner/src/memory-client.ts): five parallel reads — decision_snapshots for hourly rejections + today's executed count, trade_history for today's realized PnL + losing streak, agent_state for day_start_equity. Each section's query failure logs and returns its empty default; the layer never takes a tick down.
Sim (packages/simulator/src/memory.ts): an in-memory engineTail ring of the last 5,000 tick outcomes plus dayStartEquity captured on UTC midnight crossings. The backtest's tick loop calls ledger.recordTickOutcome({ engineKind, engineRule, equityUsd }) after each engine result so the next tick's slot reflects this one.

Context injection

assembleContext() (packages/agent-runtime/src/context.ts) gains two new sections when skill.context.memory.enabled === true:

## Recent trades on this skill (closed)
- 2026-06-04T10:00 → 12:30  BTC long  $250  65,200 → 65,940  +$28.40 (+1.1%)  150m  "breakout above prior swing high"
- 2026-06-04T08:15 → 09:00  ETH short $180  3,420  → 3,455   -$9.70  (-0.5%)  45m   "funding extreme, mean-revert"
...

## Open positions (memory view)
- ETH long $200 @ 3,410  mark=3,438  MFE=+$22 / MAE=-$8  held 30m  "BB lower-band bounce"

The format is deterministic: closed trades are sorted entry-DESC, capped at memory.recentTradesK. Each row is ~30 tokens. Open positions duplicate fields already in the ## Portfolio section but add memory-only context (MFE/MAE, holding time, original entry reason) — the portfolio section keeps its terse format for risk decisions; this section is for learning context.

Memory loading is a single batched read on trade_history per tick, joined to no other table — it costs a few milliseconds. The runner provides a MemoryClient on ToolContext (see API below); the simulator provides the same interface backed by an in-memory ledger that the post-tick materializer writes to.

Prompt segment

composeSystemPromptSegments() (packages/prompt-compose/src/compose.ts) gains a fifth segment:

export type SystemPromptSegments = {
  header: string;
  leash: string;
  strategy: string;
  lessons: string | null;   // NEW — present only when an active reflection note exists
  footer: string;
};

Rendered between strategy and footer:

Lessons from your recent trades on this skill (auto-generated; signal, not strategy):
- <bullet>
- <bullet>
...

The "signal, not strategy" framing is load-bearing. Hard avoid rules remain authoritative; lessons are reference. This mirrors ADR-0012's leash discipline — the trader's avoid text is the only ground truth for "never do X."

The lessons text is passed through composeSystemPrompt() at runtime by the live runner / sim worker via a new lessons?: string argument. The editor preview pane shows a placeholder lessons block by default so the "what the agent sees" panel still demonstrates the slot.
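
As an illustration of how an optional lessons argument could flow into the segment list, here is a minimal sketch. `SegmentsSketch`, `buildLessonsSegment`, and `joinSegments` are hypothetical names, and the blank-line separator between segments is an assumption; the real composeSystemPrompt() may assemble the prompt differently.

```typescript
type SegmentsSketch = {
  header: string;
  leash: string;
  strategy: string;
  lessons: string | null; // null when no active reflection note exists
  footer: string;
};

const LESSONS_HEADING =
  "Lessons from your recent trades on this skill (auto-generated; signal, not strategy):";

// Turn the optional lessons text into a segment, or null when absent or blank.
function buildLessonsSegment(lessons?: string): string | null {
  const body = (lessons ?? "").trim();
  return body.length === 0 ? null : `${LESSONS_HEADING}\n${body}`;
}

// Join in fixed order; a null lessons segment is skipped, so a skill without
// an active note gets the same prompt shape it had before this layer existed.
function joinSegments(s: SegmentsSketch): string {
  const ordered = [s.header, s.leash, s.strategy, s.lessons, s.footer];
  const present: string[] = [];
  for (const part of ordered) {
    if (part !== null) present.push(part);
  }
  return present.join("\n\n");
}
```

Keeping the heading inside the segment builder means the "signal, not strategy" framing cannot be dropped by a caller that passes raw lessons text.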

Reflection job

packages/agent-runtime/src/reflection.ts exports runReflection({ skill, trades, modelOverride? }) → Promise<ReflectionResult>. Pure logic, no DB writes. The caller persists the row.

Trigger (shouldRunReflection(state, skill)):

cadence === 'off' → never
cadence === 'per_n_trades' → run when closedTradesSinceLastReflection % everyNTrades === 0 and the latest closed trade just transitioned this tick
cadence === 'daily' → run when the current UTC date differs from the last active note's generated_at::date
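
The three trigger rules above can be sketched as a pure predicate. The state and config shapes here are hypothetical, the tick time is injected instead of read from the clock, and running on the first daily check when no note exists yet is an assumption this document does not state.

```typescript
type CadenceSketch = "off" | "per_n_trades" | "daily";
type ReflectionConfigSketch = { cadence: CadenceSketch; everyNTrades: number };
type ReflectionStateSketch = {
  closedTradesSinceLastReflection: number;
  tradeClosedThisTick: boolean;
  lastGeneratedAt: Date | null; // generated_at of the active note, if any
  now: Date;                    // tick time, injected so the function stays pure
};

// UTC calendar day as YYYY-MM-DD.
function utcDay(d: Date): string {
  return d.toISOString().slice(0, 10);
}

function shouldRunReflectionSketch(
  state: ReflectionStateSketch,
  cfg: ReflectionConfigSketch,
): boolean {
  if (cfg.cadence === "per_n_trades") {
    // Only fire on the tick where the Nth trade actually closed.
    return (
      state.tradeClosedThisTick &&
      state.closedTradesSinceLastReflection % cfg.everyNTrades === 0
    );
  }
  if (cfg.cadence === "daily") {
    // Assumption: with no prior note, the first daily check runs.
    if (state.lastGeneratedAt === null) return true;
    return utcDay(state.now) !== utcDay(state.lastGeneratedAt);
  }
  return false; // cadence === 'off'
}
```

The `tradeClosedThisTick` guard matters for per_n_trades: without it, every tick between the Nth close and the next close would satisfy the modulo test and re-trigger reflection.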

Inside apps/live-runner/src/tick.ts, the trigger runs after persistTick has written any new trade_history rows for this tick. Reflection runs synchronously on the runner machine; with Haiku as the default model a run takes ~1s. We accept the latency hit on the rare ticks that trigger; the alternative (a separate worker) introduces ordering hazards with no real win.

Prompt — structured slots (reflection.ts:buildReflectionSystemPrompt):

The reflection output is forced into five named slots, each with at most 4 bullets:

Entry timing — when entries are working or failing vs. the strategy's setup criteria.
Sizing — whether typical $ size and leverage match the realized edge.
Symbol selection — which symbols are contributing PnL vs. dragging.
Exit timing — captured-MFE ratio; cutting too early vs. leaving PnL on the table.
Regime fit — whether the strategy's assumptions held in the period observed.

Why slots instead of freeform bullets:

Forces multi-axis attention. Without slots the model homes in on the one most visible failure and ignores the rest. With slots an empty bucket is explicit — (no clear pattern yet) — instead of accidentally silent.
Ablatable. A future pass can A/B suppress individual slots and measure which actually shifts behaviour. With freeform text you can't.
Slows accretion. The 4-bullet-per-slot cap is a structural ceiling on the "lessons grow forever" failure mode.

The prompt explicitly tells the model that empty slots are valid: "A slot with no clear pattern in the data: write exactly (no clear pattern yet) on its own line under the heading. Do NOT invent a lesson to fill the slot." Hard constraints (no contradicting avoid rules, no invented rules, every bullet grounded in numbers) are restated alongside the format.

The user message wraps a structured trade summary + the active strategy snippet + the existing active lessons (so reflection is incremental, not from-scratch each time).

Output validation:

Reject (and skip the run, logging a warning) if the raw output is empty, exceeds 4000 chars (model went off-rails), or contains literal text matching the trader's avoid rules verbatim (a paranoid linter; cheap to add, easy to disable if it ever bites).
Trim whatever passes to ≤2000 chars.
Persist as a new reflection_notes row with status='active'; in the same transaction, flip the prior active row (if any) to status='superseded'.
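
A minimal sketch of the validation step, assuming the rejection checks run on the raw model output before the trim (a 4000-char check applied after trimming to 2000 could never fire). `validateLessonsSketch` and its result shape are hypothetical, the verbatim match is a plain case-sensitive substring test, and lengths are counted in JavaScript string units.

```typescript
type LessonsCheck = {
  ok: boolean;
  reason: string | null;      // why the run was skipped, when ok is false
  lessonsText: string | null; // text to persist, when ok is true
};

const MAX_STORED_CHARS = 2000; // matches the lessons_text cap
const MAX_RAW_CHARS = 4000;    // beyond this the model is assumed off-rails

function validateLessonsSketch(raw: string, avoidRules: string[]): LessonsCheck {
  const text = raw.trim();
  if (text.length === 0) {
    return { ok: false, reason: "empty", lessonsText: null };
  }
  if (text.length > MAX_RAW_CHARS) {
    return { ok: false, reason: "too_long", lessonsText: null };
  }
  for (const rule of avoidRules) {
    const needle = rule.trim();
    // Paranoid linter: lessons must not echo an avoid rule verbatim.
    if (needle.length > 0 && text.indexOf(needle) !== -1) {
      return { ok: false, reason: "echoes_avoid_rule", lessonsText: null };
    }
  }
  // Hard cap at the application layer so the DB CHECK is never hit.
  return { ok: true, reason: null, lessonsText: text.slice(0, MAX_STORED_CHARS) };
}
```

On a rejected result the caller would log a warning and leave the prior active note in place; only an accepted result leads to the insert-and-supersede transaction.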

ToolContext additions

// packages/tools/src/types.ts
export type MemoryClient = {
  /** Closed-trade rows newest-first, capped at limit. */
  recentClosed(args: { limit: number }): Promise<ClosedTradeMemoryRow[]>;
  /** Open-trade rows with MFE/MAE attached. */
  openWithMfeMae(): Promise<OpenTradeMemoryRow[]>;
  /** The active lessons text, or null. */
  activeLessons(): Promise<string | null>;
};

export type ToolContext = {
  // …existing fields…
  memory: MemoryClient;
};

Provided by:

Live runner: buildMemoryClient(deploymentId, supabase) → reads trade_history and reflection_notes via service-role.
Simulator: in-process InMemoryMemoryClient backed by the same materializer ops the post-tick step generates.
Chat agent: receives the same shape but builds its tool surface from it, exposing query_trade_history to the chat agent directly (interactive latency is fine and there's no sim/live parity contract; see ADR-0006).
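
For illustration, an in-process client satisfying the interface above might look like the sketch below. The row shapes are hypothetical placeholders (the real ClosedTradeMemoryRow and OpenTradeMemoryRow carry more fields), and the `recordClosed`, `setOpen`, and `setLessons` mutators are invented here to stand in for whatever the post-tick materializer step actually calls.

```typescript
// Hypothetical, trimmed-down row shapes.
type ClosedRowSketch = { symbol: string; entryTickAt: string; realizedPnlUsd: number };
type OpenRowSketch = { symbol: string; mfeUsd: number; maeUsd: number };

type MemoryClientSketch = {
  recentClosed(args: { limit: number }): Promise<ClosedRowSketch[]>;
  openWithMfeMae(): Promise<OpenRowSketch[]>;
  activeLessons(): Promise<string | null>;
};

class InMemoryClientSketch implements MemoryClientSketch {
  private closed: ClosedRowSketch[] = [];
  private open: OpenRowSketch[] = [];
  private lessons: string | null = null;

  // Mutators the post-tick step would call (names are illustrative).
  recordClosed(row: ClosedRowSketch): void {
    this.closed.push(row);
  }
  setOpen(rows: OpenRowSketch[]): void {
    this.open = rows.slice();
  }
  setLessons(text: string | null): void {
    this.lessons = text;
  }

  async recentClosed(args: { limit: number }): Promise<ClosedRowSketch[]> {
    // Newest entry first; ISO-8601 UTC strings in one format sort lexicographically.
    return this.closed
      .slice()
      .sort((a, b) => (a.entryTickAt < b.entryTickAt ? 1 : a.entryTickAt > b.entryTickAt ? -1 : 0))
      .slice(0, Math.max(0, args.limit));
  }

  async openWithMfeMae(): Promise<OpenRowSketch[]> {
    return this.open.slice();
  }

  async activeLessons(): Promise<string | null> {
    return this.lessons;
  }
}
```

Because the reads return copies sorted by entry time, the same trade sequence yields the same rows regardless of insertion order, which is the property the sim/live parity contract depends on.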

Sim / live parity

The five things that keep sim a faithful preview of live, given memory:

Materializer is shared code. Same function, same inputs, same ops.
Memory reads are deterministic. recentClosed({ limit: K }) returns the same rows in sim and live for the same trade sequence.
Reflection in sim is opt-in. Backtests honor memory.reflection.cadence, but the default sim CLI sets it to 'off' to keep deterministic runs cheap. The trader can enable it explicitly to test "what does the agent learn over this 7-day backtest?", at the cost of a non-deterministic run (Haiku output varies from run to run).
No clock drift in MFE/MAE. Both runtimes pass the broker's mark price at the post-tick reconcile moment. The materializer does not call Date.now().
Lessons text is content-addressed. Each reflection_notes row carries (window_start, window_end, trades_considered) — replaying the same backtest produces the same set of generated rows in the same order.

Failure modes

Failure	Detection	Recovery
Materializer skips a tick (runner crash mid-write)	Next tick reconciles prev=broker(now-1) vs next=broker(now); the missed trade transition is detected on the next reconcile	Idempotent ops + upsert on `(deployment_id, symbol, status='open')`
Position appears/disappears outside the agent's proposal (oracle delisting, manual flatten command)	`decision_snapshots` for this tick shows `engine.kind != 'executed'` but position changed	Materializer labels exit_reason as `"external_flatten"` if the close was driven by a `flatten` command, `"liquidated"` otherwise. Logged at `info`.
Reflection job rate-limited or model errors	Caller catches the `runReflection` rejection	Log at `warn`; the prior active note stays active; retry on next trigger event
Lessons text exceeds 2000-char DB CHECK	Caller pre-truncates before insert	Hard cap at the application layer; never round-trip a too-long lessons through the DB
Stale prior active note (skill deleted, deployment stopped)	RLS still scopes reads to owner; cascade deletes drop notes when deployment goes away	No special handling needed

Token budget

At default settings (recentTradesK=10, reflection enabled with an active note):

Section	Approx tokens
System header	~600
Leash	~100
Strategy (thesis mode, ~3 paragraphs)	~600
Lessons (when active)	~500
Footer	~70
User: time + bars (100 × 5m)	~1,600
User: news + funding + OI	~450
User: portfolio + risk caps	~350
User: last decision	~100
User: recent activity (this session)	~80
User: recent trades (K=10)	~300
User: open positions memory view	~150
Total typical tick	~4,900
Total without memory	~3,870

Memory adds ~27% at defaults. The "what the agent sees" preview shows a live token estimate so the editor can warn at high settings.
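
The arithmetic behind the table can be reproduced with a small estimator. The per-section numbers are copied from the table above; the function name and the idea of computing the estimate this way are illustrative only, and the real preview pane may count tokens differently.

```typescript
const TOKENS_PER_TRADE_ROW = 30;

// Sections present with or without the memory layer (from the table above).
const BASE_TOKENS: Record<string, number> = {
  header: 600,
  leash: 100,
  strategy: 600,
  footer: 70,
  barsAndTime: 1600,
  newsFundingOi: 450,
  portfolioAndCaps: 350,
  lastDecision: 100,
};

// Memory sections whose size does not depend on recentTradesK.
const MEMORY_FIXED_TOKENS: Record<string, number> = {
  lessons: 500,
  recentActivity: 80,
  openPositionsView: 150,
};

function sumValues(o: Record<string, number>): number {
  return Object.keys(o).reduce((acc, k) => acc + o[k], 0);
}

function estimateTickTokens(recentTradesK: number) {
  const withoutMemory = sumValues(BASE_TOKENS);
  const memory = sumValues(MEMORY_FIXED_TOKENS) + TOKENS_PER_TRADE_ROW * recentTradesK;
  return {
    withoutMemory,
    memory,
    total: withoutMemory + memory,
    memoryOverheadPct: (memory / withoutMemory) * 100,
  };
}
```

At recentTradesK = 10 this gives 3,870 tokens without memory, 1,030 tokens of memory, and 4,900 in total, an overhead of about 27%.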

Bounded under all skill configurations:

recentTradesK capped at 30 in the schema → worst-case ~900 tok memory ledger (30 rows × ~30 tok, +600 over the K=10 default).
lessons_text capped at 2000 chars → worst-case ~500 tok lessons.

So even with knobs cranked: ~1,400 tok for ledger plus lessons, on top of the ~80 tok recent-activity slot and the open-positions view. The DB grows linearly with trade count; the prompt does not.

What's not in this layer

Cross-deployment memory. A deployment-scoped active note covers most cases. Skill-scoped notes (deployment_id = null) are allowed by the schema but only get written by an explicit trader-curated promotion — out of MVP scope.
Semantic recall (Phase 3). Deferred. Substrate (vector extension) is on; no embedding pipeline shipped here.
Agent-editable memory. The trading agent reads memory; it cannot write or correct it. The chat agent (ADR-0006) is read-only too. Any user-driven curation lives in the editor UI as a future pass.
Backfill from existing decision snapshots. When this layer ships, deployments running before the migration get an empty ledger; their next round-trip trades populate it forward. A backfill script over decision_snapshots + mainnet_orders is left for ops.

References

ADR-0017 — decision record this doc realizes
ADR-0006 — trading vs. chat split; chat owns the memory query tools
ADR-0012 — segmented system prompt this layer extends with lessons
ADR-0014 — broker-authoritative state contract the materializer reads from
packages/agent-runtime/src/context.ts — where the new sections render
packages/agent-runtime/src/materializer.ts — the pure materializer
packages/agent-runtime/src/reflection.ts — pure reflection runner
packages/prompt-compose/src/compose.ts — composed lessons segment
