Market-event calendar
Companion ADR: ADR-0018
DB: public.market_events
Producer: apps/news-poller (new extractor loop) → packages/data-ingest/src/events.ts
Consumer: packages/agent-runtime/src/context.ts (## Upcoming events)
Responsibility
Give the trading agent a forward-looking view of scheduled market events (FOMC, CPI, NFP, crypto unlocks, halvings, etc.) so strategies that already reference "macro events" finally have a data layer behind them — without paying for a third-party calendar API.
This layer is pre-loaded into context, same shape as bars / news / portfolio (ADR-0017's mantra). The agent has no query_events tool. The chat agent gets read access to the table.
Pipeline
news_items ──┐ ┌──► market_events
│ │
RSS / CryptoPanic │
│ │
├──► poll loop in apps/news-poller ─┐ │
│ (every 5 min) │ │
│ │ │
└──► regex pre-filter ──► cheap LLM extraction ────┘
(Haiku default;
free model opt-in)

Watermark-driven: the extractor remembers the latest news_items.ingested_at it has processed in platform_config (key: event_extractor_watermark). On boot the watermark loads; each loop processes only items newer than it; on success the watermark advances to the latest ingested_at seen.
On extractor crash mid-batch, the watermark stays at its prior value — the next boot reprocesses the items in flight. The market_events dedup index (unique (kind, hour(scheduled_at), symbols)) means re-extraction of the same news item produces no duplicates.
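The watermark behaviour described above can be sketched as follows. This is a minimal illustration, not the extractor's actual code: the NewsItem shape, processBatch, runLoopOnce, and the in-memory Map standing in for platform_config are all assumptions, and the functions are synchronous for brevity where the real loop would await I/O.

```typescript
// Hypothetical shapes: the real news_items row and platform_config accessors
// are not shown in this document.
interface NewsItem {
  id: string;
  ingestedAt: string; // ISO 8601 UTC in one fixed format, so string order == time order
}

const EPOCH_ZERO = "1970-01-01T00:00:00.000Z";

// Processes a batch and returns the watermark to persist. If processItem
// throws, the exception propagates and the caller never saves a new value.
function processBatch(
  watermark: string,
  items: NewsItem[],
  processItem: (item: NewsItem) => void,
): string {
  let latest = watermark;
  for (const item of items) {
    if (item.ingestedAt <= watermark) continue; // already handled in an earlier loop
    processItem(item);
    if (item.ingestedAt > latest) latest = item.ingestedAt;
  }
  return latest;
}

// One loop iteration against an in-memory stand-in for platform_config.
function runLoopOnce(
  config: Map<string, string>,
  items: NewsItem[],
  processItem: (item: NewsItem) => void,
): void {
  const key = "event_extractor_watermark";
  const prior = config.get(key) ?? EPOCH_ZERO;
  try {
    config.set(key, processBatch(prior, items, processItem));
  } catch {
    // Watermark stays at `prior`; the same items are retried on the next loop.
  }
}
```

Because the new value is written only after the whole batch succeeds, a crash part-way through leaves the stored watermark untouched, and the dedup index absorbs the repeated work.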
Data model
create table public.market_events (
id uuid primary key default gen_random_uuid(),
kind text not null check (kind in (
'fomc', 'cpi', 'ppi', 'nfp', 'jobless_claims', 'gdp', 'pce',
'ecb_rate', 'boe_rate', 'boj_rate',
'crypto_unlock', 'exchange_listing', 'halving', 'mainnet_launch',
'regulatory', 'earnings', 'other'
)),
scheduled_at timestamptz not null,
symbols text[] not null default '{}',
importance numeric not null check (importance between 0 and 1),
title text not null,
description text,
source text not null, -- 'news_extraction' | 'manual' | 'vendor:...'
source_news_id uuid references public.news_items(id) on delete set null,
confidence numeric check (confidence between 0 and 1),
cancelled_at timestamptz,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now()
);
-- Dedup: two news mentions of the same FOMC meeting coalesce into one row.
-- Hour-truncation tolerates small drift between sources on the same event.
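-- Index expressions must be IMMUTABLE. date_trunc on timestamptz and
-- array_to_string are only STABLE, so the hour is truncated on the UTC
-- timestamp and symbols (not null, btree-comparable) is indexed directly.
create unique index market_events_kind_time_symbols_idx
on public.market_events (
kind,
date_trunc('hour', scheduled_at at time zone 'UTC'),
symbols
);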
-- The agent's per-tick window query.
create index market_events_scheduled_idx
on public.market_events (scheduled_at)
where cancelled_at is null;
create index market_events_symbols_idx
on public.market_events using gin (symbols);

RLS: select for any authenticated user (market data, no PII). Service-role writes only.
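To make the dedup key concrete, here is a small sketch that mirrors the (kind, hour(scheduled_at), symbols) key on the application side. The function name dedupKey and the "|" separator are assumptions for illustration; the hour is truncated in UTC.

```typescript
// Hypothetical helper: builds the same logical key the unique index enforces.
function dedupKey(kind: string, scheduledAt: Date, symbols: string[]): string {
  const hour = new Date(scheduledAt.getTime());
  hour.setUTCMinutes(0, 0, 0); // truncate to the start of the UTC hour
  // Symbols are joined in their stored order, so ['BTC','ETH'] and
  // ['ETH','BTC'] produce different keys.
  return [kind, hour.toISOString(), symbols.join(",")].join("|");
}

// Two mentions of the same meeting, 45 minutes apart, coalesce.
const first = dedupKey("fomc", new Date("2026-06-11T18:00:00Z"), []);
const second = dedupKey("fomc", new Date("2026-06-11T18:45:00Z"), []);
console.log(first === second);
```

Two properties of this key are worth keeping in mind: timestamps that straddle an hour boundary (17:59 and 18:01) do not coalesce, and symbol order matters unless the writer normalizes it before insert.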
Extractor — regex pre-filter
Keywords that flag a news item as a candidate (case-insensitive, word-boundary):
fomc, fed (decision|meeting|speech), cpi, ppi, nfp, payroll, jobless,
gdp, pce, rate (decision|hike|cut), ecb, boe, boj,
(token )?unlock, halving, mainnet (launch|release), listing,
sec (action|filing), regulation, earnings, dividend,
\b\d{1,2} ?(am|pm|et|utc|gmt)\b, \bnext (week|month|wednesday|...)\b

If any pattern matches the headline OR the title contains a date-like substring, the item is passed to the LLM. Otherwise it is skipped, and the watermark still advances.
Pre-filter recall vs. precision: we tune for high recall (cheap to false-positive into a regex hit; the LLM will return {events: []} on irrelevant items). False negatives are the expensive case — they silently miss events.
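A pre-filter along these lines might look like the sketch below. It covers only the patterns spelled out above (the "next ..." alternatives the list elides are left out), and the DATE_LIKE expression is an assumption, since the document does not define what counts as a date-like substring.

```typescript
// Subset of the keyword list above; case-insensitive with word boundaries.
const CANDIDATE_PATTERNS: RegExp[] = [
  /\b(fomc|cpi|ppi|nfp|payroll|jobless|gdp|pce|ecb|boe|boj)\b/i,
  /\bfed (decision|meeting|speech)\b/i,
  /\brate (decision|hike|cut)\b/i,
  /\b(token )?unlock\b/i,
  /\bmainnet (launch|release)\b/i,
  /\bsec (action|filing)\b/i,
  /\b(halving|listing|regulation|earnings|dividend)\b/i,
  /\b\d{1,2} ?(am|pm|et|utc|gmt)\b/i,
  /\bnext (week|month|wednesday)\b/i, // the source elides further alternatives
];

// Assumed definition of "date-like": an ISO date or "<month> <day>".
const DATE_LIKE =
  /\b\d{4}-\d{2}-\d{2}\b|\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]* \d{1,2}\b/i;

// True when the item should be sent to the LLM. Tuned for recall:
// a false positive only costs one cheap call that returns {events: []}.
function isCandidate(title: string): boolean {
  return DATE_LIKE.test(title) || CANDIDATE_PATTERNS.some((re) => re.test(title));
}
```

None of the expressions use the g flag, so test() carries no state between calls and the array can be shared across loop iterations.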
Extractor — LLM call
One call per pre-filter-passing news item. Structured output:
const ExtractedEvent = z.object({
kind: z.enum([... same enum as DB ...]),
scheduledAtIso: z.string(), // ISO 8601 UTC; "next Wednesday" resolves against news.ts
symbols: z.array(z.string()).default([]),
importance: z.number().min(0).max(1),
title: z.string().max(120),
description: z.string().max(500).optional(),
confidence: z.number().min(0).max(1),
});
const ExtractedEvents = z.object({
events: z.array(ExtractedEvent), // [] if no scheduled future event referenced
});

System prompt sketch:
"You read crypto-trading news and pull out scheduled future events the trading agent should know about. For each news item, decide: does it reference a specific upcoming event with a date? If yes, produce one structured event row. If no — even if the news is interesting — produce no event. Resolve relative dates like 'next Wednesday' to absolute ISO timestamps using the news publication time as the anchor. Output JSON only matching the schema."
User message:
News published at: <ts>
Title: <title>
Snippet: <first 500 chars of description>
Symbols already tagged: <symbols>
Output:

We use generateText + a JSON-extract helper (the same generateJson helper in apps/web/lib/ai-json.ts, per ADR-0012's experience that OpenRouter's structured-output negotiation is unreliable for Anthropic-backed models).
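The user message and the JSON-extraction step can be illustrated with the sketch below. Both functions are hypothetical stand-ins: buildUserMessage follows the template above, and extractJsonObject shows the general "take free text, pull out the JSON object" idea. It is not the generateJson helper from apps/web/lib/ai-json.ts, whose signature is not shown here.

```typescript
// Hypothetical input shape for one news item.
interface NewsForPrompt {
  publishedAt: string; // ISO 8601 UTC
  title: string;
  description: string | null;
  symbols: string[];
}

// Fills the user-message template; the snippet is capped at 500 characters.
function buildUserMessage(n: NewsForPrompt): string {
  return [
    `News published at: ${n.publishedAt}`,
    `Title: ${n.title}`,
    `Snippet: ${(n.description ?? "").slice(0, 500)}`,
    `Symbols already tagged: ${n.symbols.join(", ")}`,
    "Output:",
  ].join("\n");
}

// Pulls the outermost {...} span out of a model reply and parses it.
// Returns null when no parseable object is found, so the caller can
// treat it like a schema failure.
function extractJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null;
  }
}
```

The parsed value is still untyped at this point; it would go through the zod schema before anything is inserted.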
Model selection
EVENT_EXTRACTOR_MODEL=anthropic/claude-haiku-4.5 # default
EVENT_EXTRACTOR_FALLBACK_MODEL= # optional cheaper retry

Free model option:
EVENT_EXTRACTOR_MODEL=meta-llama/llama-3.3-70b-instruct:free
EVENT_EXTRACTOR_FALLBACK_MODEL=anthropic/claude-haiku-4.5

The fallback is consulted only on model errors or rate-limit responses, never on {events: []} outputs (those are valid).
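The selection and fallback rule can be sketched as two small pure functions. The names resolveModels, shouldRetryOnFallback, and the CallOutcome type are assumptions for illustration; only the env variable names and the Haiku default come from this document.

```typescript
// Hypothetical summary of one LLM call's result.
type CallOutcome =
  | { status: "ok"; events: unknown[] }
  | { status: "error" | "rate_limited" };

// Reads the two env variables; an unset or empty fallback means "no retry".
function resolveModels(
  env: Record<string, string | undefined>,
): { primary: string; fallback: string | null } {
  const primary = env.EVENT_EXTRACTOR_MODEL?.trim() || "anthropic/claude-haiku-4.5";
  const fallback = env.EVENT_EXTRACTOR_FALLBACK_MODEL?.trim() || null;
  return { primary, fallback };
}

// The fallback is tried only after a failed call. An "ok" reply with an
// empty events array is a valid answer and never triggers a retry.
function shouldRetryOnFallback(outcome: CallOutcome, fallback: string | null): boolean {
  if (fallback === null) return false;
  return outcome.status !== "ok";
}
```

Keeping the decision separate from the network call makes the "empty is valid" rule easy to unit-test without mocking a provider.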
Validation gates before insert
- Schema match (zod). Reject row → skip; log.
- Confidence ≥ 0.5. Below threshold → skip; log at info.
- scheduledAt is in the future relative to the news publish time + 5 min slack. Past-event references aren't calendar entries.
- scheduledAt is within +90 days. Anything further out is more likely a model hallucination than a real schedule.
- Dedup pre-check. Query for existing row at same (kind, hour(scheduled_at), symbols); if it exists, no insert. The DB unique index is the floor; the application check is faster on duplicates.
Failures at any gate are logged with the news_item id so we can audit precision.
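The gates after schema validation can be sketched as one function that returns the first failing reason. Several details are assumptions: the reason strings, the reading of "+ 5 min slack" as "later than publish time plus five minutes", measuring the 90-day horizon from the publish time, and the injected alreadyExists predicate standing in for the dedup query.

```typescript
// Hypothetical post-zod candidate; only the fields the gates need.
interface Candidate {
  scheduledAtIso: string;
  confidence: number;
}

type GateResult = { ok: true } | { ok: false; reason: string };

const MIN_CONFIDENCE = 0.5;
const SLACK_MS = 5 * 60 * 1000;
const MAX_HORIZON_MS = 90 * 24 * 60 * 60 * 1000;

// Runs the gates in the documented order and stops at the first failure.
function checkGates(
  ev: Candidate,
  newsPublishedAt: Date,
  alreadyExists: (ev: Candidate) => boolean,
): GateResult {
  if (ev.confidence < MIN_CONFIDENCE) return { ok: false, reason: "low_confidence" };

  const scheduled = Date.parse(ev.scheduledAtIso);
  if (Number.isNaN(scheduled)) return { ok: false, reason: "bad_timestamp" };

  const published = newsPublishedAt.getTime();
  if (scheduled <= published + SLACK_MS) return { ok: false, reason: "not_future" };
  if (scheduled > published + MAX_HORIZON_MS) return { ok: false, reason: "too_far_out" };

  if (alreadyExists(ev)) return { ok: false, reason: "duplicate" };
  return { ok: true };
}
```

Returning a reason rather than a boolean gives the audit log the field it needs alongside the news_item id.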
Context injection
assembleContext() adds a new section between ## News and ## Funding rates, only when skill.context.events.enabled === true:
## Upcoming events (next 48h)
- [2026-06-11T18:00Z, in 2h] fomc (importance 0.95) — "FOMC rate decision"
- [2026-06-12T12:30Z, in 20.5h] cpi (importance 0.85) — "May CPI release"
- [2026-06-13T13:00Z, in 45h] crypto_unlock (importance 0.40) — "ARB 1.1B token unlock" — ARB

Bounded by:
- events.lookaheadHours (default 24, max 72).
- events.minImportance (default 0.4): filters noise.
- events.cryptoOnly (default false): filters to events whose symbols is non-empty.
- Hard cap: 20 events even if all of the above let more through.
Token cost: ~25 tokens/row × ≤20 rows ≈ ≤500 tok/tick worst case. Typical ≤150.
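A rendering step that produces rows in the format shown above might look like this sketch. The MarketEvent shape and the function name are assumptions, as is the choice to keep the soonest events when the hard cap applies; the document does not say which rows are dropped.

```typescript
// Hypothetical row shape, already filtered by window and importance.
interface MarketEvent {
  scheduledAt: Date;
  kind: string;
  importance: number;
  title: string;
  symbols: string[];
}

const MAX_ROWS = 20; // hard cap from the bounds above
const DASH = "\u2014"; // separator used in the documented row format

function renderUpcomingEvents(events: MarketEvent[], asOf: Date, lookaheadHours = 24): string {
  const rows = [...events]
    .sort((a, b) => a.scheduledAt.getTime() - b.scheduledAt.getTime())
    .slice(0, MAX_ROWS) // assumption: soonest events win when capped
    .map((e) => {
      const hours = (e.scheduledAt.getTime() - asOf.getTime()) / 3600000;
      const inH = Math.round(hours * 2) / 2; // nearest half hour
      const ts = e.scheduledAt.toISOString().slice(0, 16) + "Z"; // minute precision
      const symbols = e.symbols.length > 0 ? ` ${DASH} ${e.symbols.join(", ")}` : "";
      return `- [${ts}, in ${inH}h] ${e.kind} (importance ${e.importance.toFixed(2)}) ${DASH} "${e.title}"${symbols}`;
    });
  return [`## Upcoming events (next ${lookaheadHours}h)`, ...rows].join("\n");
}
```

Because the cap is applied after sorting, the row count (and so the token estimate above) stays bounded regardless of how many events the query returns.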
Sim / live parity
Same asOf discipline as the news client. The events client query is:
where scheduled_at > $asOf
and scheduled_at <= $asOf + lookahead
and cancelled_at is null
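and importance >= $minImportance

The asOf filter is intended to keep a backtest from seeing events that had not been published yet at the simulated time. Note that the predicates shown bound scheduled_at only; enforcing publication time strictly would also need a bound such as created_at <= $asOf. Production fills $asOf = now(). Sim fills $asOf = tick.ts.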
Note: this means sims over historical periods will surface fewer events than live, because the extractor only built the table forward from when it shipped. A historical backfill is documented in the ADR as deferred.
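The window predicates can be mirrored in memory to show how one code path serves both modes. This is only an illustration of the filter's semantics under an assumed EventRow shape; the real client issues the SQL above.

```typescript
// Hypothetical row shape with just the columns the query touches.
interface EventRow {
  scheduledAt: Date;
  cancelledAt: Date | null;
  importance: number;
}

// Same predicates as the query: (asOf, asOf + lookahead], not cancelled,
// importance at or above the threshold.
function selectWindow<T extends EventRow>(
  rows: T[],
  asOf: Date,
  lookaheadHours: number,
  minImportance: number,
): T[] {
  const from = asOf.getTime();
  const to = from + lookaheadHours * 3600000;
  return rows.filter((r) => {
    const t = r.scheduledAt.getTime();
    return t > from && t <= to && r.cancelledAt === null && r.importance >= minImportance;
  });
}
```

Live would call it with asOf set to the current time and sim with asOf set to the tick timestamp; nothing else differs between the two.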
Failure modes
| Failure | Detection | Recovery |
|---|---|---|
| Extractor LLM call errors | Caller catches; falls through to fallback model if configured, else logs and continues | Watermark does NOT advance on the failing item — retried next loop |
| Output fails zod parse | Schema validator returns null | Log at warn with news_item id; skip row; watermark advances (we don't want a malformed model on one item to wedge the pipeline) |
| Dedup index conflict on insert | Postgres returns 23505 | Treated as success — the event already exists |
| Watermark cell missing in platform_config | loadWatermark returns epoch 0 | Next loop processes everything; the dedup gate prevents duplicate rows, but the first run does a one-time burst of LLM calls |
| News volume spikes 10× | Cost still bounded by regex hit-rate; budget envelope rises proportionally | Add a per-loop max-LLM-calls cap (TBD if seen in practice) |
| EVENT_EXTRACTOR_MODEL env unset | Default applies (Haiku) | None needed |
| OPENROUTER_API_KEY unset | Same env as the trading agent and reflection; extractor logs extractor_disabled and skips the loop | Operator notices via logs |
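The per-item rows of the table reduce to two decisions: whether an insert error counts as success, and whether the watermark may move past the item. The sketch below encodes both; the ItemOutcome labels and function names are hypothetical, while SQLSTATE 23505 is Postgres's unique_violation code.

```typescript
// Outcome labels are hypothetical names for the rows of the failure table.
type ItemOutcome = "inserted" | "no_events" | "schema_invalid" | "duplicate" | "llm_error";

// Postgres reports unique_violation as SQLSTATE 23505; drivers such as
// node-postgres expose it on the error's `code` property.
function isUniqueViolation(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "23505";
}

// A dedup conflict means the event already exists, so it counts as success.
// Any other insert error is rethrown for the caller to handle.
function classifyInsertError(err: unknown): ItemOutcome {
  if (isUniqueViolation(err)) return "duplicate";
  throw err;
}

// Only an LLM failure holds the watermark back; a malformed output on one
// item is skipped so it cannot wedge the pipeline.
function advancesWatermark(outcome: ItemOutcome): boolean {
  return outcome !== "llm_error";
}
```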
Operations
- Disable globally: set EVENT_EXTRACTOR_ENABLED=false on the news-poller Fly app. The pipeline keeps polling news; only the event-extraction loop is skipped.
- One-shot backfill: a CLI in packages/data-ingest/src/cli.ts (pnpm ingest events-backfill --since 2026-05-01) re-runs the extractor over historical news_items. Deferred until needed.
- Audit: every insert logs event_extracted with news_item_id, kind, confidence, scheduled_at. Rejections log event_rejected with the reason. Grep-able from agent_logs (the news-poller writes to its own log channel; which one exactly is TBD, same as today's RSS upserts).
References
- ADR-0018
- ADR-0011: news ingestion this consumes
- apps/news-poller/src/index.ts: host
- packages/agent-runtime/src/context.ts: consumer
- apps/web/lib/ai-json.ts: JSON-extract helper pattern