ADR-0018: Market-event calendar via news-driven extraction
- Status: proposed
- Date: 2026-06-05
- Builds on: ADR-0006, ADR-0011, ADR-0017
- Affects (planned):
  - `packages/skill-schema/src/events.ts` (new)
  - `packages/skill-schema/src/context.ts`
  - `packages/agent-runtime/src/context.ts`
  - `packages/agent-runtime/src/event-extractor.ts` (new)
  - `apps/news-poller/src/extractor.ts` (new)
  - `packages/data-ingest/src/events.ts` (new)
  - `packages/db/supabase/migrations/2026XXXXXXXXXX_market_events.sql`
Context
ADR-0011 wired CryptoPanic + RSS feeds into public.news_items. The trading agent sees recent headlines via the ## News context section. But:
- News is reactive, not predictive. It tells the agent that CPI just printed, not that CPI is scheduled in 90 minutes.
- Strategy text references macro events the data layer doesn't surface. The editor's strategy linter recognizes `fomc`, `cpi`, `nfp`, and `macro`; the linter for `avoid` text routinely suggests "Stand down inside 30 minutes around scheduled macro prints." But there is no `market_events` table, no calendar feed, and no per-tick mechanism for the agent to know how close it is to one.
- Free macro-calendar APIs exist (TradingEconomics, Finnhub, Forex Factory) but they're rate-limited and require per-vendor signup. For crypto-specific events (token unlocks, exchange listings, halvings) calendar coverage is even thinner and skewed toward paid tiers.
What we already have that we can build on:
- A continuous news ingest pipeline (`apps/news-poller`) that runs as a Fly machine and writes deduped rows to `news_items`.
- Sentiment + symbol-tagging already happens at ingest time for CryptoPanic; RSS items get sentiment via the same path.
- The trading-agent context assembly (`packages/agent-runtime/src/context.ts`) is the canonical place to inject anything time-bounded.
What's missing is the extraction step that turns headlines like "Markets brace for next week's FOMC decision on Wednesday" into a structured market_events row with scheduled_at = '2026-06-11T18:00:00Z', kind = 'fomc', importance = 0.9.
Decision
Add a market-event calendar layer driven primarily by news-derived LLM extraction, with deterministic seed events for the obvious recurring schedule we already know. Concretely:
1. market_events table — structured, dedup'd, queryable by time window
```sql
create table public.market_events (
  id              uuid primary key default gen_random_uuid(),
  kind            text not null check (kind in (
                    'fomc', 'cpi', 'ppi', 'nfp', 'jobless_claims', 'gdp', 'pce',
                    'ecb_rate', 'boe_rate', 'boj_rate',
                    'crypto_unlock', 'exchange_listing', 'halving', 'mainnet_launch',
                    'regulatory', 'earnings', 'other'
                  )),
  scheduled_at    timestamptz not null,
  symbols         text[] not null default '{}',   -- ['BTC','ETH'] for crypto-specific; [] = global
  importance      numeric not null check (importance between 0 and 1),
  title           text not null,                  -- short human-readable; e.g. "FOMC rate decision"
  description     text,
  source          text not null,                  -- 'news_extraction' | 'manual' | 'rss:source-id' | …
  source_news_id  uuid references public.news_items(id) on delete set null,
  confidence      numeric check (confidence between 0 and 1),
  cancelled_at    timestamptz,                    -- nullable; lets us soft-mark events that got cancelled / moved
  created_at      timestamptz not null default now(),
  updated_at      timestamptz not null default now()
);
```

Plus a dedup index: unique `(kind, date_trunc('hour', scheduled_at), coalesce(array_to_string(symbols, ','), ''))` so multiple news mentions of the same FOMC meeting coalesce into one row. One caveat for the migration: PostgreSQL only accepts immutable expressions in an index, and neither `date_trunc` on a `timestamptz` nor `array_to_string` is immutable, so the index will likely need an equivalent immutable form, for example `date_trunc('hour', scheduled_at at time zone 'UTC')` with `symbols` indexed directly.
RLS: same as news_items — select for any authed user, service-role writes.
Indices: (scheduled_at desc) where cancelled_at is null for the agent's per-tick query; GIN on symbols.
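The coalescing behaviour of the dedup index can be sketched in application code. This is a minimal illustration, not the planned implementation: `dedupKey` and `EventKeyFields` are hypothetical names, and sorting the symbols is an assumption added here that the index expression itself does not make.

```typescript
// Minimal stand-in for a market_events row: only the fields the dedup index uses.
interface EventKeyFields {
  kind: string;
  scheduledAt: Date;
  symbols: string[];
}

// Hypothetical helper mirroring the (kind, hour bucket, symbols) unique index.
function dedupKey(e: EventKeyFields): string {
  const hour = new Date(e.scheduledAt.getTime());
  hour.setUTCMinutes(0, 0, 0); // truncate to the start of the hour, in UTC
  // Assumption: sort so that ['ETH','BTC'] and ['BTC','ETH'] produce the same key.
  const symbols = [...e.symbols].sort().join(",");
  return `${e.kind}|${hour.toISOString()}|${symbols}`;
}

// Two mentions of the same meeting, resolved to slightly different minutes.
const keyA = dedupKey({ kind: "fomc", scheduledAt: new Date("2026-06-11T18:00:00Z"), symbols: [] });
const keyB = dedupKey({ kind: "fomc", scheduledAt: new Date("2026-06-11T18:25:00Z"), symbols: [] });
// A mention resolved into the following hour lands in a different bucket.
const keyC = dedupKey({ kind: "fomc", scheduledAt: new Date("2026-06-11T19:05:00Z"), symbols: [] });

console.log(keyA === keyB, keyA === keyC);
```

Hour truncation is a bucket, not a tolerance: timestamps at 18:55 and 19:05 are ten minutes apart but would not coalesce under this key.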
2. News-driven extractor — extension of apps/news-poller, not a new app
Reuses the same Fly machine and process. A new poll loop:
- Runs on a fixed interval (default 5 minutes; bounded by the slowest news source's cadence, and much faster than calendar data actually changes).
- Pulls `news_items` since the last successful extraction watermark.
- Pre-filters with a cheap regex on title/symbols: only items that look like they might reference a scheduled event pass (keywords: `fomc`, `cpi`, `rate decision`, `earnings`, `unlock`, `halving`, `launch`, `decision on`, `meeting`, `report on`, ISO-style dates, etc.). This cuts LLM volume by ~10× without losing recall on real events.
- For the items that pass the regex pre-filter, calls a cheap model (default: `anthropic/claude-haiku-4.5` via OpenRouter; at ~$0.25/$1.25 per M tokens, full daily volume costs <$0.05) with a strict structured-output prompt: "For each headline, decide whether it references a future scheduled event relevant to crypto trading. If yes, output the structured event; if no, output `{events: []}`. Date references like 'next Wednesday' must be resolved to an ISO timestamp using the publish date as the anchor."
- Validates outputs against a zod schema; rejects malformed or low-confidence (< 0.5) items.
- Writes to `market_events` with `source = 'news_extraction'`, `source_news_id = <news_item.id>` (the originating row), `confidence = <model score>` (0–1).
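The pre-filter and validation steps of the loop might look roughly like the sketch below. Everything named here (`NewsItem`, `ExtractedEvent`, `EVENT_HINT`, `passesPreFilter`, `isValidEvent`, `toRows`) is hypothetical, the exact regex is an assumption built from the keyword list above, and the hand-written type guard is a dependency-free stand-in for the zod schema the real extractor would use. The model call itself is omitted.

```typescript
// Hypothetical shapes, for illustration only.
interface NewsItem { id: string; title: string; publishedAt: string }
interface ExtractedEvent {
  kind: string;
  scheduledAt: string; // ISO timestamp resolved by the model
  importance: number;
  title: string;
  confidence: number;
}

// Keywords come from the ADR; the regex itself is an assumption.
// Only a leading word boundary is used, so "unlocks" and "launched" still match.
const EVENT_HINT =
  /\b(?:fomc|cpi|rate decision|earnings|unlock|halving|launch|decision on|meeting|report on)|\d{4}-\d{2}-\d{2}/i;

function passesPreFilter(item: NewsItem): boolean {
  return EVENT_HINT.test(item.title);
}

const MIN_CONFIDENCE = 0.5; // threshold named in the ADR

// Plain-TypeScript stand-in for the zod validation step.
function isValidEvent(raw: unknown): raw is ExtractedEvent {
  if (typeof raw !== "object" || raw === null) return false;
  const { kind, title, scheduledAt, importance, confidence } = raw as Record<string, unknown>;
  return (
    typeof kind === "string" &&
    typeof title === "string" &&
    typeof scheduledAt === "string" && !Number.isNaN(Date.parse(scheduledAt)) &&
    typeof importance === "number" && importance >= 0 && importance <= 1 &&
    typeof confidence === "number" && confidence >= MIN_CONFIDENCE && confidence <= 1
  );
}

// Turns raw model output for one news item into rows ready to insert.
function toRows(item: NewsItem, modelOutput: unknown[]) {
  return modelOutput
    .filter(isValidEvent)
    .map((e) => ({ ...e, source: "news_extraction", sourceNewsId: item.id }));
}

const headline: NewsItem = {
  id: "news-1",
  title: "Markets brace for next week's FOMC decision on Wednesday",
  publishedAt: "2026-06-05T09:00:00Z",
};
const rows = toRows(headline, [
  { kind: "fomc", scheduledAt: "2026-06-11T18:00:00Z", importance: 0.9, title: "FOMC rate decision", confidence: 0.8 },
  { kind: "cpi", scheduledAt: "2026-06-12T12:30:00Z", importance: 0.8, title: "May CPI release", confidence: 0.3 }, // low confidence
  { kind: "nfp", scheduledAt: "soon", importance: 0.7, title: "Jobs report", confidence: 0.9 }, // unparseable date
]);
console.log(passesPreFilter(headline), rows.length);
```

Keeping the filter and the validator as pure functions means both can be unit-tested without a model call or a database.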
3. Cost discipline
The user's brief specifically calls out cost. Three layers of defense:
- Regex pre-filter: most ingested news is price commentary, not event scheduling. Cuts LLM calls 10×.
- Configurable cheap model: default Haiku; one env var (`EVENT_EXTRACTOR_MODEL`) flips to a free OpenRouter model (e.g. `meta-llama/llama-3.3-70b-instruct:free`) for users willing to trade quality for $0. The orchestrator falls back to Haiku on free-tier rate limit hits.
- Deduplication before LLM call: if a news headline lands within an hour of an already-extracted event of the same kind, skip the LLM call entirely, since we already know about that meeting.
Budget envelope: at ~200 news items/day × 10% regex hit rate × ~400 input tokens + 150 output tokens per call ≈ 8K input + 3K output per day. At Haiku rates: ~$0.006/day per platform-wide poller. Negligible.
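The envelope arithmetic can be made explicit so it is easy to re-run when volumes or prices change. The function and field names below are hypothetical; the numbers are the ones quoted in this ADR, not independently verified rates.

```typescript
// Inputs to the daily budget estimate. All values are the ADR's own assumptions.
interface BudgetInputs {
  itemsPerDay: number;
  regexHitRate: number;        // fraction of items that reach the model
  inputTokensPerCall: number;
  outputTokensPerCall: number;
  inputUsdPerMTok: number;     // price per million input tokens
  outputUsdPerMTok: number;    // price per million output tokens
}

function dailyBudget(b: BudgetInputs) {
  const calls = b.itemsPerDay * b.regexHitRate;
  const inputTokens = calls * b.inputTokensPerCall;
  const outputTokens = calls * b.outputTokensPerCall;
  const usd =
    (inputTokens / 1_000_000) * b.inputUsdPerMTok +
    (outputTokens / 1_000_000) * b.outputUsdPerMTok;
  return { calls, inputTokens, outputTokens, usd };
}

const estimate = dailyBudget({
  itemsPerDay: 200,
  regexHitRate: 0.1,
  inputTokensPerCall: 400,
  outputTokensPerCall: 150,
  inputUsdPerMTok: 0.25,
  outputUsdPerMTok: 1.25,
});

// 20 calls, 8,000 input tokens, 3,000 output tokens, roughly $0.006 per day.
console.log(estimate.calls, estimate.inputTokens, estimate.outputTokens, estimate.usd.toFixed(4));
```

Because cost is linear in every input, a 4× higher per-token price or a 4× higher hit rate each scale the daily figure by the same factor.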
4. Context surfacing — ## Upcoming events
A new section in assembleContext between ## News and ## Funding:
```text
## Upcoming events (next 24h)
- [2026-06-11T18:00Z] fomc (importance 0.95) — "FOMC rate decision"
- [2026-06-12T12:30Z] cpi (importance 0.85) — "May CPI release"
- [2026-06-12T13:00Z] crypto_unlock (importance 0.40) — "Arbitrum 1.1B ARB token unlock" — ARB
```

Window is `skill.context.events.lookaheadHours` (default 24, max 72). Items beyond the window are not injected. Token cost: ~25 tokens/event × ~5 events per window ≈ +125 tokens/tick. Bounded.
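A minimal sketch of how the section could be rendered from rows already loaded for the tick. `MarketEvent` and `renderUpcomingEvents` are hypothetical names, and the line format simply copies the sample above; the real `assembleContext` integration is not shown.

```typescript
// Hypothetical in-memory shape for a market_events row.
interface MarketEvent {
  kind: string;
  scheduledAt: Date;
  importance: number;
  title: string;
  symbols: string[];
}

const DASH = "\u2014"; // separator character used in the ADR's sample output

function renderUpcomingEvents(events: MarketEvent[], now: Date, lookaheadHours = 24): string {
  const start = now.getTime();
  const horizon = start + lookaheadHours * 60 * 60 * 1000;
  const lines = events
    // Keep only events strictly in the future and inside the lookahead window.
    .filter((e) => e.scheduledAt.getTime() > start && e.scheduledAt.getTime() <= horizon)
    .sort((x, y) => x.scheduledAt.getTime() - y.scheduledAt.getTime())
    .map((e) => {
      const ts = e.scheduledAt.toISOString().slice(0, 16) + "Z"; // minute precision
      const suffix = e.symbols.length > 0 ? ` ${DASH} ${e.symbols.join(",")}` : "";
      return `- [${ts}] ${e.kind} (importance ${e.importance.toFixed(2)}) ${DASH} "${e.title}"${suffix}`;
    });
  return [`## Upcoming events (next ${lookaheadHours}h)`, ...lines].join("\n");
}

const section = renderUpcomingEvents(
  [
    { kind: "crypto_unlock", scheduledAt: new Date("2026-06-12T13:00:00Z"), importance: 0.4, title: "Arbitrum 1.1B ARB token unlock", symbols: ["ARB"] },
    { kind: "fomc", scheduledAt: new Date("2026-06-11T18:00:00Z"), importance: 0.95, title: "FOMC rate decision", symbols: [] },
    { kind: "earnings", scheduledAt: new Date("2026-06-14T20:00:00Z"), importance: 0.5, title: "Outside the window", symbols: [] },
  ],
  new Date("2026-06-11T14:00:00Z"),
);
console.log(section);
```

Sorting by `scheduledAt` keeps the nearest event first, which is the one the agent most needs to see if the section is ever truncated.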
5. Skill schema additions
```typescript
import { z } from "zod";

export const SkillEventsConfig = z.object({
  enabled: z.boolean().default(true),
  lookaheadHours: z.number().int().min(0).max(72).default(24),
  /** Filter the surfaced events by importance score. 0 = all events. */
  minImportance: z.number().min(0).max(1).default(0.4),
  /**
   * When true, only crypto-specific events (those with non-empty `symbols`)
   * are surfaced, so a discretionary skill that doesn't care about US macro
   * can opt out without disabling the layer entirely.
   */
  cryptoOnly: z.boolean().default(false),
});
```

Nested under `SkillContextConfig.events`.
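How these knobs might act on a set of events, sketched without zod so the example has no dependencies. `EventsConfig`, `EVENTS_DEFAULTS` and `selectEvents` are hypothetical names; the defaults are copied from the schema above, and treating `minImportance` as an inclusive bound is an assumption.

```typescript
// Plain-TypeScript mirror of the config shape, with the schema's defaults.
interface EventsConfig {
  enabled: boolean;
  lookaheadHours: number;
  minImportance: number;
  cryptoOnly: boolean;
}

const EVENTS_DEFAULTS: EventsConfig = {
  enabled: true,
  lookaheadHours: 24,
  minImportance: 0.4,
  cryptoOnly: false,
};

// Applies the per-skill filters. The time window is assumed to be handled elsewhere.
function selectEvents<T extends { importance: number; symbols: string[] }>(
  events: T[],
  overrides: Partial<EventsConfig> = {},
): T[] {
  const cfg = { ...EVENTS_DEFAULTS, ...overrides };
  if (!cfg.enabled) return [];
  return events.filter(
    (e) =>
      e.importance >= cfg.minImportance && // assumption: the threshold is inclusive
      (!cfg.cryptoOnly || e.symbols.length > 0),
  );
}

const sampleEvents = [
  { kind: "fomc", importance: 0.95, symbols: [] as string[] },
  { kind: "crypto_unlock", importance: 0.4, symbols: ["ARB"] },
  { kind: "other", importance: 0.2, symbols: ["ETH"] },
];

console.log(
  selectEvents(sampleEvents).length,
  selectEvents(sampleEvents, { cryptoOnly: true }).length,
  selectEvents(sampleEvents, { minImportance: 0 }).length,
  selectEvents(sampleEvents, { enabled: false }).length,
);
```

With the default threshold of 0.4, an event scored exactly 0.40 is only surfaced if the comparison is inclusive, so the choice of `>=` versus `>` is worth pinning down in the real implementation.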
Alternatives considered
Alt A — Third-party paid macro calendar (TradingEconomics / Finnhub / Forex Factory)
Cleanest data shape: vendor publishes scheduled events directly; no LLM needed.
Rejected because:
- Each vendor has its own signup, billing, and rate limits.
- Free tiers are restrictive (single country, sparse coverage of crypto).
- Adds operational dependency the existing news pipeline doesn't have.
- News-driven extraction will already cover the events crypto traders care about (mainstream macro DOES get discussed in crypto news; crypto-specific events like unlocks are best-covered by news anyway).
- Defer until we see whether news-derived extraction has acceptable recall. Easy to layer on later: same `market_events` table, different `source` value.
Alt B — Store events as a new column on news_items
Simpler schema; reuses RLS. Rejected because:
- One event ↔ many news items. Dedup becomes hand-rolled across rows.
- The agent's access pattern (`scheduled_at` window query) is very different from news (`ts` window). A separate table makes both indices natural.
- Operationally messy when the extractor needs to reprocess: do we mutate the news row's event column or write a new news row?
Alt C — Pure LLM with no regex pre-filter
Simpler code; risk of higher cost.
Rejected because the regex pre-filter cuts LLM volume by an order of magnitude at near-zero cost, and the keywords for in-scope events (FOMC / CPI / rate / unlock / halving / launch) are well-known. The pre-filter is opt-out (env knob) for paranoid completeness.
Alt D — Standalone extractor process (new Fly app)
Cleaner separation; rejected because the workload is light (one Haiku call per ~10 news items, bounded watermark-driven) and the news-poller already runs continuously on a shared-cpu-1x machine with ample headroom. Adding a second machine doubles the ops surface for no real benefit. If the extractor's footprint ever materially competes with news polling we'll split.
Alt E — Run extraction inline in the trading-agent loop
Could ask the agent to call a tool that synthesises events from news on demand. Rejected for the same reasons as memory-as-tool in ADR-0017: step budget pressure, non-determinism, sim/live drift. Calendar belongs in the pre-loaded context.
Consequences
Positive
- Closes a real gap. Strategies that already reference "macro events" finally get the data layer they assume exists. The editor's `avoid` boilerplate stops being aspirational.
- Cheap by design. Regex pre-filter + Haiku default + dedup keeps daily cost in the cents; a free OpenRouter model knocks it to zero for users who tolerate the quality drop.
- Composable with memory layer. The `entry_regime_tag` already records the regime a trade opened under; a future analytics query can join `trade_history` against `market_events` to answer "do I underperform in the hour before CPI?" without new instrumentation.
- Operationally simple. One Fly machine, one DB table, no new vendor accounts.
- Sim/live parity. Sims read the same `market_events` table over their `asOf` window, so backtests honour events that were known at the simulated time. Same leakage guard as the news client (`scheduled_at` filter vs `asOf`).
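The sim/live parity point can be sketched as a single filter evaluated at `asOf`. `StoredEvent` and `visibleAt` are hypothetical names. The `scheduled_at` window comes from the text above; the extra `createdAt` and `cancelledAt` checks are assumptions about what "known at the simulated time" would require, not something this ADR specifies.

```typescript
// Hypothetical in-memory shape for the columns the guard needs.
interface StoredEvent {
  kind: string;
  scheduledAt: Date;
  createdAt: Date;
  cancelledAt: Date | null;
}

// Returns the events a run at `asOf` should see, live or simulated.
function visibleAt(events: StoredEvent[], asOf: Date, lookaheadHours: number): StoredEvent[] {
  const t = asOf.getTime();
  const horizon = t + lookaheadHours * 60 * 60 * 1000;
  return events.filter(
    (e) =>
      e.createdAt.getTime() <= t && // assumption: the row must already exist at asOf
      (e.cancelledAt === null || e.cancelledAt.getTime() > t) && // assumption: not yet cancelled at asOf
      e.scheduledAt.getTime() > t && // still in the future
      e.scheduledAt.getTime() <= horizon, // inside the lookahead window
  );
}

const simAsOf = new Date("2026-06-11T12:00:00Z");
const seen = visibleAt(
  [
    // Extracted days earlier, scheduled later today: visible.
    { kind: "fomc", scheduledAt: new Date("2026-06-11T18:00:00Z"), createdAt: new Date("2026-06-08T10:00:00Z"), cancelledAt: null },
    // Extracted after asOf: surfacing it in a backtest would leak future knowledge.
    { kind: "cpi", scheduledAt: new Date("2026-06-12T08:00:00Z"), createdAt: new Date("2026-06-11T15:00:00Z"), cancelledAt: null },
    // Already in the past at asOf.
    { kind: "nfp", scheduledAt: new Date("2026-06-10T12:30:00Z"), createdAt: new Date("2026-06-01T10:00:00Z"), cancelledAt: null },
  ],
  simAsOf,
  24,
);
console.log(seen.map((e) => e.kind));
```

Using the same function for live ticks (with `asOf` set to the current time) and for sims is one way to keep the two paths from drifting.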
Negative / trade-offs
- Hallucination risk on dates. The extractor depends on the model resolving "next Wednesday" correctly against the news publish date. Mitigations: (a) low-confidence rows are dropped before insert, (b) the `(kind, hour, symbols)` dedup index swallows duplicates, (c) we surface `source = 'news_extraction'` so consumers know the date came from a model not a primary source. Manual / vendor sources can be layered in later with `source = 'manual' | 'vendor:...'` and take precedence on dedup.
- Recall limited by news coverage. Events not discussed in our news sources won't be in the calendar. Acceptable for MVP (the most market-moving events are well-covered). A vendor calendar layer (Alt A) is the obvious upgrade if recall becomes a complaint.
- Confidence calibration is a model property. Different models report confidence on different scales. The 0.5 threshold is empirical; we accept some drift across model swaps. Logged at `info` so we can see which items dropped.
- Extractor cost compounds in unusual news bursts. A market-event flash crash generates dozens of headlines about the same event; the dedup-by-hour gate plus the regex pre-filter keeps cost bounded, but a hostile-input scenario (deliberate noise) would still be limited only by the news ingest itself. Acceptable; news ingest is already rate-bounded upstream.
- No prediction of un-news'd events. If a Fed governor speech happens that the press didn't pre-cover, the calendar will only learn about it after the news lands. Limitation; mitigated by the fact that headline-driving events almost always get pre-covered.
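One possible extra guard against the date-hallucination risk is a deterministic cross-check for simple relative phrases, compared against the model's resolved date. This is not part of the design above. `weekdayOfNextWeek` and `sameUtcDay` are hypothetical helpers, and the reading of "next week's Wednesday" (the Wednesday of the week after the publish week, weeks starting Monday, evaluated in UTC) is an assumption.

```typescript
// weekday uses the Date.getUTCDay convention: 0 = Sunday ... 6 = Saturday.
function weekdayOfNextWeek(publishedAt: Date, weekday: number): Date {
  const daysSinceMonday = (publishedAt.getUTCDay() + 6) % 7; // Monday -> 0, Sunday -> 6
  const offsetFromMonday = (weekday + 6) % 7;
  // Start from midnight UTC on the publish date.
  const result = new Date(
    Date.UTC(publishedAt.getUTCFullYear(), publishedAt.getUTCMonth(), publishedAt.getUTCDate()),
  );
  // Back to this week's Monday, forward one week, then to the requested weekday.
  result.setUTCDate(result.getUTCDate() - daysSinceMonday + 7 + offsetFromMonday);
  return result;
}

// True when two instants fall on the same UTC calendar day.
function sameUtcDay(a: Date, b: Date): boolean {
  return a.toISOString().slice(0, 10) === b.toISOString().slice(0, 10);
}

// Hypothetical headline published on a Friday, referring to "next week's ... Wednesday".
const published = new Date("2026-03-13T09:30:00Z");
const expectedDay = weekdayOfNextWeek(published, 3);

const modelAgrees = sameUtcDay(new Date("2026-03-18T18:00:00Z"), expectedDay);
const modelOffByOne = sameUtcDay(new Date("2026-03-19T18:00:00Z"), expectedDay);
console.log(expectedDay.toISOString().slice(0, 10), modelAgrees, modelOffByOne);
```

A mismatch would not prove the model wrong, since headlines are often ambiguous, but it could be a cheap signal for lowering `confidence` or logging the row for review.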
Things we'll need to revisit
- Vendor calendar layer (Alt A) when recall measurement shows a meaningful gap, or when traders start asking for non-US macro events.
- Cancellation / reschedule tracking. Today's design soft-marks `cancelled_at` but the extractor doesn't yet listen for "FOMC postponed" headlines. Worth adding when we see it happen.
- Per-skill event subscription. A skill trading only ETH probably doesn't care about a Solana token unlock. The `cryptoOnly` knob is a coarse first cut; finer subscription rules are deferred.
- Backfill from historical `news_items`. The extractor only runs on new items from its watermark forward. A one-shot backfill over recent `news_items` would seed the table; defer until we see the empty-calendar first-week pain.
References
- ADR-0011: news ingestion the extractor consumes
- ADR-0017: memory layer this composes with for "PnL near scheduled events" analytics
- `apps/news-poller/src/index.ts`: host process the extractor extends
- `packages/agent-runtime/src/context.ts`: where `## Upcoming events` slots in
- `docs/architecture/market-events.md`: implementation detail companion