Market-event calendar

Companion ADR: ADR-0018
DB: public.market_events
Producer: apps/news-poller (new extractor loop) → packages/data-ingest/src/events.ts
Consumer: packages/agent-runtime/src/context.ts (## Upcoming events)

Responsibility

Give the trading agent a forward-looking view of scheduled market events (FOMC, CPI, NFP, crypto unlocks, halvings, etc.) so strategies that already reference "macro events" finally have a data layer behind them — without paying for a third-party calendar API.

This layer is pre-loaded into context, same shape as bars / news / portfolio (ADR-0017's mantra). The trading agent has no query_events tool; it only sees what is injected. The chat agent, by contrast, gets read access to the table.

Pipeline

news_items ──┐                                                  ┌──► market_events
             │                                                  │
RSS / CryptoPanic                                                │
             │                                                  │
             ├──► poll loop in apps/news-poller ─┐               │
             │   (every 5 min)                    │              │
             │                                    │              │
             └──► regex pre-filter ──► cheap LLM extraction ────┘
                                       (Haiku default;
                                        free model opt-in)

Watermark-driven: the extractor remembers the latest news_items.ingested_at it has processed in platform_config (key: event_extractor_watermark). On boot the watermark loads; each loop processes only items newer than it; on success the watermark advances to the latest ingested_at seen.

On extractor crash mid-batch, the watermark stays at its prior value — the next boot reprocesses the items in flight. The market_events dedup index (unique (kind, hour(scheduled_at), symbols)) means re-extraction of the same news item produces no duplicates.
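
The watermark behaviour above can be sketched as a pure function over an in-memory list. This is a minimal illustration, not the code in packages/data-ingest/src/events.ts: NewsItem, selectBatch, and runOnce are hypothetical names, the handler is synchronous for brevity, and the strict "newer than" comparison is an assumption about how ties on ingested_at are treated.

```typescript
// Hypothetical shape: only the fields this sketch needs.
interface NewsItem {
  id: string;
  ingestedAt: string; // ISO 8601, mirrors news_items.ingested_at
}

interface Batch {
  items: NewsItem[];
  nextWatermark: string; // persist only after the whole batch succeeds
}

// Select items strictly newer than the watermark, oldest first.
function selectBatch(all: NewsItem[], watermark: string): Batch {
  const since = Date.parse(watermark);
  const items = all
    .filter((n) => Date.parse(n.ingestedAt) > since)
    .sort((a, b) => Date.parse(a.ingestedAt) - Date.parse(b.ingestedAt));
  const last = items[items.length - 1];
  return { items, nextWatermark: last ? last.ingestedAt : watermark };
}

// One loop iteration. If `handle` throws, nothing is returned, so the caller
// never persists a new watermark and the same items are retried next time.
function runOnce(
  all: NewsItem[],
  watermark: string,
  handle: (item: NewsItem) => void,
): string {
  const batch = selectBatch(all, watermark);
  for (const item of batch.items) handle(item);
  return batch.nextWatermark;
}
```

Because the new watermark is only available after every item has been handled, a crash mid-batch leaves the stored value untouched, and the dedup index absorbs the repeated work.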

Data model

create table public.market_events (
  id              uuid primary key default gen_random_uuid(),
  kind            text not null check (kind in (
    'fomc', 'cpi', 'ppi', 'nfp', 'jobless_claims', 'gdp', 'pce',
    'ecb_rate', 'boe_rate', 'boj_rate',
    'crypto_unlock', 'exchange_listing', 'halving', 'mainnet_launch',
    'regulatory', 'earnings', 'other'
  )),
  scheduled_at    timestamptz not null,
  symbols         text[] not null default '{}',
  importance      numeric not null check (importance between 0 and 1),
  title           text not null,
  description     text,
  source          text not null,                 -- 'news_extraction' | 'manual' | 'vendor:...'
  source_news_id  uuid references public.news_items(id) on delete set null,
  confidence      numeric check (confidence between 0 and 1),
  cancelled_at    timestamptz,
  created_at      timestamptz not null default now(),
  updated_at      timestamptz not null default now()
);

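-- Dedup: two news mentions of the same FOMC meeting coalesce into one row.
-- Hour-truncation tolerates small drift between sources on the same event.
-- Index expressions must be IMMUTABLE. date_trunc on timestamptz and
-- array_to_string are only STABLE, so truncate in UTC and index symbols
-- directly (it is NOT NULL, and text[] has a btree operator class).
create unique index market_events_kind_time_symbols_idx
  on public.market_events (
    kind,
    date_trunc('hour', scheduled_at at time zone 'UTC'),
    symbols
  );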

-- The agent's per-tick window query.
create index market_events_scheduled_idx
  on public.market_events (scheduled_at)
  where cancelled_at is null;

create index market_events_symbols_idx
  on public.market_events using gin (symbols);
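
For the application side, the (kind, hour(scheduled_at), symbols) key can be mirrored in a few lines. The sketch below is an assumption about how such a key might be computed, not code from the repository; dedupKey is a hypothetical helper name.

```typescript
// Hypothetical helper mirroring the (kind, hour(scheduled_at), symbols) key.
function dedupKey(kind: string, scheduledAtIso: string, symbols: string[]): string {
  const t = new Date(scheduledAtIso);
  if (Number.isNaN(t.getTime())) {
    throw new Error(`invalid timestamp: ${scheduledAtIso}`);
  }
  t.setUTCMinutes(0, 0, 0); // truncate to the hour, in UTC
  return [kind, t.toISOString(), symbols.join(",")].join("|");
}
```

Two properties follow from this shape. Truncation buckets by clock hour, so 17:59 and 18:01 land in different buckets even though they are two minutes apart. The symbols part is order-sensitive, so ["ARB", "ETH"] and ["ETH", "ARB"] would be treated as different events unless symbols are sorted before insert.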

RLS: select for any authenticated user (market data, no PII). Service-role writes only.

Extractor — regex pre-filter

Keywords that flag a news item as a candidate (case-insensitive, word-boundary):

fomc, fed (decision|meeting|speech), cpi, ppi, nfp, payroll, jobless,
gdp, pce, rate (decision|hike|cut), ecb, boe, boj,
(token )?unlock, halving, mainnet (launch|release), listing,
sec (action|filing), regulation, earnings, dividend,
\b\d{1,2} ?(am|pm|et|utc|gmt)\b, \bnext (week|month|wednesday|...)\b

If any pattern matches the title, or the title contains a date-like substring, the item is passed to the LLM. Otherwise it is skipped, and the watermark still advances.

Pre-filter recall vs. precision: we tune for high recall. A false positive is cheap: it costs one LLM call, which returns {events: []} for an irrelevant item. A false negative is the expensive case, because the event is silently missed.
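
A minimal sketch of the keyword pre-filter, assuming the patterns are joined into one case-insensitive, word-bounded regular expression. KEYWORD_PATTERNS, PRE_FILTER, and isCandidate are hypothetical names. The "next (...)" alternation is elided in the list above, so only the three listed values are included here, and the separate date-like-substring check is left out because its definition is not given.

```typescript
// Patterns copied from the keyword list above.
const KEYWORD_PATTERNS: string[] = [
  "fomc", "fed (decision|meeting|speech)", "cpi", "ppi", "nfp", "payroll", "jobless",
  "gdp", "pce", "rate (decision|hike|cut)", "ecb", "boe", "boj",
  "(token )?unlock", "halving", "mainnet (launch|release)", "listing",
  "sec (action|filing)", "regulation", "earnings", "dividend",
  "\\d{1,2} ?(am|pm|et|utc|gmt)",
  "next (week|month|wednesday)", // source list is elided after "wednesday"
];

// One alternation wrapped in word boundaries; no "g" flag, so test() is stateless.
const PRE_FILTER = new RegExp(`\\b(?:${KEYWORD_PATTERNS.join("|")})\\b`, "i");

// True when the title should be forwarded to the LLM extractor.
function isCandidate(title: string): boolean {
  return PRE_FILTER.test(title);
}
```

One consequence of strict word boundaries is worth checking against the recall goal: "payroll" matches "Payroll report" but not "Payrolls beat forecasts", and the same holds for "unlocks" and "listings". Whether plural forms should be added is a tuning decision for the real filter.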

Extractor — LLM call

One call per pre-filter-passing news item. Structured output:

const ExtractedEvent = z.object({
  kind: z.enum([... same enum as DB ...]),
  scheduledAtIso: z.string(),     // ISO 8601 UTC; "next Wednesday" resolves against news.ts
  symbols: z.array(z.string()).default([]),
  importance: z.number().min(0).max(1),
  title: z.string().max(120),
  description: z.string().max(500).optional(),
  confidence: z.number().min(0).max(1),
});

const ExtractedEvents = z.object({
  events: z.array(ExtractedEvent),   // [] if no scheduled future event referenced
});

System prompt sketch:

"You read crypto-trading news and pull out scheduled future events the trading agent should know about. For each news item, decide: does it reference a specific upcoming event with a date? If yes, produce one structured event row. If no — even if the news is interesting — produce no event. Resolve relative dates like 'next Wednesday' to absolute ISO timestamps using the news publication time as the anchor. Output JSON only matching the schema."

User message:

News published at: <ts>
Title: <title>
Snippet: <first 500 chars of description>
Symbols already tagged: <symbols>

Output:

We use generateText plus a JSON-extract helper (the same generateJson helper as in apps/web/lib/ai-json.ts) rather than native structured output: per ADR-0012, OpenRouter's structured-output negotiation is unreliable for Anthropic-backed models.
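
The implementation of generateJson is not shown in this document, so the following is only a stand-in that illustrates the general pattern: take free-form model text, cut out the outermost JSON object, and check the top-level shape. extractJsonObject and parseExtractedEvents are hypothetical names; in the real pipeline the parsed value would presumably then be validated against the ExtractedEvents zod schema.

```typescript
interface ExtractedEventsShape {
  events: unknown[];
}

// Cut the span from the first "{" to the last "}" and try to parse it.
// Returns null when no parseable object is found.
function extractJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null;
  }
}

// Accept only objects with an `events` array; per-event validation is
// left to the schema layer.
function parseExtractedEvents(text: string): ExtractedEventsShape | null {
  const value = extractJsonObject(text);
  if (typeof value !== "object" || value === null) return null;
  const events = (value as { events?: unknown }).events;
  return Array.isArray(events) ? { events } : null;
}
```

A null result corresponds to the "Output fails zod parse" row in the failure table: the item is logged and skipped. An empty events array is a valid answer, not a failure.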

Model selection

EVENT_EXTRACTOR_MODEL=anthropic/claude-haiku-4.5    # default
EVENT_EXTRACTOR_FALLBACK_MODEL=                     # optional cheaper retry

Free model option:

EVENT_EXTRACTOR_MODEL=meta-llama/llama-3.3-70b-instruct:free
EVENT_EXTRACTOR_FALLBACK_MODEL=anthropic/claude-haiku-4.5

The fallback is consulted only on model errors or rate-limit responses, never on {events: []} outputs (those are valid).
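
The selection and fallback rules can be written down as two small pure functions. This is a sketch under assumptions: resolveModels, shouldTryFallback, and the CallOutcome type are hypothetical, and an empty EVENT_EXTRACTOR_FALLBACK_MODEL is taken to mean "no fallback".

```typescript
const DEFAULT_MODEL = "anthropic/claude-haiku-4.5";

interface ModelConfig {
  primary: string;
  fallback: string | undefined;
}

// Hypothetical summary of what one extractor call produced.
type CallOutcome =
  | { status: "ok"; events: unknown[] } // includes the valid empty result
  | { status: "error" }
  | { status: "rate_limited" };

// Read the two env vars; unset or blank values fall back to the defaults.
function resolveModels(env: Record<string, string | undefined>): ModelConfig {
  const primary = (env["EVENT_EXTRACTOR_MODEL"] ?? "").trim() || DEFAULT_MODEL;
  const fallback = (env["EVENT_EXTRACTOR_FALLBACK_MODEL"] ?? "").trim() || undefined;
  return { primary, fallback };
}

// Retry on the fallback only for model errors or rate limits,
// never for a successful call that returned no events.
function shouldTryFallback(outcome: CallOutcome, config: ModelConfig): boolean {
  if (config.fallback === undefined) return false;
  return outcome.status === "error" || outcome.status === "rate_limited";
}
```

Keeping the retry decision separate from the call itself makes the "empty is valid" rule easy to test without a network.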

Validation gates before insert

Schema match (zod). On failure, skip the row and log.
Confidence ≥ 0.5. Below the threshold, skip and log at info.
scheduledAtIso is in the future relative to the news publish time + 5 min slack. References to past events aren't calendar entries.
scheduledAtIso is within +90 days. Anything further out is more likely a model hallucination than a real schedule.
Dedup pre-check. Query for an existing row with the same (kind, hour(scheduled_at), symbols); if one exists, skip the insert. The DB unique index is the backstop; the application check is just the faster path for duplicates.

Failures at any gate are logged with the news_item id so we can audit precision.
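
The numeric gates above can be sketched as a single function that returns a rejection reason suitable for logging. Two points are assumptions, since the text leaves them open: the 5-minute slack is read as "must be more than 5 minutes after publication", and the 90-day horizon is measured from the publish time. checkGates and the reason strings are hypothetical; the schema and dedup gates are omitted.

```typescript
interface Candidate {
  scheduledAtIso: string;
  confidence: number;
}

type GateResult = { ok: true } | { ok: false; reason: string };

const MIN_CONFIDENCE = 0.5;
const SLACK_MS = 5 * 60 * 1000; // 5 minutes
const MAX_HORIZON_MS = 90 * 24 * 60 * 60 * 1000; // 90 days

// Apply the confidence, future-date, and horizon gates in order.
function checkGates(c: Candidate, publishedAtIso: string): GateResult {
  const scheduled = Date.parse(c.scheduledAtIso);
  const published = Date.parse(publishedAtIso);
  if (Number.isNaN(scheduled) || Number.isNaN(published)) {
    return { ok: false, reason: "invalid_timestamp" };
  }
  if (c.confidence < MIN_CONFIDENCE) {
    return { ok: false, reason: "low_confidence" };
  }
  // Assumed reading of "+ 5 min slack": at least 5 minutes after publication.
  if (scheduled <= published + SLACK_MS) {
    return { ok: false, reason: "not_in_future" };
  }
  // Assumed anchor for the 90-day horizon: the publish time.
  if (scheduled > published + MAX_HORIZON_MS) {
    return { ok: false, reason: "beyond_90_days" };
  }
  return { ok: true };
}
```

Returning a reason string rather than a boolean keeps the event_rejected audit log specific about which gate fired.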

Context injection

assembleContext() adds a new section between ## News and ## Funding rates, only when skill.context.events.enabled === true. The sample below assumes a tick at 2026-06-11T15:30Z with events.lookaheadHours set to 48:

## Upcoming events (next 48h)
- [2026-06-11T18:00Z, in 2.5h] fomc (importance 0.95) — "FOMC rate decision"
- [2026-06-12T12:30Z, in 21h] cpi (importance 0.85) — "May CPI release"
- [2026-06-13T13:00Z, in 45.5h] crypto_unlock (importance 0.40) — "ARB 1.1B token unlock" — ARB

Bounded by:

events.lookaheadHours (default 24, max 72).
events.minImportance (default 0.4): drops events below this importance.
events.cryptoOnly (default false): keeps only events whose symbols array is non-empty.
Hard cap: 20 events even if all of the above let more through.

Token cost: ~25 tokens/row × ≤20 rows ≈ ≤500 tok/tick worst case. Typical ≤150.
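
A rough sketch of how the section could be rendered from already-filtered rows, with the lookahead clamp and the 20-row cap applied. None of this is taken from context.ts: UpcomingEvent, clampLookahead, and renderUpcomingEvents are hypothetical, and keeping the soonest 20 events when the cap bites is an assumption (the text does not say which rows are dropped).

```typescript
interface UpcomingEvent {
  kind: string;
  scheduledAt: string; // ISO 8601 UTC
  importance: number;
  title: string;
  symbols: string[];
}

const DEFAULT_LOOKAHEAD_HOURS = 24;
const MAX_LOOKAHEAD_HOURS = 72;
const MAX_ROWS = 20;
const DASH = "\u2014"; // separator used in the sample section

// Default 24h, capped at 72h.
function clampLookahead(hours?: number): number {
  return Math.min(hours ?? DEFAULT_LOOKAHEAD_HOURS, MAX_LOOKAHEAD_HOURS);
}

// Render rows soonest-first and cap the list (assumed tie-break: time order).
function renderUpcomingEvents(
  events: UpcomingEvent[],
  asOfIso: string,
  lookaheadHours: number,
): string {
  const asOf = Date.parse(asOfIso);
  const rows = [...events]
    .sort((a, b) => Date.parse(a.scheduledAt) - Date.parse(b.scheduledAt))
    .slice(0, MAX_ROWS)
    .map((e) => {
      const when = new Date(e.scheduledAt);
      const stamp = when.toISOString().slice(0, 16) + "Z"; // minute precision
      const hours = Number(((when.getTime() - asOf) / 3600000).toFixed(1));
      const tail = e.symbols.length > 0 ? ` ${DASH} ${e.symbols.join(", ")}` : "";
      return `- [${stamp}, in ${hours}h] ${e.kind} (importance ${e.importance.toFixed(2)}) ${DASH} "${e.title}"${tail}`;
    });
  return [`## Upcoming events (next ${lookaheadHours}h)`, ...rows].join("\n");
}
```

With at most 20 rows at roughly 25 tokens each, the section stays within the 500-token worst case quoted above.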

Sim / live parity

Same asOf discipline as the news client. The events client query is:

where scheduled_at >  $asOf
  and scheduled_at <= $asOf + lookahead
  and cancelled_at is null
  and importance  >= $minImportance

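The asOf bounds anchor the window to the simulated time, so a backtest only sees events that were still upcoming at that tick. Production fills $asOf = now(). Sim fills $asOf = tick.ts. Note that the query filters on scheduled_at only: hiding events that had not yet been extracted at the simulated time would also need a created_at <= $asOf predicate.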

Note: this means sims over historical periods will surface fewer events than live, because the extractor only built the table forward from when it shipped. A historical backfill is documented in the ADR as deferred.
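
The window query can be mirrored in memory to make the sim/live parity concrete: the same predicate runs in both modes and only the asOf value differs. EventRow and upcomingWindow are hypothetical names; the four conditions are copied from the query above, including the exclusive lower bound and inclusive upper bound.

```typescript
interface EventRow {
  id: string;
  scheduledAt: string; // ISO 8601 UTC
  cancelledAt: string | null;
  importance: number;
}

const HOUR_MS = 3600000;

// Same predicate for live (asOf = now) and sim (asOf = tick.ts).
function upcomingWindow(
  rows: EventRow[],
  asOfIso: string,
  lookaheadHours: number,
  minImportance: number,
): EventRow[] {
  const asOf = Date.parse(asOfIso);
  const until = asOf + lookaheadHours * HOUR_MS;
  return rows.filter((e) => {
    const t = Date.parse(e.scheduledAt);
    return (
      t > asOf &&              // scheduled_at >  $asOf
      t <= until &&            // scheduled_at <= $asOf + lookahead
      e.cancelledAt === null && // cancelled_at is null
      e.importance >= minImportance
    );
  });
}
```

In a sim, asOfIso would be the tick timestamp; live, it would be the current time, for example new Date().toISOString().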

Failure modes

Failure	Detection	Recovery
Extractor LLM call errors	Caller catches; falls through to fallback model if configured, else logs and continues	Watermark does NOT advance on the failing item — retried next loop
Output fails zod parse	Schema validator returns null	Log at `warn` with news_item id; skip row; watermark advances (we don't want a malformed model on one item to wedge the pipeline)
Dedup index conflict on insert	Postgres returns 23505	Treated as success — the event already exists
Watermark cell missing in platform_config	`loadWatermark` returns epoch 0	Next loop processes everything; the dedup gate prevents duplicate rows, but the first run does a one-time burst of LLM calls
News volume spikes 10×	Cost still bounded by regex hit-rate; budget envelope rises proportionally	Add a per-loop max-LLM-calls cap (TBD if seen in practice)
`EVENT_EXTRACTOR_MODEL` env unset	Default applies (Haiku)	None needed
`OPENROUTER_API_KEY` unset	Same env as the trading agent and reflection; extractor logs `extractor_disabled` and skips the loop	Operator notices via logs
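
For the dedup-conflict row, the check itself is small. SQLSTATE 23505 is PostgreSQL's unique_violation code, and drivers such as node-postgres expose it on the thrown error's code property. The helper name isUniqueViolation is hypothetical, and which driver the extractor actually uses is not stated here.

```typescript
const UNIQUE_VIOLATION = "23505"; // PostgreSQL SQLSTATE: unique_violation

// True when a thrown value carries the unique-violation SQLSTATE.
function isUniqueViolation(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === UNIQUE_VIOLATION
  );
}
```

A caller would wrap the insert in try/catch, treat isUniqueViolation(err) as success because the event already exists, and rethrow anything else.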

Operations

Disable globally: set EVENT_EXTRACTOR_ENABLED=false on the news-poller Fly app. The pipeline keeps polling news; just the event-extraction loop is skipped.
One-shot backfill: a CLI in packages/data-ingest/src/cli.ts (pnpm ingest events-backfill --since 2026-05-01) re-runs the extractor over historical news_items. Deferred until needed.
Audit: every insert logs event_extracted with news_item_id, kind, confidence, scheduled_at. Rejections log event_rejected with the reason. Grep-able from agent_logs (the news-poller writes to its own log channel — TBD on which exactly; same as today's RSS upserts).

References

ADR-0018
ADR-0011 — news ingestion this consumes
apps/news-poller/src/index.ts — host
packages/agent-runtime/src/context.ts — consumer
apps/web/lib/ai-json.ts — JSON-extract helper pattern
