
Overview

Two cron-scheduled tasks drive the entire Signals pipeline. Users configure signal types and enable sources via the HTTP API; the crons do the actual scanning and housekeeping.

signal-scan-daily

Schedule: Every 15 minutes
Cron: */15 * * * *
File: trigger/signal-scan.ts
The main Signals pipeline. Every 15 minutes, the scheduler wakes up, picks the users whose preferred scan window matches the current slot, and fans out one scan per signal type per source. Each scan fetches fresh content through the source adapter, classifies matches with AI, scores them, and upserts them into signal_results, which is what List signals returns in the feed.
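A minimal sketch of one scheduler tick, assuming each user's preference is stored as a set of 15-minute slot indices in the UTC day (the real preference format is not documented here). `UserConfig`, `FetchJob`, `slotIndex`, and `planFanOut` are hypothetical names. Because `*/15 * * * *` fires at minutes 0, 15, 30, and 45, a day divides into 96 slots, and each due user yields one child job per (signal_type, source) pair.

```typescript
// Hypothetical shapes; the real schema and child-task payload are not documented here.
interface UserConfig {
  userId: string;
  preferredSlots: number[]; // assumption: 15-minute slot indices (0-95) in the UTC day
  signalTypes: string[];
  sources: string[];
}

interface FetchJob {
  userId: string;
  signalType: string;
  source: string;
}

// */15 * * * * fires at minutes 0, 15, 30, 45, so a UTC day has 96 slots (0..95).
function slotIndex(now: Date): number {
  return Math.floor((now.getUTCHours() * 60 + now.getUTCMinutes()) / 15);
}

// One child job per (user, signal_type, source) for every user due in the current slot.
function planFanOut(users: UserConfig[], now: Date): FetchJob[] {
  const slot = slotIndex(now);
  const jobs: FetchJob[] = [];
  for (const user of users) {
    if (!user.preferredSlots.includes(slot)) continue; // not due this tick
    for (const signalType of user.signalTypes) {
      for (const source of user.sources) {
        jobs.push({ userId: user.userId, signalType, source });
      }
    }
  }
  return jobs;
}
```

With 20 signal types and 10 sources, a due user yields 20 × 10 = 200 jobs. The sketch ignores per-source scan intervals, so it produces the upper bound that the text describes as "up to 200".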
What an agent should know. The feed is near-real-time but not instant: a newly added keyword can take up to the user's configured scan window (typically 1–24 h, depending on the source tier) before its first matches appear. Users can tune their scan window via Update the scan-time preference.

Fan-out shape. Each run of signal-scan-daily triggers many signal-type-fetch child tasks, one per (user, signal_type, source) combination that is due. A user with 20 signal types and 10 sources would produce up to 200 child tasks for each slot in which they are due.

Partner webhooks. When new signals are validated during a scan, Svix webhook events are fired to every partner subscribed to signal.created. Delivery is best-effort with retries.
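Because delivery is best-effort with retries, a partner endpoint can receive the same signal.created message more than once. Below is a minimal sketch of an idempotent consumer keyed on the `svix-id` header that Svix attaches to each message. The payload shape (`type`, `data.signalId`) and the class name are assumptions, and the in-memory `Set` stands in for a durable store. Verification of the `svix-signature` header is omitted and should be added following Svix's own verification guidance.

```typescript
// Hypothetical payload shape for a signal.created event; check the real schema before relying on it.
interface SignalEvent {
  type: string;
  data: { signalId: string };
}

type HandleResult = "processed" | "duplicate" | "ignored";

class SignalWebhookConsumer {
  // Message ids already handled. In-memory for the sketch; use a durable store with a TTL in practice.
  private readonly seen = new Set<string>();
  readonly processedSignalIds: string[] = [];

  // NOTE: verification of the svix-signature header is intentionally omitted here.
  handle(headers: Record<string, string>, rawBody: string): HandleResult {
    const messageId = headers["svix-id"]; // same value on every retry of one message
    if (!messageId) throw new Error("missing svix-id header");
    if (this.seen.has(messageId)) return "duplicate";

    const event = JSON.parse(rawBody) as SignalEvent;
    const result: HandleResult = event.type === "signal.created" ? "processed" : "ignored";
    if (result === "processed") {
      this.processedSignalIds.push(event.data.signalId); // the partner's real side effect goes here
    }
    // Record the id only after handling succeeds, so a failed attempt is retried rather than skipped.
    this.seen.add(messageId);
    return result;
  }
}
```

Since best-effort delivery can also miss events entirely, a partner that needs completeness may want to reconcile periodically against List signals rather than rely on webhooks alone.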

signal-entity-cleanup

Schedule: Daily at 07:30 UTC
Cron: 30 7 * * *
File: trigger/signal-entity-cleanup.ts
Daily housekeeping for the signal_entities table. It runs after the morning signal-scan slots so the entity graph reflects the latest classifications. Four steps:
  1. Delete stale signals: removes signal_results rows older than 1 year.
  2. Remove orphans: deletes entities with signal_count = 0 or with no linked signals.
  3. Recalculate aggregate scores: fixes drift between each entity's cached score and the actual average of its linked signals.
  4. Merge duplicates: merges entities that the normalizer now recognizes as the same (e.g., Acme and Acme Inc. after a normalizer update).
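The four steps can be sketched over in-memory arrays as follows. This is illustrative only: the row shapes, the 365-day cutoff, and the `normalize` function (lowercasing, stripping punctuation and one trailing corporate suffix) are assumptions, not the job's actual queries or normalizer. The survivor of a merge is whichever entity appears first, and its score is recomputed from the reassigned signals.

```typescript
// Hypothetical in-memory shapes; the real tables and columns may differ.
interface SignalRow { id: string; entityId: string; score: number; createdAt: Date }
interface EntityRow { id: string; name: string; signalCount: number; score: number }

const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000; // assumption: "1 year" treated as 365 days

// Assumed normalizer: lowercase, drop "." and ",", strip one trailing corporate suffix.
function normalize(name: string): string {
  return name.toLowerCase().replace(/[.,]/g, "").replace(/\s+(inc|llc|ltd|corp)$/, "").trim();
}

function cleanup(signals: SignalRow[], entities: EntityRow[], now: Date) {
  // 1. Delete stale signals: drop rows older than one year.
  let live = signals.filter((s) => now.getTime() - s.createdAt.getTime() <= ONE_YEAR_MS);
  const linked = (entityId: string) => live.filter((s) => s.entityId === entityId);

  // 2. Remove orphans: entities with no remaining linked signals (a true signal_count of 0).
  let kept = entities.filter((e) => linked(e.id).length > 0);

  // 3. Recalculate aggregates from linked signals instead of trusting cached values.
  const recalc = (e: EntityRow): EntityRow => {
    const rows = linked(e.id);
    const avg = rows.reduce((sum, s) => sum + s.score, 0) / rows.length;
    return { ...e, signalCount: rows.length, score: avg };
  };
  kept = kept.map(recalc);

  // 4. Merge duplicates: the first entity seen for a normalized name becomes canonical.
  const canonical = new Map<string, string>(); // normalized name -> surviving entity id
  const mergedInto = new Map<string, string>(); // removed entity id -> surviving entity id
  for (const e of kept) {
    const key = normalize(e.name);
    const target = canonical.get(key);
    if (target) mergedInto.set(e.id, target);
    else canonical.set(key, e.id);
  }
  // Re-point signals at the survivor, then drop merged entities and rescore survivors.
  live = live.map((s) => ({ ...s, entityId: mergedInto.get(s.entityId) ?? s.entityId }));
  kept = kept.filter((e) => !mergedInto.has(e.id)).map(recalc);

  return { signals: live, entities: kept, mergedInto };
}
```

With rows for Acme and Acme Inc., both normalize to "acme", so the second entity's signals move to the first and only the first entity_id survives, which is how a cached ID can disappear after the run.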
What an agent should know. Entity merges are destructive: after this cron runs, an entity_id the agent has cached may no longer exist because that entity was merged into another. Always look up the canonical entity ID from the current feed rather than caching it long-term. Aggregate scores can also shift after this job runs, so cached scores go stale.
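One way for an agent to tell whether a cached entity_id or score predates the latest cleanup is to compare its fetch time with the most recent scheduled run of `30 7 * * *` (UTC). Below is a minimal sketch. `lastCleanupRun`, `isPossiblyStale`, and the `graceMs` allowance for the job's run time are hypothetical, since the job's duration is not documented. Refetching from the current feed remains the safe default.

```typescript
// Most recent scheduled start of signal-entity-cleanup (cron "30 7 * * *", UTC) at or before `now`.
function lastCleanupRun(now: Date): Date {
  const run = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), 7, 30),
  );
  // Before 07:30 UTC today, the latest run was yesterday's (setUTCDate handles month rollover).
  if (run.getTime() > now.getTime()) run.setUTCDate(run.getUTCDate() - 1);
  return run;
}

// A value fetched before the latest run (plus an assumed allowance for the job's duration)
// may refer to a merged-away entity or an outdated aggregate score.
function isPossiblyStale(cachedAt: Date, now: Date, graceMs = 0): boolean {
  return cachedAt.getTime() < lastCleanupRun(now).getTime() + graceMs;
}
```

The comparison uses the scheduled start, not completion, so a value fetched while the job is still running can also be stale; a nonzero `graceMs` treats that window conservatively.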