DashBoard/openspec/changes/archive/trace-events-memory-triage/specs/event-fetcher-unified/spec.md
dbe0da057c feat(trace-pipeline): memory triage, async job queue, and NDJSON streaming
Three proposals addressing the 2026-02-25 trace pipeline OOM crash (114K CIDs):

1. trace-events-memory-triage: fetchmany iterator (read_sql_df_slow_iter),
   admission control (50K CID limit for non-MSD), cache skip for large queries,
   early memory release with gc.collect()

2. trace-async-job-queue: RQ-based async jobs for queries >20K CIDs,
   separate worker process with isolated memory, frontend polling via
   useTraceProgress composable, systemd service + deploy scripts

3. trace-streaming-response: chunked Redis storage (TRACE_STREAM_BATCH_SIZE=5000),
   NDJSON stream endpoint (GET /api/trace/job/<id>/stream), frontend
   ReadableStream consumer for progressive rendering, backward-compatible
   with legacy single-key storage
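
As a rough illustration of the streaming half of proposal 3, an NDJSON stream is just one JSON object per line, so a consumer can parse and render records as they arrive. This is a hedged sketch: the endpoint path comes from the proposal above, but the record shape and helper name here are illustrative, not the actual implementation.

```python
import io
import json

def iter_ndjson(stream):
    """Yield one parsed record per non-empty NDJSON line.

    Hypothetical consumer for GET /api/trace/job/<id>/stream; the real
    frontend uses a ReadableStream, but the framing logic is the same.
    """
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Simulated response body; the field names are illustrative only.
body = io.StringIO('{"cid": "c1", "event": "start"}\n'
                   '{"cid": "c1", "event": "stop"}\n')
records = list(iter_ndjson(body))
```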

All three proposals archived. 1101 tests pass, frontend builds clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:01:27 +08:00


MODIFIED Requirements

Requirement: EventFetcher SHALL use streaming fetch for batch queries

EventFetcher._fetch_batch SHALL use read_sql_df_slow_iter (fetchmany-based iterator) instead of read_sql_df (fetchall + DataFrame) to reduce peak memory usage.

Scenario: Batch query memory optimization

  • WHEN EventFetcher executes a batch query for a domain
  • THEN the query SHALL use cursor.fetchmany(batch_size) (env: DB_SLOW_FETCHMANY_SIZE, default: 5000) instead of cursor.fetchall()
  • THEN rows SHALL be converted directly to dicts via dict(zip(columns, row)) without building a DataFrame
  • THEN each fetchmany batch SHALL be grouped into the result dict immediately, allowing the batch rows to be garbage collected
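
The scenario above can be sketched as follows. This is a minimal, hypothetical rendering of read_sql_df_slow_iter (the real helper's signature and module may differ), shown against an in-memory SQLite table standing in for the production database:

```python
import os
import sqlite3
from typing import Any, Dict, Iterator, List, Optional

def read_sql_df_slow_iter(cursor,
                          batch_size: Optional[int] = None
                          ) -> Iterator[List[Dict[str, Any]]]:
    """Yield batches of row dicts via cursor.fetchmany(); never fetchall().

    Hypothetical sketch: at most batch_size rows are materialized at a
    time, and rows become dicts directly, with no intermediate DataFrame.
    """
    if batch_size is None:
        batch_size = int(os.environ.get("DB_SLOW_FETCHMANY_SIZE", "5000"))
    columns = [desc[0] for desc in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield [dict(zip(columns, row)) for row in rows]

# Demo with an illustrative schema (table and column names are assumptions).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (CONTAINERID TEXT, EVENT TEXT)")
cur.executemany("INSERT INTO events VALUES (?, ?)",
                [("c1", "start"), ("c1", "stop"), ("c2", "start")])
cur.execute("SELECT CONTAINERID, EVENT FROM events")
batches = list(read_sql_df_slow_iter(cur, batch_size=2))
```

Because each yielded batch is dropped by the caller before the next fetchmany call, peak memory is bounded by one batch rather than the full result set.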

Scenario: Existing API contract preserved

  • WHEN EventFetcher.fetch_events() returns results
  • THEN the return type SHALL remain Dict[str, List[Dict[str, Any]]] (grouped by CONTAINERID)
  • THEN the grouped result SHALL be identical to the output of the previous DataFrame-based implementation
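
A sketch of how the streamed batches can be folded into the preserved return shape. The helper name and the gc.collect() placement are assumptions modeled on the early-memory-release item in the triage proposal, not the actual _fetch_batch code:

```python
import gc
from collections import defaultdict
from typing import Any, Dict, Iterable, List

def group_by_container(batch_iter: Iterable[List[Dict[str, Any]]]
                       ) -> Dict[str, List[Dict[str, Any]]]:
    """Group streamed row batches by CONTAINERID immediately.

    Hypothetical sketch: each batch is folded into the result dict as soon
    as it arrives, so the batch itself can be garbage collected, and the
    Dict[str, List[Dict[str, Any]]] contract is preserved.
    """
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for rows in batch_iter:
        for row in rows:
            grouped[row["CONTAINERID"]].append(row)
        del rows  # release the batch before fetching the next one
    gc.collect()  # early memory release, as in the triage proposal
    return dict(grouped)

# Illustrative stream of two fetchmany batches.
rows_stream = iter([
    [{"CONTAINERID": "c1", "EVENT": "start"},
     {"CONTAINERID": "c2", "EVENT": "start"}],
    [{"CONTAINERID": "c1", "EVENT": "stop"}],
])
grouped = group_by_container(rows_stream)
```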