DashBoard/openspec/changes/archive/trace-streaming-response/proposal.md
egg dbe0da057c feat(trace-pipeline): memory triage, async job queue, and NDJSON streaming
Three proposals addressing the 2026-02-25 trace pipeline OOM crash (114K CIDs):

1. trace-events-memory-triage: fetchmany iterator (read_sql_df_slow_iter),
   admission control (50K CID limit for non-MSD), cache skip for large queries,
   early memory release with gc.collect()

2. trace-async-job-queue: RQ-based async jobs for queries >20K CIDs,
   separate worker process with isolated memory, frontend polling via
   useTraceProgress composable, systemd service + deploy scripts

3. trace-streaming-response: chunked Redis storage (TRACE_STREAM_BATCH_SIZE=5000),
   NDJSON stream endpoint (GET /api/trace/job/<id>/stream), frontend
   ReadableStream consumer for progressive rendering, backward-compatible
   with legacy single-key storage

All three proposals archived. 1101 tests pass, frontend builds clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:01:27 +08:00


Why

Even with async jobs (proposal 2) handling large queries, materializing the result is still a memory bottleneck:

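  1. Full job result as one JSON payload: for 114K CIDs × 2 domains the result JSON can reach hundreds of MB. Storing it in Redis, reading it back, and serializing it with Flask jsonify keep peak memory high.
  2. One-shot parsing on the frontend: parsing hundreds of MB of JSON in the browser freezes the UI.
  3. Redis single-key limits: a very large value degrades Redis performance because it blocks other operations.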

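Returning the result as a stream (NDJSON or pagination) lets the server produce data batch by batch and the frontend consume it batch by batch. Memory usage is then decoupled from the total CID count and scales only with the batch size.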
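The sketch below is a minimal illustration of the batch-by-batch idea. It assumes a DB-API cursor, with an in-memory sqlite3 table standing in for the real event source. `iter_event_batches` and `to_ndjson` are hypothetical names and are not the project's fetch_events_iter(). Because `fetchmany()` returns at most `batch_size` rows per call, only one batch is held and serialized at a time.

```python
import json
import sqlite3
from typing import Iterator, List, Tuple


def iter_event_batches(cursor: sqlite3.Cursor, batch_size: int) -> Iterator[List[Tuple]]:
    """Yield rows in batches of at most batch_size using DB-API fetchmany()."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # fetchmany() returns an empty list once the cursor is exhausted
            break
        yield rows


def to_ndjson(batches: Iterator[List[Tuple]], columns: List[str]) -> Iterator[str]:
    """Serialize each batch as one NDJSON line: a JSON object followed by a newline."""
    for rows in batches:
        records = [dict(zip(columns, row)) for row in rows]
        yield json.dumps({"type": "batch", "records": records}) + "\n"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (cid TEXT, domain TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(f"C{i}", "history") for i in range(12)],
    )
    cur = conn.execute("SELECT cid, domain FROM events")
    lines = list(to_ndjson(iter_event_batches(cur, batch_size=5), ["cid", "domain"]))
    print(len(lines))  # 3: batches of 5, 5 and 2 rows
```

The demo collects the lines into a list only so it can count them. A real endpoint would pass the generator straight to the response, so no more than one batch is alive at once.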

What Changes

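  • EventFetcher supports an iterator mode: fetch_events_iter() yields each batch of results instead of accumulating all of them
  • New GET /api/trace/job/{job_id}/stream: returns the job result as an NDJSON stream
  • Streaming consumption in the frontend useTraceProgress: fetch() + ReadableStream, parsing NDJSON line by line
  • Paginated result API: GET /api/trace/job/{job_id}/result?domain=history&offset=0&limit=5000
  • Update .env.example: add TRACE_STREAM_BATCH_SIZE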
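When the frontend reads the body through fetch() and a ReadableStream, a network chunk can end in the middle of a line, or even in the middle of a multi-byte UTF-8 character. The consumer therefore has to buffer the trailing partial line between chunks.

To keep all examples in one language, that buffering logic is sketched here in Python. `iter_ndjson` is a hypothetical helper and not the useTraceProgress code. In the browser, `TextDecoder.decode(chunk, { stream: true })` plays the role of the incremental decoder.

```python
import codecs
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Decode NDJSON from byte chunks whose boundaries may fall anywhere,
    including inside a line or inside a multi-byte UTF-8 character."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        # The incremental decoder holds back incomplete multi-byte sequences.
        buffer += decoder.decode(chunk)
        # Every element except the last is a complete line; keep the last buffered.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            if line.strip():
                yield json.loads(line)
    # Flush the decoder and parse a final line that lacks a trailing newline.
    buffer += decoder.decode(b"", final=True)
    if buffer.strip():
        yield json.loads(buffer)


if __name__ == "__main__":
    payload = (
        '{"type": "batch", "n": 1}\n'
        '{"type": "batch", "n": 2, "note": "批次"}\n'
        '{"type": "done"}'
    ).encode("utf-8")
    # 7-byte chunks split lines and the multi-byte characters at arbitrary points.
    chunks = [payload[i:i + 7] for i in range(0, len(payload), 7)]
    print([obj["type"] for obj in iter_ndjson(chunks)])  # ['batch', 'batch', 'done']
```

The renderer can process each yielded object as soon as it arrives. This is what keeps the UI responsive compared with a single JSON.parse over the whole payload.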

Capabilities

New Capabilities

  • trace-streaming-response: NDJSON streaming response and result pagination
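The commit message describes results stored in Redis chunks of TRACE_STREAM_BATCH_SIZE=5000. The sketch below shows how an `offset`/`limit` page request could be mapped onto those chunks so that a request reads only the chunks it overlaps. It assumes every chunk except the last holds exactly `chunk_size` rows. The key layout, `chunk_range`, and `read_page` are hypothetical, and a plain dict stands in for the Redis client.

```python
import json
from typing import Dict, List, Tuple


def chunk_range(offset: int, limit: int, chunk_size: int) -> Tuple[int, int]:
    """Return the first and last chunk indices overlapping rows [offset, offset + limit)."""
    first = offset // chunk_size
    last = (offset + limit - 1) // chunk_size
    return first, last


def read_page(store: Dict[str, str], job_id: str, domain: str,
              offset: int, limit: int, chunk_size: int) -> List[dict]:
    """Read one page of rows, loading only the chunks the page overlaps.
    Assumes every chunk except the last holds exactly chunk_size rows."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit > 0")
    first, last = chunk_range(offset, limit, chunk_size)
    rows: List[dict] = []
    for idx in range(first, last + 1):
        raw = store.get(f"trace:job:{job_id}:{domain}:chunk:{idx}")  # hypothetical key layout
        if raw is None:  # past the end of the result
            break
        rows.extend(json.loads(raw))
    start = offset - first * chunk_size  # position of `offset` inside the loaded chunks
    return rows[start:start + limit]


if __name__ == "__main__":
    store: Dict[str, str] = {}
    all_rows = [{"cid": f"C{i}"} for i in range(12)]
    for n, i in enumerate(range(0, 12, 5)):
        store[f"trace:job:j1:history:chunk:{n}"] = json.dumps(all_rows[i:i + 5])
    page = read_page(store, "j1", "history", offset=4, limit=5, chunk_size=5)
    print([r["cid"] for r in page])  # ['C4', 'C5', 'C6', 'C7', 'C8']
```

With `limit` equal to the chunk size, as in the example URL (`limit=5000`), an aligned page touches one chunk and an unaligned page touches at most two.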

Modified Capabilities

  • event-fetcher-unified: add an iterator mode (fetch_events_iter)
  • trace-staged-api: add a streaming endpoint for job results
  • progressive-trace-ux: frontend streaming consumption and batch-by-batch rendering
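The stream endpoint for trace-staged-api can be pictured as a generator that walks the stored chunks in order. It emits one NDJSON line per chunk plus a final completion record. The commit message says the stream stays backward-compatible with legacy single-key storage, so the sketch falls back to that key when no chunk metadata exists.

This is a minimal sketch under assumptions. The key names, the `meta` layout, the record `type` values, and `stream_job_result` are all hypothetical, and a dict stands in for a Redis client configured to return strings.

```python
import json
from typing import Iterator, Mapping


def stream_job_result(store: Mapping[str, str], job_id: str) -> Iterator[str]:
    """Yield NDJSON lines for a finished job, one stored chunk at a time.
    Falls back to a legacy single-key result when no chunk metadata exists."""
    meta_raw = store.get(f"trace:job:{job_id}:meta")  # hypothetical key layout
    if meta_raw is None:
        legacy_raw = store.get(f"trace:job:{job_id}:result")  # legacy single key
        if legacy_raw is None:
            yield json.dumps({"type": "error", "error": "result not found"}) + "\n"
            return
        # Legacy results must be loaded whole; they are still emitted per domain.
        for domain, records in json.loads(legacy_raw).items():
            yield json.dumps({"type": "records", "domain": domain, "records": records}) + "\n"
        yield json.dumps({"type": "complete"}) + "\n"
        return
    meta = json.loads(meta_raw)  # e.g. {"chunks": {"history": 3, "materials": 2}}
    for domain, n_chunks in meta["chunks"].items():
        for idx in range(n_chunks):
            raw = store[f"trace:job:{job_id}:{domain}:chunk:{idx}"]
            # Only this chunk is decoded; earlier ones are no longer referenced.
            yield json.dumps({"type": "records", "domain": domain,
                              "records": json.loads(raw)}) + "\n"
    yield json.dumps({"type": "complete"}) + "\n"


if __name__ == "__main__":
    demo = {
        "trace:job:j1:meta": json.dumps({"chunks": {"history": 2}}),
        "trace:job:j1:history:chunk:0": json.dumps([{"cid": "A"}]),
        "trace:job:j1:history:chunk:1": json.dumps([{"cid": "B"}]),
    }
    for line in stream_job_result(demo, "j1"):
        print(line, end="")
```

In a Flask route, a generator like this can be returned as `Response(generator, mimetype="application/x-ndjson")`, and Flask sends each yielded string as it is produced. The explicit `complete` record lets the client tell a finished stream from a dropped connection.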

Impact

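  • Backend core: event_fetcher.py (iterator mode), trace_routes.py (stream endpoint)
  • Frontend changes: useTraceProgress.js (ReadableStream consumption)
  • Deployment config: .env.example (TRACE_STREAM_BATCH_SIZE)
  • Not affected: the synchronous path (queries below the CID threshold keep the existing flow), other services, and the real-time monitoring pages
  • Prerequisite: trace-async-job-queue (proposal 2)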
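TRACE_STREAM_BATCH_SIZE is the only new deployment setting. The sketch below reads it with validation and shows that only queries above the async threshold reach the streaming path. It assumes a default of 5000 and a strict ">20K CIDs" threshold, both taken from the commit message. `stream_batch_size` and `uses_streaming_path` are hypothetical names, and the actual threshold check belongs to trace-async-job-queue.

```python
import os
from typing import Mapping, Optional

DEFAULT_STREAM_BATCH_SIZE = 5000  # value quoted in the commit message
ASYNC_CID_THRESHOLD = 20_000      # ">20K CIDs" from the commit message; strict > assumed


def stream_batch_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Read TRACE_STREAM_BATCH_SIZE, falling back to the default when unset or blank."""
    env = os.environ if env is None else env
    raw = env.get("TRACE_STREAM_BATCH_SIZE", "").strip()
    if not raw:
        return DEFAULT_STREAM_BATCH_SIZE
    value = int(raw)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError(f"TRACE_STREAM_BATCH_SIZE must be positive, got {raw!r}")
    return value


def uses_streaming_path(cid_count: int, threshold: int = ASYNC_CID_THRESHOLD) -> bool:
    """Only queries above the threshold use the async job + stream path;
    smaller queries keep the existing synchronous flow."""
    return cid_count > threshold


if __name__ == "__main__":
    print(stream_batch_size({"TRACE_STREAM_BATCH_SIZE": "2000"}))  # 2000
    print(uses_streaming_path(114_000), uses_streaming_path(5_000))  # True False
```

Failing fast on a zero or negative value avoids a worker that loops without making progress or writes empty chunks.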