Three proposals addressing the 2026-02-25 trace pipeline OOM crash (114K CIDs):

1. trace-events-memory-triage: fetchmany iterator (read_sql_df_slow_iter), admission control (50K CID limit for non-MSD), cache skip for large queries, early memory release with gc.collect()
2. trace-async-job-queue: RQ-based async jobs for queries >20K CIDs, separate worker process with isolated memory, frontend polling via useTraceProgress composable, systemd service + deploy scripts
3. trace-streaming-response: chunked Redis storage (TRACE_STREAM_BATCH_SIZE=5000), NDJSON stream endpoint (GET /api/trace/job/<id>/stream), frontend ReadableStream consumer for progressive rendering, backward-compatible with legacy single-key storage

All three proposals archived. 1101 tests pass, frontend builds clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Why
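Even with async jobs (Proposal 2) handling large queries, materializing the result is still a memory bottleneck:

1. **Full job result as a single JSON document**: for 114K CIDs × 2 domains the result JSON can reach several hundred MB, and storing it in Redis, reading it back, and serializing it with Flask `jsonify` keeps peak memory high.
2. **Frontend parses everything at once**: parsing several hundred MB of JSON in the browser freezes the UI.
3. **Redis single-key limit**: a very large value degrades Redis performance (it blocks other operations).

Streaming the response (NDJSON/pagination) lets the server produce data batch by batch and the frontend consume it batch by batch, so memory use is decoupled from the total CID count and scales only with the batch size.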
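A minimal sketch of the server-side half, assuming a DB-API cursor as the data source (`sqlite3` stands in for the real database). `iter_batches` and `iter_ndjson` are hypothetical helpers, not the proposal's `fetch_events_iter()`, and the line schema `{"domain", "batch", "rows"}` is likewise an assumption. These generators hold only the current batch and its serialized line at any moment, which is what makes memory scale with the batch size rather than the CID count.

```python
import json
import sqlite3
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def iter_batches(cursor: sqlite3.Cursor, batch_size: int) -> Iterator[List[Row]]:
    """Yield query rows as lists of at most batch_size dicts via fetchmany()."""
    columns = [col[0] for col in cursor.description]
    while True:
        fetched = cursor.fetchmany(batch_size)
        if not fetched:
            break
        yield [dict(zip(columns, values)) for values in fetched]


def iter_ndjson(domain: str, batches: Iterable[List[Row]]) -> Iterator[str]:
    """Serialize each batch as one newline-terminated JSON line (NDJSON)."""
    for index, rows in enumerate(batches):
        yield json.dumps({"domain": domain, "batch": index, "rows": rows}) + "\n"
```

In Flask, a generator such as `iter_ndjson("history", iter_batches(cursor, 5000))` can be passed directly to `Response(generator, mimetype="application/x-ndjson")`, so each line is sent as soon as it is produced instead of after the whole result is built.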
## What Changes
- **Iterator mode in EventFetcher**: `fetch_events_iter()` yields each batch of results instead of accumulating everything
- **New `GET /api/trace/job/{job_id}/stream`**: streams the job result back as NDJSON
- **Streaming consumption in the frontend useTraceProgress**: parses the NDJSON line by line with `fetch()` + `ReadableStream`
- **Paginated result API**: `GET /api/trace/job/{job_id}/result?domain=history&offset=0&limit=5000`
- **Update .env.example**: `TRACE_STREAM_BATCH_SIZE`
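The stream endpoint needs the stored result to be readable one piece at a time. A minimal sketch, assuming a plain `dict` in place of a Redis client (only get/set-style access is used) and hypothetical key names: rows are written in chunks of `batch_size` (TRACE_STREAM_BATCH_SIZE, 5000 in the accompanying commit), a small metadata record is written last, and the reader falls back to a single legacy key when no metadata exists. The legacy value format (one JSON object keyed by domain) is an assumption.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, MutableMapping

Row = Dict[str, Any]
# Stand-in for a Redis client: only get/set-style access is used.
Store = MutableMapping[str, str]


# Hypothetical key names; the proposal does not fix the Redis key layout.
def chunk_key(job_id: str, domain: str, index: int) -> str:
    return f"trace:job:{job_id}:{domain}:chunk:{index}"


def meta_key(job_id: str, domain: str) -> str:
    return f"trace:job:{job_id}:{domain}:meta"


def legacy_key(job_id: str) -> str:
    return f"trace:job:{job_id}:result"


def store_chunked(store: Store, job_id: str, domain: str,
                  rows: Iterable[Row], batch_size: int) -> None:
    """Write rows as JSON arrays of batch_size rows (the last may be shorter)."""
    buffer: List[Row] = []
    chunks = total = 0
    for row in rows:
        buffer.append(row)
        if len(buffer) == batch_size:
            store[chunk_key(job_id, domain, chunks)] = json.dumps(buffer)
            chunks, total, buffer = chunks + 1, total + len(buffer), []
    if buffer:
        store[chunk_key(job_id, domain, chunks)] = json.dumps(buffer)
        chunks, total = chunks + 1, total + len(buffer)
    # Metadata goes last, so a reader that checks it first will not start
    # on a half-written set of chunks.
    store[meta_key(job_id, domain)] = json.dumps(
        {"total": total, "batch_size": batch_size, "chunks": chunks})


def stream_ndjson(store: Store, job_id: str, domain: str) -> Iterator[str]:
    """Yield one NDJSON line per stored chunk, loading one chunk at a time."""
    meta_raw = store.get(meta_key(job_id, domain))
    if meta_raw is None:
        # Legacy single-key fallback (assumed: one JSON object keyed by domain).
        legacy_raw = store.get(legacy_key(job_id))
        if legacy_raw is None:
            raise KeyError(f"no stored result for job {job_id}")
        rows = json.loads(legacy_raw).get(domain, [])
        yield json.dumps({"domain": domain, "batch": 0, "rows": rows}) + "\n"
        return
    meta = json.loads(meta_raw)
    for index in range(meta["chunks"]):
        rows = json.loads(store[chunk_key(job_id, domain, index)])
        yield json.dumps({"domain": domain, "batch": index, "rows": rows}) + "\n"
```

Because `stream_ndjson` is a generator, the missing-result `KeyError` only surfaces on the first iteration; an endpoint would likely check for the metadata or legacy key up front so it can still return a 404 before the streaming response starts.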
## Capabilities
### New Capabilities
- `trace-streaming-response`: NDJSON streaming response + result pagination
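For result pagination, an `(offset, limit)` page has to be mapped onto the stored chunks. A sketch of that arithmetic, assuming every chunk except the last holds exactly `batch_size` rows; `page_slices` is a hypothetical helper name, not part of the proposal.

```python
from typing import List, Tuple


def page_slices(offset: int, limit: int, total: int,
                batch_size: int) -> List[Tuple[int, int, int]]:
    """Map an (offset, limit) page onto fixed-size chunks.

    Returns (chunk_index, start, stop) triples; concatenating
    chunk[start:stop] for each triple, in order, yields the requested page.
    """
    if offset < 0 or limit < 0 or batch_size <= 0:
        raise ValueError("offset and limit must be >= 0, batch_size > 0")
    end = min(offset + limit, total)
    if offset >= end:
        return []  # empty page: offset past the end, or limit == 0
    slices: List[Tuple[int, int, int]] = []
    for index in range(offset // batch_size, (end - 1) // batch_size + 1):
        base = index * batch_size  # global row number of the chunk's first row
        start = max(offset - base, 0)
        stop = min(end - base, batch_size)
        slices.append((index, start, stop))
    return slices
```

With `batch_size=5000`, a request such as `offset=0&limit=5000` reads a single chunk, and any page with `limit <= batch_size` touches at most two chunks, so the cost of one page request does not grow with the total CID count.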
### Modified Capabilities
- `event-fetcher-unified`: adds an iterator mode (`fetch_events_iter`)
- `trace-staged-api`: streaming endpoint for job results
- `progressive-trace-ux`: streaming consumption and batch-by-batch rendering in the frontend
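On the consuming side, a network chunk delivered by `ReadableStream` can end in the middle of an NDJSON line (or even in the middle of a multi-byte UTF-8 character), so the reader has to buffer until it sees a newline. The sketch below shows that buffering logic in Python for consistency with the other examples; the real consumer lives in useTraceProgress.js, where the same pattern is usually written with `TextDecoder.decode(chunk, { stream: true })` followed by a split on `\n`. `iter_ndjson_records` is a hypothetical name.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson_records(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Parse NDJSON records from arbitrarily split byte chunks.

    Splitting on b"\\n" at the byte level is safe for UTF-8, because the
    newline byte never occurs inside a multi-byte character.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the tail.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        # Final record without a trailing newline.
        yield json.loads(buffer)
```

Each record can be rendered as soon as it is yielded, so the UI updates batch by batch instead of blocking on one large parse.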
## Impact
- **Backend core**: event_fetcher.py (iterator mode), trace_routes.py (stream endpoint)
- **Frontend changes**: useTraceProgress.js (ReadableStream consumption)
- **Deployment config**: .env.example (`TRACE_STREAM_BATCH_SIZE`)
- **Not affected**: the synchronous path (queries below the CID threshold still use the existing flow), other services, and the real-time monitoring page
- **Prerequisite**: trace-async-job-queue (Proposal 2)