## Context

Proposal 2 (trace-async-job-queue) runs large queries in a separate worker process,
but the result is still materialized in full, both into Redis (as the job result) and into frontend memory.

For 114K CIDs × 2 domains, the result JSON can reach 200-500 MB:

- Worker memory: grouped dict ~500 MB + JSON serialization ~500 MB = ~1 GB peak
- Redis: a SETEX of a 500 MB key takes 5-10 s and blocks other operations
- Frontend: parsing 500 MB of JSON freezes the browser UI for tens of seconds

Streaming the response lets the server produce, and the frontend consume, one batch at a time; memory usage then scales only with the batch size.

## Goals / Non-Goals

**Goals:**

- Stream job results back as NDJSON instead of materializing them in full
- Add an iterator mode to EventFetcher that yields results batch by batch
- Parse the stream line by line in the frontend via ReadableStream and render batch by batch
- Results are also available through a paginated API (for consumers that cannot stream)

**Non-Goals:**

- No changes to the synchronous path (queries below the CID threshold keep the existing JSON response)
- No WebSocket (NDJSON over HTTP is simpler and more widely supported)
- No Server-Sent Events (SSE is limited to text/event-stream and is a poor fit for large payloads)
- No changes to MSD aggregation (aggregation needs the full dataset, but its result is small)

## Decisions

### D1: NDJSON format

**Decision**: Use newline-delimited JSON (NDJSON) as the streaming format.

```
Content-Type: application/x-ndjson

{"type":"meta","job_id":"abc123","domains":["history","materials"],"cid_count":114892}
{"type":"domain_start","domain":"history","batch":1,"total_batches":23}
{"type":"records","domain":"history","batch":1,"data":[...5000 records...]}
{"type":"records","domain":"history","batch":2,"data":[...5000 records...]}
...
{"type":"domain_end","domain":"history","total_records":115000}
{"type":"domain_start","domain":"materials","batch":1,"total_batches":12}
...
{"type":"aggregation","data":{...}}
{"type":"complete","elapsed_seconds":285}
```

**env var**: `TRACE_STREAM_BATCH_SIZE` (default: 5000 records per batch)

**Rationale:**

- NDJSON is the industry-standard format for streaming JSON (used by Elasticsearch, BigQuery, and the GitHub API)
- Each line is a standalone JSON document, so the frontend can parse line by line without waiting for the full response
- 5000 records per batch ≈ 2-5 MB, which the browser can render as it arrives
- It pairs naturally with HTTP/1.1 chunked transfer encoding (a server-side sketch follows)
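
A minimal server-side sketch of the stream endpoint, assuming Flask (already named in the risks table below) and a standalone `app`; `iter_job_batches()` is a hypothetical stand-in for however the worker exposes finished batches, not an existing helper:

```python
# Minimal sketch, assuming Flask. iter_job_batches() is hypothetical.
import json

from flask import Flask, Response

app = Flask(__name__)

def iter_job_batches(job_id):
    """Hypothetical: yield (domain, records) tuples, one per batch."""
    yield "history", [{"CONTAINERID": "C1"}]  # placeholder data

@app.route("/api/trace/job/<job_id>/stream")
def stream_trace_job(job_id):
    def generate():
        yield json.dumps({"type": "meta", "job_id": job_id}) + "\n"
        for batch_no, (domain, records) in enumerate(iter_job_batches(job_id), 1):
            yield json.dumps({"type": "records", "domain": domain,
                              "batch": batch_no, "data": records}) + "\n"
        yield json.dumps({"type": "complete"}) + "\n"

    # Returning a generator makes Flask send the body with chunked
    # transfer encoding, one yielded line at a time.
    return Response(generate(), mimetype="application/x-ndjson")
```

Because each `yield` is flushed as its own chunk, the server never holds more than one batch of serialized JSON at a time.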

### D2: EventFetcher iterator mode

**Decision**: Add a `fetch_events_iter()` method that yields grouped records one batch at a time.

```python
@staticmethod
def fetch_events_iter(container_ids, domain, batch_size=5000):
    """Yield dicts of {cid: [records]} in batches."""
    # ... same SQL building logic ...
    for oracle_batch_ids in batches:
        for columns, rows in read_sql_df_slow_iter(sql, params):
            batch_grouped = defaultdict(list)
            for row in rows:
                record = dict(zip(columns, row))
                cid = record.get("CONTAINERID")
                if cid:
                    batch_grouped[cid].append(record)
            yield dict(batch_grouped)
```

**Rationale:**

- Coexists with `fetch_events()`, leaving the synchronous path untouched
- Each yield holds only the grouped result of a single fetchmany batch
- The worker serializes and writes out each yielded batch immediately, without accumulating (see the sketch below)
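
A sketch of the intended consumption pattern in the worker; `write_batch()` is a hypothetical sink standing in for the real output path:

```python
# Illustrative loop: each batch is written out as soon as it is yielded,
# so at most one grouped batch is resident in worker memory.
for grouped in EventFetcher.fetch_events_iter(container_ids, "history",
                                              batch_size=5000):
    write_batch(grouped)  # hypothetical sink, e.g. one NDJSON line or Redis chunk key
```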

### D3: Paginated result API

**Decision**: Provide a paginated REST API as an alternative to the NDJSON stream.

```
GET /api/trace/job/{job_id}/result?domain=history&offset=0&limit=5000
```

**Response format:**

```json
{
  "domain": "history",
  "offset": 0,
  "limit": 5000,
  "total": 115000,
  "data": [... 5000 records ...]
}
```

**Rationale:**

- Some consumers (e.g. external systems) cannot consume an NDJSON stream
- A paginated API is a standard REST pattern
- Results are still stored in Redis, but split across per-domain chunk keys; each key holds 5000 records ≈ 5 MB (see the read-path sketch below)
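
A sketch of the read path under this layout, assuming one Redis key per 5000-record chunk per domain; the key scheme and function name are illustrative, not the final naming:

```python
import json

def read_page(redis_client, job_id, domain, offset, limit, chunk_size=5000):
    """Assemble one page by loading only the chunks that overlap the window."""
    first_chunk = offset // chunk_size
    last_chunk = (offset + limit - 1) // chunk_size
    records = []
    for i in range(first_chunk, last_chunk + 1):
        raw = redis_client.get(f"trace:result:{job_id}:{domain}:{i}")  # key scheme is an assumption
        if raw is None:
            break  # past the last stored chunk
        records.extend(json.loads(raw))
    # Trim to the exact window within the loaded chunks
    start = offset - first_chunk * chunk_size
    return records[start:start + limit]
```

This keeps a page read proportional to `limit` rather than to the full result, since only the overlapping chunk keys are deserialized.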

### D4: Frontend ReadableStream consumption

```javascript
async function consumeNDJSON(url, onChunk) {
  const response = await fetch(url)
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() // keep the incomplete trailing line
    for (const line of lines) {
      if (line.trim()) onChunk(JSON.parse(line))
    }
  }
  // Flush a final line that lacked a trailing newline
  if (buffer.trim()) onChunk(JSON.parse(buffer))
}
```

**Rationale:**

- ReadableStream is a native browser API; no extra dependency is needed
- Line-by-line parsing keeps memory usage constant (proportional only to batch_size)
- Data can be rendered as it arrives, which improves the user experience

## Risks / Trade-offs

| Risk | Mitigation |
|------|------------|
| Streamed NDJSON responses are not gzip-compressed by default | Flask can be configured with a gzip middleware (see the sketch below the table); 5000-record lines compress well |
| A dropped connection means starting over | The paginated API can resume from the break point; the NDJSON stream is for one-shot consumption |
| The frontend must render partial results | Switch the table component to virtual scrolling (vue-virtual-scroller is already a dependency) |
| MSD aggregation still needs the full dataset | Aggregation completes inside the worker; only the final, much smaller result is streamed |
| Per-domain result keys increase the Redis key count | TTL-based cleanup plus key-prefix isolation |
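
For the compression mitigation in the first row, a minimal sketch assuming the Flask-Compress extension is used as the gzip middleware; the NDJSON mimetype is not in its default whitelist, so it has to be added explicitly:

```python
# Minimal sketch, assuming the Flask-Compress extension is installed
# and `app` is the existing Flask application.
from flask_compress import Compress

# application/x-ndjson is not compressed by default, so whitelist it.
app.config["COMPRESS_MIMETYPES"] = ["application/json", "application/x-ndjson"]
Compress(app)
```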