Three proposals addressing the 2026-02-25 trace pipeline OOM crash (114K CIDs):

1. trace-events-memory-triage: fetchmany iterator (`read_sql_df_slow_iter`), admission control (50K CID limit for non-MSD), cache skip for large queries, early memory release with `gc.collect()`
2. trace-async-job-queue: RQ-based async jobs for queries >20K CIDs, separate worker process with isolated memory, frontend polling via the `useTraceProgress` composable, systemd service + deploy scripts
3. trace-streaming-response: chunked Redis storage (`TRACE_STREAM_BATCH_SIZE=5000`), NDJSON stream endpoint (`GET /api/trace/job/<id>/stream`), frontend `ReadableStream` consumer for progressive rendering, backward compatible with legacy single-key storage

All three proposals archived. 1101 tests pass, frontend builds clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. EventFetcher Iterator Mode
- 1.1 Add `fetch_events_iter(container_ids, domain, batch_size)` static method to the `EventFetcher` class: yields `Dict[str, List[Dict]]` batches using `read_sql_df_slow_iter`
- 1.2 Add unit tests for `fetch_events_iter` (mock `read_sql_df_slow_iter`, verify batch yields)
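A minimal sketch of the iterator mode in 1.1. The real `read_sql_df_slow_iter` signature and the query text are not shown in this plan, so the stub reader and SQL string below are illustrative assumptions; the point is that the method yields one `{domain: records}` batch at a time instead of materializing all events.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-in for the project's batched SQL reader; assumed to
# yield lists of event-row dicts, batch_size rows at a time.
def read_sql_df_slow_iter(query: str, params: List[str], batch_size: int):
    rows = [{"cid": cid, "event": "created"} for cid in params]
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

class EventFetcher:
    @staticmethod
    def fetch_events_iter(
        container_ids: List[str], domain: str, batch_size: int = 5000
    ) -> Iterator[Dict[str, List[Dict]]]:
        """Yield {domain: [event, ...]} batches so only one batch is
        resident in memory at a time."""
        query = f"SELECT * FROM {domain}_events WHERE cid IN %s"  # illustrative
        for batch in read_sql_df_slow_iter(query, container_ids, batch_size):
            yield {domain: list(batch)}

# Usage: three CIDs with batch_size=2 produce two batches.
batches = list(EventFetcher.fetch_events_iter(["c1", "c2", "c3"], "orders", batch_size=2))
```

The unit test in 1.2 would patch the reader and assert on the yielded batch shapes, much like the usage line above.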
2. NDJSON Stream Endpoint
- 2.1 Add `GET /api/trace/job/<job_id>/stream` endpoint: returns `Content-Type: application/x-ndjson` via Flask's `Response(generate(), mimetype='application/x-ndjson')`
- 2.2 Implement NDJSON generator: yield `meta` → `domain_start` → `records` batches → `domain_end` → `aggregation` → `complete` lines
- 2.3 Add `TRACE_STREAM_BATCH_SIZE` env var (default 5000)
- 2.4 Modify `execute_trace_events_job()` to store results in chunked Redis keys: `trace:job:{job_id}:result:{domain}:{chunk_idx}`
- 2.5 Add unit tests for the NDJSON stream endpoint
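A sketch of the generator in 2.2, producing lines in the order the plan specifies. The exact field names inside each line (`job_id`, `records`, `total_records`, and so on) are assumptions for illustration; in the endpoint from 2.1 this generator would be wrapped in Flask's `Response(generate(), mimetype='application/x-ndjson')`.

```python
import json

def generate_ndjson(job_id, results_by_domain, batch_size=2):
    """Yield NDJSON lines: meta, then per-domain domain_start /
    records batches / domain_end, then aggregation and complete."""
    yield json.dumps({"type": "meta", "job_id": job_id}) + "\n"
    total = 0
    for domain, records in results_by_domain.items():
        yield json.dumps({"type": "domain_start", "domain": domain}) + "\n"
        for i in range(0, len(records), batch_size):
            chunk = records[i:i + batch_size]
            total += len(chunk)
            yield json.dumps({"type": "records", "domain": domain, "records": chunk}) + "\n"
        yield json.dumps({"type": "domain_end", "domain": domain}) + "\n"
    yield json.dumps({"type": "aggregation", "total_records": total}) + "\n"
    yield json.dumps({"type": "complete"}) + "\n"

# Usage: three records with batch_size=2 produce two `records` lines.
lines = list(generate_ndjson("j1", {"orders": [{"id": 1}, {"id": 2}, {"id": 3}]}))
```

Because each line is a self-contained JSON object terminated by a newline, the consumer can parse and render batches as they arrive rather than waiting for the full body.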
3. Result Pagination API
- 3.1 Enhance `GET /api/trace/job/<job_id>/result` with `domain`, `offset`, and `limit` query params
- 3.2 Implement pagination over chunked Redis keys
- 3.3 Add unit tests for pagination (offset/limit boundary cases)
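One way 3.2 could page over the chunked key layout from 2.4: walk chunk indexes in order, skip `offset` records, and stop fetching once `limit` records are collected. A plain dict stands in for Redis here, and the helper name is hypothetical.

```python
import json

# In-memory stand-in for Redis; keys follow the chunked layout
# trace:job:{job_id}:result:{domain}:{chunk_idx} from step 2.4.
store = {
    "trace:job:j1:result:orders:0": json.dumps([{"id": i} for i in range(5)]),
    "trace:job:j1:result:orders:1": json.dumps([{"id": i} for i in range(5, 8)]),
}

def get_result_page(job_id, domain, offset=0, limit=100):
    """Collect up to `limit` records starting at `offset`, walking
    chunks in index order and stopping at the first missing key."""
    out, seen, idx = [], 0, 0
    while len(out) < limit:
        raw = store.get(f"trace:job:{job_id}:result:{domain}:{idx}")
        if raw is None:
            break  # no more chunks for this domain
        for rec in json.loads(raw):
            if seen >= offset and len(out) < limit:
                out.append(rec)
            seen += 1
        idx += 1
    return out
```

The boundary cases in 3.3 fall out naturally: an `offset` past the last record returns an empty page, and a `limit` spanning a chunk boundary draws from both chunks.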
4. Frontend Streaming Consumer
- 4.1 Add `consumeNDJSONStream(url, onChunk)` utility using `ReadableStream`
- 4.2 Modify `useTraceProgress.js`: for async jobs, prefer the stream endpoint over the full result endpoint
- 4.3 Add progressive rendering: update table data as each NDJSON batch arrives
- 4.4 Add error handling: stream interruption, malformed NDJSON lines
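The core of the consumer in 4.1 and 4.4 is line reassembly: network chunks can split a JSON line, so the consumer buffers partial lines and skips malformed ones. Sketched here in Python for brevity; the actual `consumeNDJSONStream` would use `ReadableStream` and `TextDecoder` in JavaScript, but the buffering logic is the same.

```python
import json

def consume_ndjson_chunks(chunks, on_record):
    """Accumulate incoming text chunks, emit each complete JSON line
    via on_record, and count (rather than crash on) malformed lines."""
    buf = ""
    bad = 0
    for chunk in chunks:
        buf += chunk
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            if not line.strip():
                continue
            try:
                on_record(json.loads(line))
            except json.JSONDecodeError:
                bad += 1  # malformed NDJSON line: skip and keep streaming (4.4)
    return bad

# Usage: the second chunk completes a line started in the first,
# and the garbage line is counted instead of aborting the stream.
events = []
bad = consume_ndjson_chunks(
    ['{"type": "meta"}\n{"ty', 'pe": "records"}\nnot-json\n'],
    events.append,
)
```

Stream interruption would surface as an incomplete trailing buffer with no `complete` line, which the composable in 4.2 can treat as a retryable error.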
5. Deployment
- 5.1 Update `.env.example`: add `TRACE_STREAM_BATCH_SIZE` with a description
6. Verification
- 6.1 Run
python -m pytest tests/ -v— all existing tests pass - 6.2 Run
cd frontend && npm run build— frontend builds successfully - 6.3 Manual test: verify NDJSON stream produces valid output for multi-domain query