2 Commits

Author SHA1 Message Date
egg
f6a54f357f feat: cross-tool OOM protection — shared memory guard, Redis cache, RSS projection
Extract interactive memory guard from reject_dataset_cache into reusable
core/interactive_memory_guard.py (two-fence: DataFrame size + RSS projection).
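A minimal sketch of what a "two-fence" guard of this shape could look like. All names and limits here are illustrative assumptions, not the contents of core/interactive_memory_guard.py; it uses the stdlib `resource` module (peak RSS) as a conservative stand-in for a live RSS reading:

```python
import resource
import sys

# Illustrative limits -- NOT the values from core/interactive_memory_guard.py.
MAX_DF_BYTES = 512 * 1024 * 1024            # fence 1: result-DataFrame size cap
MAX_PROJECTED_RSS = 2 * 1024 * 1024 * 1024  # fence 2: projected process RSS cap


class MemoryGuardError(RuntimeError):
    """Raised when either fence would be breached."""


def current_rss_bytes() -> int:
    """Peak RSS via the stdlib, used here as a conservative proxy for
    current RSS (ru_maxrss is KiB on Linux, bytes on macOS)."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss if sys.platform == "darwin" else rss * 1024


def check_dataframe_fence(estimated_df_bytes: int) -> None:
    """Fence 1: reject if the estimated result DataFrame alone is too large."""
    if estimated_df_bytes > MAX_DF_BYTES:
        raise MemoryGuardError(
            f"estimated DataFrame size {estimated_df_bytes} exceeds {MAX_DF_BYTES}"
        )


def check_rss_projection_fence(estimated_df_bytes: int) -> None:
    """Fence 2: reject if current RSS plus the new allocation would push
    the process past the projected ceiling."""
    projected = current_rss_bytes() + estimated_df_bytes
    if projected > MAX_PROJECTED_RSS:
        raise MemoryGuardError(f"projected RSS {projected} exceeds {MAX_PROJECTED_RSS}")


def guard(estimated_df_bytes: int) -> None:
    """Run both fences before admitting an interactive query."""
    check_dataframe_fence(estimated_df_bytes)
    check_rss_projection_fence(estimated_df_bytes)
```

The point of the second fence is that a request can be individually small yet still fatal when the process is already near its ceiling, which a size-only check misses.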

Material Trace: add Redis query cache (TTL=5min), FETCH FIRST 50001 row limit,
upgrade memory guard to RSS projection, forced GC after batched queries.

Query Tool: add EventFetcher 500K row accumulation guard, RSS projection guard
for 6 heavy endpoints, GC after large responses.
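An accumulation guard like the one added to EventFetcher can be sketched as a fail-fast cap on total rows gathered across batches. The cap matches the commit's 500K figure; the function and exception names are hypothetical:

```python
from typing import Iterable, Iterator, List

# Cap from the commit message; RowLimitExceeded is a hypothetical name.
MAX_ACCUMULATED_ROWS = 500_000


class RowLimitExceeded(RuntimeError):
    pass


def accumulate_with_guard(batches: Iterable[List[dict]]) -> List[dict]:
    """Accumulate row batches, failing fast the moment the cap would be
    crossed instead of growing until the process is OOM-killed."""
    rows: List[dict] = []
    for batch in batches:
        if len(rows) + len(batch) > MAX_ACCUMULATED_ROWS:
            raise RowLimitExceeded(
                f"would accumulate more than {MAX_ACCUMULATED_ROWS} rows"
            )
        rows.extend(batch)
    return rows
```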

Mid-Section Defect: upgrade RQ health check (Redis ping + worker existence +
60s TTL cache), add sync-path RSS guard with 503 response, add stampede lock
for events endpoint, extend analysis lock timeout 90→180s.
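The caching layer of a health check like the upgraded RQ one can be sketched with the stdlib alone. The 60s TTL comes from the commit; the Redis ping and worker-existence probe are injected as a callable here so the caching logic stands on its own (function and variable names are assumptions):

```python
import time
from typing import Callable

HEALTH_TTL_SECONDS = 60.0  # TTL from the commit message

# Module-level cache: verdict plus its expiry on the monotonic clock.
_cache = {"value": False, "expires_at": 0.0}


def rq_healthy(probe: Callable[[], bool],
               now: Callable[[], float] = time.monotonic) -> bool:
    """Return the cached health verdict, re-running the probe (in the real
    check: Redis ping + RQ worker existence) at most once per TTL so that
    every request does not hit Redis."""
    if now() >= _cache["expires_at"]:
        _cache["value"] = probe()
        _cache["expires_at"] = now() + HEALTH_TTL_SECONDS
    return _cache["value"]
```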

Fix SQL comment bind-parameter bug in 4 material_trace SQL templates where
`:p0, :p1` in comments were parsed as bind variables by SQLAlchemy.
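SQLAlchemy's `text()` scans for colon-prefixed bind parameters without skipping SQL comments, which is how `:p0, :p1` inside a `--` comment gets treated as bind variables. One defensive shape for a fix, sketched here rather than the commit's actual patch, is to strip line comments before the template reaches the parser:

```python
import re

# Naive line-comment stripper: removes "-- ..." to end of line so stray
# :placeholders in comments can never be mistaken for bind parameters.
# Does NOT handle "--" inside string literals or /* */ block comments.
_LINE_COMMENT = re.compile(r"--[^\n]*")


def strip_line_comments(sql: str) -> str:
    return _LINE_COMMENT.sub("", sql)
```

SQLAlchemy's own documented escape for a literal colon is `\:`, so editing the comments in the four templates (or deleting the offending placeholders from them) is the lighter-weight fix.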

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 07:34:34 +08:00
egg
dbe0da057c feat(trace-pipeline): memory triage, async job queue, and NDJSON streaming
Three proposals addressing the 2026-02-25 trace pipeline OOM crash (114K CIDs):

1. trace-events-memory-triage: fetchmany iterator (read_sql_df_slow_iter),
   admission control (50K CID limit for non-MSD), cache skip for large queries,
   early memory release with gc.collect()

2. trace-async-job-queue: RQ-based async jobs for queries >20K CIDs,
   separate worker process with isolated memory, frontend polling via
   useTraceProgress composable, systemd service + deploy scripts

3. trace-streaming-response: chunked Redis storage (TRACE_STREAM_BATCH_SIZE=5000),
   NDJSON stream endpoint (GET /api/trace/job/<id>/stream), frontend
   ReadableStream consumer for progressive rendering, backward-compatible
   with legacy single-key storage
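The fetchmany iterator in proposal 1 can be sketched against stdlib sqlite3; the helper name in the commit is read_sql_df_slow_iter, but the signature and batch size below are illustrative. Rows stream out in fixed-size batches rather than one fetchall() that materializes the whole result set:

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_rows(conn: sqlite3.Connection, sql: str,
              batch_size: int = 5000) -> Iterator[List[Tuple]]:
    """Yield query results in fetchmany-sized batches so peak memory stays
    around one batch, not the full result set."""
    cur = conn.execute(sql)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch  # caller processes and drops each batch before the next
```

In the real pipeline each batch would presumably be wrapped in a DataFrame and released (the commit pairs this with gc.collect() for early memory release).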

All three proposals archived. 1101 tests pass, frontend builds clean.
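The streaming shape in proposal 3 can be sketched as a chunked NDJSON encoder: results are serialized one JSON object per line, in batches matching TRACE_STREAM_BATCH_SIZE=5000 from the commit, so a frontend ReadableStream consumer can render each chunk as it arrives. The function name is an assumption:

```python
import json
from typing import Iterable, Iterator


def to_ndjson_chunks(rows: Iterable[dict],
                     batch_size: int = 5000) -> Iterator[bytes]:
    """Serialize rows as NDJSON (one JSON object per line), yielding the
    encoded output in batch_size-row chunks for progressive delivery."""
    buf = []
    for row in rows:
        buf.append(json.dumps(row, separators=(",", ":")))
        if len(buf) == batch_size:
            yield ("\n".join(buf) + "\n").encode()
            buf = []
    if buf:  # flush the final partial batch
        yield ("\n".join(buf) + "\n").encode()
```

Because each line is a complete JSON document, a legacy consumer can still concatenate the chunks and parse line by line, which is what keeps this backward-compatible with single-key storage.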

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:01:27 +08:00