feat(trace-pipeline): memory triage, async job queue, and NDJSON streaming

Three proposals addressing the 2026-02-25 trace pipeline OOM crash (114K CIDs): 1. trace-events-memory-triage: fetchmany iterator (read_sql_df_slow_iter), admission control (50K CID limit for non-MSD), cache skip for large queries, early memory release with gc.collect() 2. trace-async-job-queue: RQ-based async jobs for queries >20K CIDs, separate worker process with isolated memory, frontend polling via useTraceProgress composable, systemd service + deploy scripts 3. trace-streaming-response: chunked Redis storage (TRACE_STREAM_BATCH_SIZE=5000), NDJSON stream endpoint (GET /api/trace/job/<id>/stream), frontend ReadableStream consumer for progressive rendering, backward-compatible with legacy single-key storage All three proposals archived. 1101 tests pass, frontend builds clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:01:27 +08:00
parent cbb943dfe5
commit dbe0da057c
32 changed files with 3140 additions and 87 deletions
--- a/scripts/deploy.sh
+++ b/scripts/deploy.sh
@@ -242,9 +242,14 @@ show_next_steps() {
    echo "  sudo chmod 640 .env"
    echo "  sudo cp deploy/mes-dashboard.service /etc/systemd/system/"
    echo "  sudo cp deploy/mes-dashboard-watchdog.service /etc/systemd/system/"
+    echo "  sudo cp deploy/mes-dashboard-trace-worker.service /etc/systemd/system/"
    echo "  sudo systemctl daemon-reload"
    echo "  sudo systemctl enable --now mes-dashboard mes-dashboard-watchdog"
    echo ""
+    echo "Optional: enable async trace worker (for large queries)"
+    echo "  # Set TRACE_WORKER_ENABLED=true in .env"
+    echo "  sudo systemctl enable --now mes-dashboard-trace-worker"
+    echo ""
    echo "=========================================="
 }