feat(trace-pipeline): memory triage, async job queue, and NDJSON streaming
Three proposals addressing the 2026-02-25 trace pipeline OOM crash (114K CIDs): 1. trace-events-memory-triage: fetchmany iterator (read_sql_df_slow_iter), admission control (50K CID limit for non-MSD), cache skip for large queries, early memory release with gc.collect() 2. trace-async-job-queue: RQ-based async jobs for queries >20K CIDs, separate worker process with isolated memory, frontend polling via useTraceProgress composable, systemd service + deploy scripts 3. trace-streaming-response: chunked Redis storage (TRACE_STREAM_BATCH_SIZE=5000), NDJSON stream endpoint (GET /api/trace/job/<id>/stream), frontend ReadableStream consumer for progressive rendering, backward-compatible with legacy single-key storage All three proposals archived. 1101 tests pass, frontend builds clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
37
deploy/mes-dashboard-trace-worker.service
Normal file
37
deploy/mes-dashboard-trace-worker.service
Normal file
@@ -0,0 +1,37 @@
|
||||
[Unit]
|
||||
Description=MES Dashboard Trace Worker (RQ, Conda Runtime)
|
||||
Documentation=https://github.com/your-org/mes-dashboard
|
||||
After=network.target redis-server.service
|
||||
Requires=redis-server.service
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=www-data
|
||||
Group=www-data
|
||||
WorkingDirectory=/opt/mes-dashboard
|
||||
EnvironmentFile=-/opt/mes-dashboard/.env
|
||||
Environment="PYTHONPATH=/opt/mes-dashboard/src"
|
||||
|
||||
ExecStart=/usr/bin/env bash -lc 'exec "${CONDA_BIN:-/opt/miniconda3/bin/conda}" run --no-capture-output -n "${CONDA_ENV_NAME:-mes-dashboard}" rq worker "${TRACE_WORKER_QUEUE:-trace-events}" --url "${REDIS_URL:-redis://localhost:6379/0}"'
|
||||
|
||||
KillSignal=SIGTERM
|
||||
TimeoutStopSec=60
|
||||
Restart=always
|
||||
RestartSec=10
|
||||
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
SyslogIdentifier=mes-dashboard-trace-worker
|
||||
|
||||
# Memory protection: trace worker handles large queries independently.
|
||||
# MemoryMax prevents single large job from killing the VM.
|
||||
MemoryHigh=3G
|
||||
MemoryMax=4G
|
||||
|
||||
NoNewPrivileges=yes
|
||||
PrivateTmp=yes
|
||||
ProtectSystem=strict
|
||||
ReadWritePaths=/opt/mes-dashboard/logs
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
Reference in New Issue
Block a user