Three proposals addressing the 2026-02-25 trace pipeline OOM crash (114K CIDs):

1. trace-events-memory-triage: fetchmany iterator (read_sql_df_slow_iter), admission control (50K CID limit for non-MSD), cache skip for large queries, early memory release with gc.collect()
2. trace-async-job-queue: RQ-based async jobs for queries >20K CIDs, separate worker process with isolated memory, frontend polling via useTraceProgress composable, systemd service + deploy scripts
3. trace-streaming-response: chunked Redis storage (TRACE_STREAM_BATCH_SIZE=5000), NDJSON stream endpoint (GET /api/trace/job/<id>/stream), frontend ReadableStream consumer for progressive rendering, backward-compatible with legacy single-key storage

All three proposals archived. 1101 tests pass, frontend builds clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
781 B
ADDED Requirements
Requirement: EventFetcher SHALL support iterator mode for streaming
EventFetcher.fetch_events_iter() SHALL yield batched results for streaming consumption.
Scenario: Iterator mode yields batches
- WHEN `fetch_events_iter(container_ids, domain, batch_size)` is called
- THEN it SHALL yield `Dict[str, List[Dict]]` batches (grouped by CONTAINERID)
- THEN each yielded batch SHALL contain results from one `cursor.fetchmany()` call
- THEN memory usage SHALL be proportional to `batch_size`, not total result count
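
A minimal sketch of the batching behavior this scenario describes, not the project's actual `EventFetcher` code. It assumes a DB-API connection (here `sqlite3`, in memory) and a hypothetical `events` table with `CONTAINERID`, `DOMAIN`, and `EVENT` columns. The extra `conn` parameter is also an assumption made so the example is self-contained.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterator, List


def fetch_events_iter(
    conn: sqlite3.Connection,  # assumption: the real method likely owns its connection
    container_ids: List[str],
    domain: str,
    batch_size: int,
) -> Iterator[Dict[str, List[Dict]]]:
    """Yield one CONTAINERID-grouped batch per cursor.fetchmany() call."""
    if not container_ids:
        return
    placeholders = ",".join("?" for _ in container_ids)
    # Hypothetical table and column names, for illustration only.
    cursor = conn.execute(
        "SELECT CONTAINERID, EVENT FROM events "
        f"WHERE DOMAIN = ? AND CONTAINERID IN ({placeholders}) ORDER BY rowid",
        [domain, *container_ids],
    )
    columns = [col[0] for col in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        # Only the current batch is held in memory; earlier batches can be freed.
        batch: Dict[str, List[Dict]] = defaultdict(list)
        for row in rows:
            record = dict(zip(columns, row))
            batch[record["CONTAINERID"]].append(record)
        yield dict(batch)


# Demo data: four matching rows, so batch_size=2 produces two batches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (CONTAINERID TEXT, DOMAIN TEXT, EVENT TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("C1", "history", "start"),
        ("C1", "history", "stop"),
        ("C2", "history", "start"),
        ("C2", "other", "ignored"),
        ("C3", "history", "start"),
    ],
)
batches = list(fetch_events_iter(conn, ["C1", "C2", "C3"], "history", batch_size=2))
```

Because each batch maps to exactly one `fetchmany()` call, rows for a single CONTAINERID can be split across consecutive batches. A consumer that needs complete per-container lists would have to merge them.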
Scenario: Iterator mode cache behavior
- WHEN `fetch_events_iter` is used for large CID sets (> CACHE_SKIP_CID_THRESHOLD)
- THEN per-domain cache SHALL be skipped (consistent with `fetch_events` behavior)
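
The cache-skip rule can be sketched as a size check ahead of any cache access. This is an illustration under assumptions: the threshold value, the dict-based cache, and the `fetch_with_optional_cache` and `loader` names are hypothetical and do not come from the spec.

```python
from typing import Callable, Dict, List, Tuple

# Assumed value for illustration; the spec names the constant but not its value.
CACHE_SKIP_CID_THRESHOLD = 10_000

CacheKey = Tuple[str, Tuple[str, ...]]
Loader = Callable[[List[str], str], Dict[str, list]]


def should_skip_cache(
    container_ids: List[str], threshold: int = CACHE_SKIP_CID_THRESHOLD
) -> bool:
    """Skip the per-domain cache when the CID set exceeds the threshold."""
    return len(container_ids) > threshold


def fetch_with_optional_cache(
    container_ids: List[str],
    domain: str,
    cache: Dict[CacheKey, Dict[str, list]],
    loader: Loader,
    threshold: int = CACHE_SKIP_CID_THRESHOLD,
) -> Dict[str, list]:
    """Hypothetical wrapper: large queries neither read from nor write to the cache."""
    if should_skip_cache(container_ids, threshold):
        return loader(container_ids, domain)
    key = (domain, tuple(sorted(container_ids)))
    if key not in cache:
        cache[key] = loader(container_ids, domain)
    return cache[key]


def fake_loader(container_ids: List[str], domain: str) -> Dict[str, list]:
    # Stand-in for the real database fetch.
    return {cid: [] for cid in container_ids}


demo_cache: Dict[CacheKey, Dict[str, list]] = {}
# Three CIDs exceed the demo threshold of 2, so nothing is cached.
fetch_with_optional_cache(["C1", "C2", "C3"], "history", demo_cache, fake_loader, threshold=2)
entries_after_large = len(demo_cache)
# Two CIDs are within the threshold, so the result is cached.
fetch_with_optional_cache(["C1", "C2"], "history", demo_cache, fake_loader, threshold=2)
entries_after_small = len(demo_cache)
```

Skipping both the read and the write keeps a very large result set from being pinned in memory by the cache, which matches the motivation stated for the large-query path.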