Trace pipeline pool isolation: - Switch event_fetcher and lineage_engine to read_sql_df_slow (non-pooled) - Reduce EVENT_FETCHER_MAX_WORKERS 4→2, TRACE_EVENTS_MAX_WORKERS 4→2 - Add 60s timeout per batch query, cache skip for CID>10K - Early del raw_domain_results + gc.collect() for large queries - Increase DB_SLOW_MAX_CONCURRENT: base 3→5, dev 2→3, prod 3→5 Test fixes (51 pre-existing failures → 0): - reject_history: WORKFLOW CSV header, strict bool validation, pareto mock path - portal shell: remove non-existent /tmtt-defect route from tests - conftest: add --run-stress option to skip stress/load tests by default - migration tests: skipif baseline directory missing - performance test: update Vite asset assertion - wip hold: add firstname/waferdesc mock params - template integration: add /reject-history canonical route Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1.7 KiB
1.7 KiB
Why
MSD trace pipeline 大範圍查詢(TMTT, 5 個月)產生 114K CIDs,event_fetcher 使用
read_sql_df(pool 連線)發送 ~230 條批次查詢,佔滿 connection pool,導致背景任務
(equipment cache、SYS_DATE)查詢時間從 1s 暴增到 500s,最終 Redis timeout +
gunicorn worker SIGKILL(2026-02-25 13:18 事件)。
What Changes
- event_fetcher 和 lineage_engine 改用
read_sql_df_slow(獨立連線),不佔用 pool - 降低
EVENT_FETCHER_MAX_WORKERS預設 4→2、TRACE_EVENTS_MAX_WORKERS預設 4→2,減少 Oracle 並行壓力 - 增加
DB_SLOW_MAX_CONCURRENTsemaphore 容量 3→5,容納 event_fetcher 批次查詢 - event_fetcher 在 CID 數量 >10K 時跳過 L1/L2 cache(避免數百 MB JSON 序列化導致 Redis timeout 和 heap 膨脹)
- trace_routes events endpoint 早期釋放
raw_domain_results並在大查詢後觸發gc.collect()
Capabilities
New Capabilities
(無新增 capability)
Modified Capabilities
event-fetcher-unified: 改用非 pool 連線 + 降低預設並行數 + 大 CID 集跳過快取lineage-engine-core: 改用非 pool 連線(不佔用 pool,避免與 event_fetcher 競爭)trace-staged-api: 降低 domain 並行數 + 早期記憶體釋放 + 大查詢跳過 route-level cacheruntime-resilience-recovery: slow query semaphore 容量增加以容納 trace pipeline 批次查詢
Impact
- 後端 services: event_fetcher.py, lineage_engine.py (import 切換)
- routes: trace_routes.py (並行數 + 記憶體管理)
- config: settings.py (DB_SLOW_MAX_CONCURRENT)
- 即時監控頁: 不受影響(繼續用 pool)
- 前端: 無修改