feat(lineage): unified LineageEngine, EventFetcher, and progressive trace API

Introduce a unified Seed→Lineage→Event pipeline replacing per-page Python
BFS with Oracle CONNECT BY NOCYCLE queries, add staged /api/trace/*
endpoints with rate limiting and L2 Redis caching, and wire progressive
frontend loading via useTraceProgress composable.

Key changes:
- Add LineageEngine (split ancestors / merge sources / full genealogy)
  with QueryBuilder bind-param safety and batched IN clauses
- Add EventFetcher with 6-domain support and L2 Redis cache
- Add trace_routes Blueprint (seed-resolve, lineage, events) with
  profile dispatch, rate limiting, and Redis TTL=300s caching
- Refactor query_tool_service to use LineageEngine and QueryBuilder,
  removing raw string interpolation (SQL injection fix)
- Add rate limits and resolve cache to query_tool_routes
- Integrate useTraceProgress into mid-section-defect with skeleton
  placeholders and fade-in transitions
- Add lineageCache and on-demand lot lineage to query-tool
- Add TraceProgressBar shared component
- Remove legacy query-tool.js static script (3k lines)
- Fix MatrixTable package column truncation (.slice(0,15) removed)
- Archive unified-lineage-engine change, add trace-progressive-ui specs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egg
2026-02-12 16:30:24 +08:00
parent c38b5f646a
commit 519f8ae2f4
52 changed files with 5074 additions and 4047 deletions


@@ -0,0 +1,2 @@
schema: spec-driven
status: proposal


@@ -0,0 +1,202 @@
## Context
The two pages with the highest query complexity (`/mid-section-defect` and `/query-tool`) each implement their own LOT lineage-tracing logic. mid-section-defect uses a Python BFS (`_bfs_split_chain()` + `_fetch_merge_sources()`); query-tool uses `_build_in_filter()` string concatenation. Both share the same underlying tables: `DWH.DW_MES_CONTAINER` (5.2M rows, CONTAINERID UNIQUE index) and `DWH.DW_MES_PJ_COMBINEDASSYLOTS` (1.97M rows, FINISHEDNAME indexed).
Current problems:
- The BFS makes one DB round-trip per level (3-16 rounds), and `genealogy_records.sql` full-scans `HM_LOTMOVEOUT` (48M rows)
- `_build_in_filter()` builds SQL via string concatenation, a SQL injection risk
- query-tool has no rate limit / cache, so it can exhaust the DB pool (pool_size=10, max_overflow=20)
- The two services are 1200-1300 lines each, duplicating the lineage logic
Existing safety infrastructure:
- `QueryBuilder` (`sql/builder.py`): `add_in_condition()` supports bind params `:p0, :p1, ...`
- `SQLLoader` (`sql/loader.py`): `load_with_params()` supports structural params `{{ PARAM }}`
- `configured_rate_limit()` (`core/rate_limit.py`): per-client rate limit with `Retry-After` header
- `LayeredCache` (`core/cache.py`): L1 MemoryTTL + L2 Redis
## Goals / Non-Goals
**Goals:**
- Replace the Python BFS with `CONNECT BY NOCYCLE`, reducing 3-16 DB round-trips to 1
- Build a unified `LineageEngine` module, eliminating duplicated lineage logic
- Eliminate the `_build_in_filter()` SQL injection risk
- Add rate limit + cache to query-tool, aligned with mid-section-defect
- Add a fast/full dual mode to `lot_split_merge_history`
**Non-Goals:**
- No new API endpoints (handled by the follow-up `trace-progressive-ui` change)
- No frontend changes
- No materialized views / no PARALLEL hints
- No changes to other pages (wip-detail, lot-detail, etc.)
## Decisions
### D1: CONNECT BY NOCYCLE as the primary recursive query strategy
**Choice**: Oracle `CONNECT BY NOCYCLE` with `LEVEL <= 20`
**Alternative**: Recursive `WITH` (recursive subquery factoring)
**Rationale**:
- `CONNECT BY` is Oracle's native recursive syntax, with the most mature execution-plan optimization on Oracle 19c
- `LEVEL <= 20` is equivalent to the existing BFS `bfs_round > 20` guard
- `NOCYCLE` handles cyclic references (`SPLITFROMID` may contain cycles from bad data)
- A recursive `WITH` variant is kept as a comment inside the SQL file, so we can switch quickly if the execution plan turns out poor
**SQL design** (`sql/lineage/split_ancestors.sql`):
```sql
SELECT
c.CONTAINERID,
c.SPLITFROMID,
c.CONTAINERNAME,
LEVEL AS SPLIT_DEPTH
FROM DWH.DW_MES_CONTAINER c
START WITH {{ CID_FILTER }}
CONNECT BY NOCYCLE PRIOR c.SPLITFROMID = c.CONTAINERID
AND LEVEL <= 20
```
- `{{ CID_FILTER }}` is generated by `QueryBuilder.get_conditions_sql()` and injected with bind params
- The Oracle IN clause limit is handled by batching at `ORACLE_IN_BATCH_SIZE=1000` and merging the per-batch results
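The batch-and-merge behavior described above can be sketched as follows. This is a minimal illustration; `run_batch` is a hypothetical stand-in for executing one CONNECT BY query against a chunk of IDs.

```python
from typing import Callable, Dict, List

ORACLE_IN_BATCH_SIZE = 1000  # Oracle limits IN lists to 1000 literals

def resolve_in_batches(
    container_ids: List[str],
    run_batch: Callable[[List[str]], Dict[str, str]],
) -> Dict[str, str]:
    """Split the ID list into <=1000-element chunks, run each chunk through
    one query, and merge the per-batch child->parent mappings."""
    merged: Dict[str, str] = {}
    for i in range(0, len(container_ids), ORACLE_IN_BATCH_SIZE):
        chunk = container_ids[i : i + ORACLE_IN_BATCH_SIZE]
        merged.update(run_batch(chunk))
    return merged

# Example with a fake query runner: 2500 IDs become three batches.
ids = [f"CID{n}" for n in range(2500)]
result = resolve_in_batches(ids, lambda chunk: {cid: "ROOT" for cid in chunk})
```

Because the merge is a plain dict update, callers see a single result regardless of how many batches were issued, which is why the design can keep `ORACLE_IN_BATCH_SIZE` invisible outside the module.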
### D2: LineageEngine module structure
```
src/mes_dashboard/services/lineage_engine.py
├── resolve_split_ancestors(container_ids: List[str]) -> Dict
│   └── returns {child_to_parent: {cid: parent_cid}, cid_to_name: {cid: name}}
├── resolve_merge_sources(container_names: List[str]) -> Dict
│   └── returns {finished_name: [{source_cid, source_name}]}
└── resolve_full_genealogy(container_ids: List[str], initial_names: Dict) -> Dict
    └── combines split + merge; returns {cid: Set[ancestor_cids]}
src/mes_dashboard/sql/lineage/
├── split_ancestors.sql (CONNECT BY NOCYCLE)
└── merge_sources.sql (from merge_lookup.sql)
```
**Function signature design**:
- profile-agnostic: accepts `container_ids: List[str]`, not bound to any page logic
- returns native Python data structures (dict/set), never DataFrames
- internally uses `QueryBuilder` + `SQLLoader.load_with_params()` + `read_sql_df()`
- batching is encapsulated inside the module; callers never deal with `ORACLE_IN_BATCH_SIZE`
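How `resolve_full_genealogy()` might combine the two intermediate structures into `{cid: Set[ancestor_cids]}` can be sketched as follows. This is a hypothetical illustration only; the real module also resolves names and issues the batched queries itself.

```python
from typing import Dict, List, Set

def combine_genealogy(
    child_to_parent: Dict[str, str],
    merge_sources: Dict[str, List[str]],
    seeds: List[str],
) -> Dict[str, Set[str]]:
    """Walk each seed's split chain upward and fold in merge sources
    discovered along the way, yielding the full ancestor set per seed."""
    result: Dict[str, Set[str]] = {}
    for seed in seeds:
        ancestors: Set[str] = set()
        frontier = [seed]
        while frontier:
            cid = frontier.pop()
            parent = child_to_parent.get(cid)  # split edge: child -> parent
            if parent and parent not in ancestors:
                ancestors.add(parent)
                frontier.append(parent)
            for src in merge_sources.get(cid, []):  # merge edges into cid
                if src not in ancestors:
                    ancestors.add(src)
                    frontier.append(src)
        result[seed] = ancestors
    return result

# C3 split from C2, C2 split from C1, and C2 was merged from M1:
g = combine_genealogy({"C3": "C2", "C2": "C1"}, {"C2": ["M1"]}, ["C3"])
```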
### D3: EventFetcher module structure
```
src/mes_dashboard/services/event_fetcher.py
├── fetch_events(container_ids: List[str], domain: str) -> List[Dict]
│   └── supported domains: history, materials, rejects, holds, jobs, upstream_history
├── _cache_key(domain: str, container_ids: List[str]) -> str
│   └── format: evt:{domain}:{sorted_cids_hash}
└── _get_rate_limit_config(domain: str) -> Dict
    └── returns {bucket, max_attempts, window_seconds}
```
**Caching strategy**:
- L2 Redis cache (aligned with the `core/cache.py` pattern), TTL configured per domain
- cache keys use `hashlib.md5("".join(sorted(cids)).encode()).hexdigest()[:12]` to avoid overly long keys
- the existing mid-section-defect `_fetch_upstream_history()` migrates to `fetch_events(cids, "upstream_history")`
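A minimal sketch of the cache-key scheme above; the join separator is an assumption, only the `evt:{domain}:{hash}` shape and the 12-character md5 prefix come from the design:

```python
import hashlib
from typing import List

def cache_key(domain: str, container_ids: List[str]) -> str:
    # Sort so the same ID set always hashes identically regardless of
    # request order; truncate the md5 hex digest to keep Redis keys short.
    joined = ",".join(sorted(container_ids))
    digest = hashlib.md5(joined.encode()).hexdigest()[:12]
    return f"evt:{domain}:{digest}"

k1 = cache_key("holds", ["CID2", "CID1"])
k2 = cache_key("holds", ["CID1", "CID2"])
# k1 == k2: the key is order-insensitive, so reordered requests hit cache
```

Sorting before hashing is what makes the cache effective for batch queries, since callers rarely submit the same IDs in the same order twice.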
### D4: query-tool SQL injection fix strategy
**Fix scope** (6 call sites):
1. `_resolve_by_lot_id()` (line 262): `_build_in_filter(lot_ids, 'CONTAINERNAME')` + `read_sql_df(sql, {})`
2. `_resolve_by_serial_number()` (line ~320): same pattern
3. `_resolve_by_work_order()` (line ~380): same pattern
4. the IN clause inside `get_lot_history()`
5. the IN clause inside `get_lot_associations()`
6. the `lot_split_merge_history` query
**Fix pattern** (uniform):
```python
# Before (unsafe)
in_filter = _build_in_filter(lot_ids, 'CONTAINERNAME')
sql = f"SELECT ... WHERE {in_filter}"
df = read_sql_df(sql, {})
# After (safe)
builder = QueryBuilder()
builder.add_in_condition("CONTAINERNAME", lot_ids)
sql = SQLLoader.load_with_params(
"query_tool/lot_resolve_id",
CONTAINER_FILTER=builder.get_conditions_sql(),
)
df = read_sql_df(sql, builder.params)
```
**`_build_in_filter()` and `_build_in_clause()` are deleted outright** (not deprecated; removed immediately because they are a security vulnerability)
### D5: query-tool rate limit + cache configuration
**Rate limit** (aligned with the `configured_rate_limit()` pattern):
| Endpoint | Bucket | Max/Window | Env Override |
|----------|--------|------------|-------------|
| `/resolve` | `query-tool-resolve` | 10/60s | `QT_RESOLVE_RATE_*` |
| `/lot-history` | `query-tool-history` | 20/60s | `QT_HISTORY_RATE_*` |
| `/lot-associations` | `query-tool-association` | 20/60s | `QT_ASSOC_RATE_*` |
| `/adjacent-lots` | `query-tool-adjacent` | 20/60s | `QT_ADJACENT_RATE_*` |
| `/equipment-period` | `query-tool-equipment` | 5/60s | `QT_EQUIP_RATE_*` |
| `/export-csv` | `query-tool-export` | 3/60s | `QT_EXPORT_RATE_*` |
**Cache**:
- resolve result: L2 Redis, TTL=60s, key=`qt:resolve:{input_type}:{values_hash}`
- other GET endpoints: no cache for now (results depend on dynamic CONTAINERID params, so hit rates would be low)
### D6: lot_split_merge_history fast/full dual mode
**Fast mode** (default):
```sql
-- conditions added to lot_split_merge_history.sql
AND h.TXNDATE >= ADD_MONTHS(SYSDATE, -6)
...
FETCH FIRST 500 ROWS ONLY
```
**Full mode** (`full_history=true`):
- SQL variant without the time window and row limit
- uses `read_sql_df_slow()` (120s timeout) instead of `read_sql_df()` (55s timeout)
- the route layer decides via `request.args.get('full_history', 'false').lower() == 'true'`
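The route-layer toggle can be factored into a tiny helper (a sketch; the actual route may simply inline the expression above):

```python
def parse_full_history(args: dict) -> bool:
    """Mirror the route-layer check: full_history defaults to 'false', and
    only the literal string 'true' (case-insensitive) selects full mode."""
    return args.get("full_history", "false").lower() == "true"

parse_full_history({})                          # -> False (fast mode)
parse_full_history({"full_history": "TRUE"})    # -> True  (full mode)
parse_full_history({"full_history": "1"})       # -> False (strict match)
```

The strict string match means `?full_history=1` silently stays in fast mode, which is the safer failure direction for a 120-second query.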
### D7: Refactoring order and regression guards
**Phase 1**: mid-section-defect (safer: protected by cache + distributed lock)
1. Create `lineage_engine.py` + SQL files
2. Replace the three BFS functions in `mid_section_defect_service.py` with `LineageEngine`
3. Golden test verifying BFS vs CONNECT BY results are identical
4. Retire `genealogy_records.sql` + `split_chain.sql` (marked deprecated)
**Phase 2**: query-tool (higher risk: no existing protection)
1. Fix all `_build_in_filter()` call sites → `QueryBuilder`
2. Delete `_build_in_filter()` + `_build_in_clause()`
3. Add route-level rate limits
4. Add the resolve cache
5. Add the `lot_split_merge_history` fast/full mode
**Phase 3**: EventFetcher
1. Create `event_fetcher.py`
2. Migrate `_fetch_upstream_history()` → `EventFetcher`
3. Migrate query-tool event fetch paths → `EventFetcher`
## Risks / Trade-offs
| Risk | Mitigation |
|------|-----------|
| CONNECT BY may produce an unexpected execution plan for very large lineage trees (>10000 nodes) | `LEVEL <= 20` hard cap + a recursive `WITH` alternative inside the SQL file for a quick switch |
| Insufficient golden-test coverage lets a regression slip through | Pick ≥5 LOTs with known lineage structures (multi-level splits + merge crossings); CI gate enforces a pass |
| A `_build_in_filter()` call site is missed after deletion | After Phase 2, `grep -r "_build_in_filter\|_build_in_clause" src/` must return 0 results |
| The fast-mode 6-month window may truncate traces that need full history | `full_history=true` switches to full mode; the frontend omits the param by default = fast mode |
| `QueryBuilder.add_in_condition()` does not auto-batch >1000 values | LineageEngine encapsulates the batching internally (`for i in range(0, len(ids), 1000)`); callers are unaware |
## Migration Plan
1. **Create new modules**: `lineage_engine.py`, `event_fetcher.py`, `sql/lineage/*.sql` — no side effects, safe to deploy
2. **Phase 1 switch**: mid-section-defect internals move to `LineageEngine` — protected by cache/lock; regressions caught via the golden test plus manual comparison
3. **Phase 2 switch**: query-tool fix + rate limit + cache — requires rerunning the query-tool route tests
4. **Phase 3 switch**: EventFetcher migration — last, smallest blast radius
5. **Cleanup**: delete the deprecated SQL files once confirmed unreferenced
**Rollback**: each Phase is independent and can be reverted on its own. `LineageEngine` and `EventFetcher` are new modules and do not affect existing code until each Phase's switchover commit.
## Open Questions
- Does `DW_MES_CONTAINER.SPLITFROMID` have an index? If not, `CONNECT BY` with `START WITH` may rely on full table scans instead of the CONTAINERID index. Needs an Oracle execution-plan check.
- Does `ORACLE_IN_BATCH_SIZE=1000` behave the same for `CONNECT BY START WITH ... IN (...)` as for a plain `WHERE ... IN (...)`? Needs verification in the dev environment.
- Should EventFetcher cache TTLs differ per domain (e.g. longer for `upstream_history`, shorter for `holds`)? Uniform 300s for now; tune later based on usage patterns.


@@ -0,0 +1,110 @@
## Why
The batch trace tool (`/query-tool`) and the mid-section defect trace analysis (`/mid-section-defect`) are the two most query-intensive pages in this project. Both need to resolve LOT lineage (split + merge), but each implemented its own tracing logic, which causes:
1. **Performance bottleneck**: mid-section-defect traces the split chain with a multi-round Python BFS (`_bfs_split_chain()`), costing 3-16 DB round-trips per run, plus a full scan of the 48M-row `HM_LOTMOVEOUT` via `genealogy_records.sql` (30-120 seconds).
2. **Security risk**: query-tool's `_build_in_filter()` builds IN clauses by string concatenation (`query_tool_service.py:156-174`); the `_resolve_by_lot_id()` / `_resolve_by_serial_number()` / `_resolve_by_work_order()` family passes empty params (`read_sql_df(sql, {})`) — values are embedded directly into the SQL string, a SQL injection risk.
3. **No protection**: query-tool has no rate limit and no cache; under high concurrency it can exhaust the DB connection pool (production pool_size=10, max_overflow=20).
4. **Duplicated code**: the two services each maintain the same split-chain tracing, merge lookup, and batched-IN segmentation logic.
Oracle 19c's `CONNECT BY NOCYCLE` can replace the entire Python BFS with a single SQL statement, reducing 3-16 DB round-trips to 1. The fallback is Oracle 19c's recursive `WITH` (recursive subquery factoring), functionally equivalent and more readable. The split/merge data sources (`DW_MES_CONTAINER.SPLITFROMID` + `DW_MES_PJ_COMBINEDASSYLOTS`) never need to touch `HM_LOTMOVEOUT`, eliminating the 48M-row full scan.
**Scope statement**: this change is a pure backend-internal refactor — no new API endpoints, no frontend changes. The existing API contract stays backward compatible (URLs and request/response formats unchanged); the only addition is the optional `full_history` query param as a backward-compatible extension. Staged frontend loading and new API endpoints belong to the separate follow-up `trace-progressive-ui` change.
## What Changes
- Create a unified `LineageEngine` module (`src/mes_dashboard/services/lineage_engine.py`) as the shared core of LOT lineage resolution:
  - `resolve_split_ancestors()` — replaces the Python BFS with a single `CONNECT BY NOCYCLE` SQL query (fallback: recursive `WITH`, noted as a commented alternative inside the SQL file)
  - `resolve_merge_sources()` — queries merge sources from `DW_MES_PJ_COMBINEDASSYLOTS`
  - `resolve_full_genealogy()` — combines split + merge into the complete lineage graph
  - designed as profile-agnostic utility functions: other pages (wip-detail, lot-detail) can call them directly later, but this change wires up only mid-section-defect and query-tool
- Create a unified `EventFetcher` module providing cached, rate-limited batch event queries, wrapping the existing domain queries (history, materials, rejects, holds, jobs, upstream_history)
- Refactor `mid_section_defect_service.py`: replace `_bfs_split_chain()` + `_fetch_merge_sources()` + `_resolve_full_genealogy()` with `LineageEngine`; replace `_fetch_upstream_history()` with `EventFetcher`
- Refactor `query_tool_service.py`: replace `_build_in_filter()` string concatenation with `QueryBuilder` bind params throughout; add route-level rate limit and cache aligned with the existing mid-section-defect patterns.
- New SQL files:
  - `sql/lineage/split_ancestors.sql` (CONNECT BY NOCYCLE implementation; the file includes a recursive WITH alternative as an Oracle-compatibility note)
  - `sql/lineage/merge_sources.sql` (migrated from `sql/mid_section_defect/merge_lookup.sql`)
- Deprecated SQL files (marked deprecated, kept for one release, then deleted):
  - `sql/mid_section_defect/genealogy_records.sql` (the 48M-row HM_LOTMOVEOUT full scan is no longer needed)
  - `sql/mid_section_defect/split_chain.sql` (replaced by the lineage CONNECT BY)
- Add dual-mode querying to query-tool's `lot_split_merge_history.sql`:
  - **fast mode** (default): `TXNDATE >= ADD_MONTHS(SYSDATE, -6)` + `FETCH FIRST 500 ROWS ONLY` — covers the last six months, responds in <5s
  - **full mode** (when the frontend passes `full_history=true`): no time window, preserving full-history tracing, using `read_sql_df_slow` (120s timeout)
- The query-tool route gains a `full_history` boolean query param; the service selects the SQL variant accordingly
## Capabilities
### New Capabilities
- `lineage-engine-core`: unified LOT lineage resolution engine, exposing three utility functions — `resolve_split_ancestors()` (CONNECT BY NOCYCLE, capped at `LEVEL <= 20`), `resolve_merge_sources()`, and `resolve_full_genealogy()` — all using `QueryBuilder` bind params, with batched IN support (`ORACLE_IN_BATCH_SIZE=1000`). Signatures are profile-agnostic: they accept `container_ids: List[str]` and return dict structures, unbound to any page logic.
- `event-fetcher-unified`: unified event query layer encapsulating cache-key generation (format: `evt:{domain}:{sorted_cids_hash}`), L1/L2 layered cache (aligned with the `core/cache.py` LayeredCache pattern), and rate-limit bucket configuration (aligned with the `configured_rate_limit()` pattern). Domains: `history`, `materials`, `rejects`, `holds`, `jobs`, `upstream_history`.
- `query-tool-safety-hardening`: fixes the query-tool SQL injection risk — `_build_in_filter()` and `_build_in_clause()` are fully replaced by `QueryBuilder.add_in_condition()`, eliminating the empty-params `read_sql_df(sql, {})` pattern; adds route-level rate limits (aligned with `configured_rate_limit()`: resolve 10/min, history 20/min, association 20/min) and response caching (L2 Redis, 60s TTL).
### Modified Capabilities
- `cache-indexed-query-acceleration`: mid-section-defect genealogy queries go from multi-round Python BFS + HM_LOTMOVEOUT full scan to a single CONNECT BY round + indexed lookups
- `oracle-query-fragment-governance`: `_build_in_filter()` / `_build_in_clause()` are retired, converging on `QueryBuilder.add_in_condition()`; the new `sql/lineage/` directory follows the existing SQLLoader convention
## Impact
- **Affected code**:
  - New: `src/mes_dashboard/services/lineage_engine.py`, `src/mes_dashboard/sql/lineage/split_ancestors.sql`, `src/mes_dashboard/sql/lineage/merge_sources.sql`
  - Refactored: `src/mes_dashboard/services/mid_section_defect_service.py` (1194L), `src/mes_dashboard/services/query_tool_service.py` (1329L), `src/mes_dashboard/routes/query_tool_routes.py`
  - Deprecated: `src/mes_dashboard/sql/mid_section_defect/genealogy_records.sql`, `src/mes_dashboard/sql/mid_section_defect/split_chain.sql` (replaced by the lineage module; marked deprecated and kept for one release)
  - Modified: `src/mes_dashboard/sql/query_tool/lot_split_merge_history.sql` (adds the time window + row limit)
- **Runtime/deploy**: no new dependencies; still Flask/Gunicorn + Oracle + Redis. The DB query pattern changes, but connection pool settings stay the same.
- **APIs/pages**: the existing `/query-tool` and `/mid-section-defect` API contracts remain backward compatible — URLs, input/output formats, and HTTP status codes are unchanged (pure internal replacement). Backward-compatible extensions: query-tool APIs gain rate-limit headers (`Retry-After`, aligned with the existing `rate_limit.py` implementation); query-tool split-merge history gains the optional `full_history` query param (default false = fast mode; omitting it behaves like the old version).
- **Performance**: see the quantified acceptance criteria in the Verification section below
- **Security**: the query-tool IN-clause SQL injection risk is eliminated; all `_build_in_filter()` / `_build_in_clause()` call sites move to `QueryBuilder.add_in_condition()`
- **Testing**: new LineageEngine unit tests plus a golden test comparing BFS vs CONNECT BY result consistency; existing mid-section-defect and query-tool tests need updated mock paths
## Verification
Performance acceptance criteria — all metrics must be measured under the following conditions.
**Test data scale**:
- LOT lineage tree: target seed lot with a split depth of 3, ≥50 ancestor nodes, and at least 1 merge path
- mid-section-defect: a date-range query where TMTT detection yields 10 seed lots
- query-tool: a work-order query whose resolve result is 20 lots
**Acceptance metrics** (cold query = cache miss; hot query = L2 Redis hit):
| Metric | Current (P95) | Target (P95) | Conditions |
|------|-----------|-----------|------|
| mid-section-defect genealogy | 30-120s | ≤8s | single CONNECT BY round, ≥50 ancestor nodes |
| mid-section-defect genealogy | 3-5s (L2 hit) | ≤1s | Redis cache hit |
| query-tool lot_split_merge_history fast mode | unbounded (>120s timeout) | ≤5s | 6-month window + FETCH FIRST 500 ROWS |
| query-tool lot_split_merge_history full mode | same as above | ≤60s | no time window, via `read_sql_df_slow` 120s timeout |
| LineageEngine.resolve_split_ancestors | N/A (new module) | ≤3s | ≥50 ancestor nodes, CONNECT BY |
| DB connection hold time | 3-16 round-trips × 0.5-2s each | single query ≤3s | one CONNECT BY query |
**Security acceptance**:
- zero references to `_build_in_filter()` and `_build_in_clause()` (verified by grep)
- every query containing user input (resolve_by_lot_id, resolve_by_serial_number, resolve_by_work_order, etc.) must use `QueryBuilder` bind params — no string concatenation. Purely static SQL (no user input) may pass empty params.
**Result consistency acceptance**:
- Golden test: pick ≥5 LOTs with known lineage structures and verify the BFS and CONNECT BY outputs (`child_to_parent`, `cid_to_name`) are identical as sets
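The golden-test comparison can be sketched as follows. `bfs_result` and `engine_result` are hypothetical stand-ins for the outputs of the legacy BFS path and the new `LineageEngine` path on the same seed LOTs; the real test would populate them from both implementations.

```python
def assert_lineage_equivalent(bfs_result: dict, engine_result: dict) -> None:
    """Compare the two keyed result sets the golden test cares about and
    report which entries exist on only one side when they diverge."""
    for key in ("child_to_parent", "cid_to_name"):
        assert bfs_result[key] == engine_result[key], (
            f"{key} mismatch: "
            f"only-BFS={set(bfs_result[key]) - set(engine_result[key])}, "
            f"only-engine={set(engine_result[key]) - set(bfs_result[key])}"
        )

golden = {
    "child_to_parent": {"C2": "C1"},
    "cid_to_name": {"C1": "LOT-A", "C2": "LOT-A-1"},
}
# Identical results pass silently; any divergence raises with a diff.
assert_lineage_equivalent(golden, {
    "child_to_parent": {"C2": "C1"},
    "cid_to_name": {"C1": "LOT-A", "C2": "LOT-A-1"},
})
```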
## Non-Goals
- Frontend UI changes are out of scope (staged frontend loading and progressive UX go to the follow-up `trace-progressive-ui` change).
- No new API endpoints — the existing API contract stays backward compatible (only the optional `full_history` query param is added as an extension). New endpoints belong to the follow-up `trace-progressive-ui`.
- No DB schema changes, no materialized views, no PARALLEL hints — all optimization happens at the application layer (SQL rewrites + Python refactoring + Redis cache).
- No changes to query logic on other pages (wip-detail, lot-detail, etc.) — `LineageEngine` is designed to be extensible, but this change wires up only the two target pages.
- No Oracle PARALLEL hints (their behavior is unpredictable in a connection-pool environment, so they are not used as an optimization technique).
## Dependencies
- No prerequisites; this change can be implemented independently.
- The follow-up `trace-progressive-ui` depends on the `LineageEngine` and `EventFetcher` modules delivered by this change.
## Risks
| Risk | Mitigation |
|------|------|
| CONNECT BY performance degrades on very large lineage trees (>10000 ancestors) | `LEVEL <= 20` cap + `NOCYCLE` cycle guard, equivalent to the current BFS `bfs_round > 20` check. If the Oracle 19c execution plan is poor, the SQL file contains a recursive `WITH` alternative for a quick switch |
| Lineage results differ from the BFS version (regression) | Golden test: compare BFS vs CONNECT BY outputs on ≥5 known LOTs; CI gate enforces identical result sets |
| The refactor spans two large services (2500+ lines) | Staged rollout: refactor mid-section-defect first (cache+lock protection, lower regression risk), then query-tool |
| Stale references after retiring `genealogy_records.sql` | Global grep to confirm no remaining reference points; the SQL file is marked deprecated and kept for one release before deletion |
| New query-tool rate limits degrade user experience | Lenient defaults (resolve 10/min, history 20/min) aligned with the existing mid-section-defect limits; responses include a `Retry-After` header |
| A call site is missed when replacing `_build_in_filter()` with `QueryBuilder` | Grep all references to `_build_in_filter` and `_build_in_clause`, replace them one by one, and confirm zero residual references |


@@ -0,0 +1,18 @@
## ADDED Requirements
### Requirement: Mid-section defect genealogy SHALL use CONNECT BY instead of Python BFS
The mid-section-defect genealogy resolution SHALL use `LineageEngine.resolve_full_genealogy()` (CONNECT BY NOCYCLE) instead of the existing `_bfs_split_chain()` Python BFS implementation.
#### Scenario: Genealogy cold query performance
- **WHEN** mid-section-defect analysis executes genealogy resolution with cache miss
- **THEN** `LineageEngine.resolve_split_ancestors()` SHALL be called (single CONNECT BY query)
- **THEN** response time SHALL be ≤8s (P95) for ≥50 ancestor nodes
- **THEN** Python BFS `_bfs_split_chain()` SHALL NOT be called
#### Scenario: Genealogy hot query performance
- **WHEN** mid-section-defect analysis executes genealogy resolution with L2 Redis cache hit
- **THEN** response time SHALL be ≤1s (P95)
#### Scenario: Golden test result equivalence
- **WHEN** golden test runs with ≥5 known LOTs
- **THEN** CONNECT BY output (`child_to_parent`, `cid_to_name`) SHALL be identical to BFS output for the same inputs


@@ -0,0 +1,20 @@
## ADDED Requirements
### Requirement: EventFetcher SHALL provide unified cached event querying across domains
`EventFetcher` SHALL encapsulate batch event queries with L1/L2 layered cache and rate limit bucket configuration, supporting domains: `history`, `materials`, `rejects`, `holds`, `jobs`, `upstream_history`.
#### Scenario: Cache miss for event domain query
- **WHEN** `EventFetcher` is called for a domain with container IDs and no cache exists
- **THEN** the domain query SHALL execute against Oracle via `read_sql_df()`
- **THEN** the result SHALL be stored in L2 Redis cache with key format `evt:{domain}:{sorted_cids_hash}`
- **THEN** L1 memory cache SHALL also be populated (aligned with `core/cache.py` LayeredCache pattern)
#### Scenario: Cache hit for event domain query
- **WHEN** `EventFetcher` is called for a domain and L2 Redis cache contains a valid entry
- **THEN** the cached result SHALL be returned without executing Oracle query
- **THEN** DB connection pool SHALL NOT be consumed
#### Scenario: Rate limit bucket per domain
- **WHEN** `EventFetcher` is used from a route handler
- **THEN** each domain SHALL have a configurable rate limit bucket aligned with `configured_rate_limit()` pattern
- **THEN** rate limit configuration SHALL be overridable via environment variables


@@ -0,0 +1,57 @@
## ADDED Requirements
### Requirement: LineageEngine SHALL provide unified split ancestor resolution via CONNECT BY NOCYCLE
`LineageEngine.resolve_split_ancestors()` SHALL accept a list of container IDs and return the complete split ancestry graph using a single Oracle `CONNECT BY NOCYCLE` query on `DW_MES_CONTAINER.SPLITFROMID`.
#### Scenario: Normal split chain resolution
- **WHEN** `resolve_split_ancestors()` is called with a list of container IDs
- **THEN** a single SQL query using `CONNECT BY NOCYCLE` SHALL be executed against `DW_MES_CONTAINER`
- **THEN** the result SHALL include a `child_to_parent` mapping and a `cid_to_name` mapping for all discovered ancestor nodes
- **THEN** the traversal depth SHALL be limited to `LEVEL <= 20` (equivalent to existing BFS `bfs_round > 20` guard)
#### Scenario: Large input batch exceeding Oracle IN clause limit
- **WHEN** the input `container_ids` list exceeds `ORACLE_IN_BATCH_SIZE` (1000)
- **THEN** `LineageEngine` SHALL batch the IDs (via `QueryBuilder.add_in_condition()` per batch) and combine the per-batch results
- **THEN** all bind parameters SHALL use `QueryBuilder.params` (no string concatenation)
#### Scenario: Cyclic split references in data
- **WHEN** `DW_MES_CONTAINER.SPLITFROMID` contains cyclic references
- **THEN** `NOCYCLE` SHALL prevent infinite traversal
- **THEN** the query SHALL return all non-cyclic ancestors up to `LEVEL <= 20`
#### Scenario: CONNECT BY performance regression
- **WHEN** Oracle 19c execution plan for `CONNECT BY NOCYCLE` performs worse than expected
- **THEN** the SQL file SHALL contain a commented-out recursive `WITH` (recursive subquery factoring) alternative that can be swapped in without code changes
### Requirement: LineageEngine SHALL provide unified merge source resolution
`LineageEngine.resolve_merge_sources()` SHALL accept a list of container IDs and return merge source mappings from `DW_MES_PJ_COMBINEDASSYLOTS`.
#### Scenario: Merge source lookup
- **WHEN** `resolve_merge_sources()` is called with container IDs
- **THEN** the result SHALL include `{cid: [merge_source_cid, ...]}` for all containers that have merge sources
- **THEN** all queries SHALL use `QueryBuilder` bind params
### Requirement: LineageEngine SHALL provide combined genealogy resolution
`LineageEngine.resolve_full_genealogy()` SHALL combine split ancestors and merge sources into a complete genealogy graph.
#### Scenario: Full genealogy for a set of seed lots
- **WHEN** `resolve_full_genealogy()` is called with seed container IDs
- **THEN** split ancestors SHALL be resolved first via `resolve_split_ancestors()`
- **THEN** merge sources SHALL be resolved for all discovered ancestor nodes
- **THEN** the combined result SHALL be equivalent to the existing `_resolve_full_genealogy()` output in `mid_section_defect_service.py`
### Requirement: LineageEngine functions SHALL be profile-agnostic
All `LineageEngine` public functions SHALL accept `container_ids: List[str]` and return dictionary structures without binding to any specific page logic.
#### Scenario: Reuse from different pages
- **WHEN** a new page (e.g., wip-detail) needs lineage resolution
- **THEN** it SHALL be able to call `LineageEngine` functions directly without modification
- **THEN** no page-specific logic (profile, TMTT detection, etc.) SHALL exist in `LineageEngine`
### Requirement: LineageEngine SQL files SHALL reside in `sql/lineage/` directory
New SQL files SHALL follow the existing `SQLLoader` convention under `src/mes_dashboard/sql/lineage/`.
#### Scenario: SQL file organization
- **WHEN** `LineageEngine` executes queries
- **THEN** `split_ancestors.sql` and `merge_sources.sql` SHALL be loaded via `SQLLoader.load_with_params("lineage/split_ancestors", ...)`
- **THEN** the SQL files SHALL NOT reference `HM_LOTMOVEOUT` (48M row table no longer needed for genealogy)


@@ -0,0 +1,23 @@
## ADDED Requirements
### Requirement: Lineage SQL fragments SHALL be centralized in `sql/lineage/` directory
Split ancestor and merge source SQL queries SHALL be defined in `sql/lineage/` and shared across services via `SQLLoader`.
#### Scenario: Mid-section-defect lineage query
- **WHEN** `mid_section_defect_service.py` needs split ancestry or merge source data
- **THEN** it SHALL call `LineageEngine` which loads SQL from `sql/lineage/split_ancestors.sql` and `sql/lineage/merge_sources.sql`
- **THEN** it SHALL NOT use `sql/mid_section_defect/split_chain.sql` or `sql/mid_section_defect/genealogy_records.sql`
#### Scenario: Deprecated SQL file handling
- **WHEN** `sql/mid_section_defect/genealogy_records.sql` and `sql/mid_section_defect/split_chain.sql` are deprecated
- **THEN** the files SHALL be marked with a deprecated comment at the top
- **THEN** grep SHALL confirm zero `SQLLoader.load` references to these files
- **THEN** the files SHALL be retained for one version before deletion
### Requirement: All user-input SQL queries SHALL use QueryBuilder bind params
`_build_in_filter()` and `_build_in_clause()` in `query_tool_service.py` SHALL be fully replaced by `QueryBuilder.add_in_condition()`.
#### Scenario: Complete migration to QueryBuilder
- **WHEN** the refactoring is complete
- **THEN** grep for `_build_in_filter` and `_build_in_clause` SHALL return zero results
- **THEN** all queries involving user-supplied values SHALL use `QueryBuilder.params`


@@ -0,0 +1,57 @@
## ADDED Requirements
### Requirement: query-tool resolve functions SHALL use QueryBuilder bind params for all user input
All `resolve_lots()` family functions (`_resolve_by_lot_id`, `_resolve_by_serial_number`, `_resolve_by_work_order`) SHALL use `QueryBuilder.add_in_condition()` with bind parameters instead of `_build_in_filter()` string concatenation.
#### Scenario: Lot resolve with user-supplied values
- **WHEN** a resolve function receives user-supplied lot IDs, serial numbers, or work order names
- **THEN** the SQL query SHALL use `:p0, :p1, ...` bind parameters via `QueryBuilder`
- **THEN** `read_sql_df()` SHALL receive `builder.params` (never an empty `{}` dict for queries with user input)
- **THEN** `_build_in_filter()` and `_build_in_clause()` SHALL NOT be called
#### Scenario: Pure static SQL without user input
- **WHEN** a query contains no user-supplied values (e.g., static lookups)
- **THEN** empty params `{}` is acceptable
- **THEN** no `_build_in_filter()` SHALL be used
#### Scenario: Zero residual references to deprecated functions
- **WHEN** the refactoring is complete
- **THEN** grep for `_build_in_filter` and `_build_in_clause` SHALL return zero results across the entire codebase
### Requirement: query-tool routes SHALL apply rate limiting
All query-tool API endpoints SHALL apply per-client rate limiting using the existing `configured_rate_limit` mechanism.
#### Scenario: Resolve endpoint rate limit exceeded
- **WHEN** a client sends more than 10 requests to query-tool resolve endpoints within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
- **THEN** the resolve service function SHALL NOT be called
#### Scenario: History endpoint rate limit exceeded
- **WHEN** a client sends more than 20 requests to query-tool history endpoints within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
#### Scenario: Association endpoint rate limit exceeded
- **WHEN** a client sends more than 20 requests to query-tool association endpoints within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
### Requirement: query-tool routes SHALL apply response caching
High-cost query-tool endpoints SHALL cache responses in L2 Redis.
#### Scenario: Resolve result caching
- **WHEN** a resolve request succeeds
- **THEN** the response SHALL be cached in L2 Redis with TTL = 60s
- **THEN** subsequent identical requests within TTL SHALL return cached result without Oracle query
### Requirement: lot_split_merge_history SHALL support fast and full query modes
The `lot_split_merge_history.sql` query SHALL support two modes to balance traceability completeness vs performance.
#### Scenario: Fast mode (default)
- **WHEN** `full_history` query parameter is absent or `false`
- **THEN** the SQL SHALL include `TXNDATE >= ADD_MONTHS(SYSDATE, -6)` time window and `FETCH FIRST 500 ROWS ONLY`
- **THEN** query response time SHALL be ≤5s (P95)
#### Scenario: Full mode
- **WHEN** `full_history=true` query parameter is provided
- **THEN** the SQL SHALL NOT include time window restriction
- **THEN** the query SHALL use `read_sql_df_slow` (120s timeout)
- **THEN** query response time SHALL be ≤60s (P95)


@@ -0,0 +1,57 @@
## Phase 1: LineageEngine module creation
- [x] 1.1 Create `src/mes_dashboard/sql/lineage/split_ancestors.sql` (CONNECT BY NOCYCLE, with a recursive WITH alternative in comments)
- [x] 1.2 Create `src/mes_dashboard/sql/lineage/merge_sources.sql` (migrated from `mid_section_defect/merge_lookup.sql`, switched to the `{{ FINISHED_NAME_FILTER }}` structural param)
- [x] 1.3 Create `src/mes_dashboard/services/lineage_engine.py`: the three utility functions `resolve_split_ancestors()`, `resolve_merge_sources()`, `resolve_full_genealogy()`, using `QueryBuilder` bind params + `ORACLE_IN_BATCH_SIZE=1000` batching
- [x] 1.4 LineageEngine unit tests: mock `read_sql_df` to verify batch splitting, dict return structures, and the LEVEL <= 20 guard
## Phase 2: Switch mid-section-defect to LineageEngine
- [x] 2.1 In `mid_section_defect_service.py`, replace `_bfs_split_chain()` with `LineageEngine.resolve_split_ancestors()`
- [x] 2.2 Replace `_fetch_merge_sources()` with `LineageEngine.resolve_merge_sources()`
- [x] 2.3 Replace `_resolve_full_genealogy()` with `LineageEngine.resolve_full_genealogy()`
- [x] 2.4 Golden test: pick ≥5 LOTs with known lineage structures and verify the BFS vs CONNECT BY `child_to_parent` and `cid_to_name` result sets are identical
- [x] 2.5 Mark `sql/mid_section_defect/genealogy_records.sql` and `sql/mid_section_defect/split_chain.sql` as deprecated (add `-- DEPRECATED: replaced by sql/lineage/split_ancestors.sql` at the top of each file)
## Phase 3: query-tool SQL injection fix
- [x] 3.1 Create the `sql/query_tool/lot_resolve_id.sql`, `lot_resolve_serial.sql`, `lot_resolve_work_order.sql` SQL files (migrated from inline SQL into SQLLoader management)
- [x] 3.2 Fix `_resolve_by_lot_id()`: `_build_in_filter()` → `QueryBuilder.add_in_condition()` + `SQLLoader.load_with_params()` + `read_sql_df(sql, builder.params)`
- [x] 3.3 Fix `_resolve_by_serial_number()`: same pattern
- [x] 3.4 Fix `_resolve_by_work_order()`: same pattern
- [x] 3.5 Fix the IN clause inside `get_lot_history()`: switch to `QueryBuilder`
- [x] 3.6 Fix the user-input IN clauses along the lot-associations query paths (`get_lot_materials()` / `get_lot_rejects()` / `get_lot_holds()` / `get_lot_splits()` / `get_lot_jobs()`): switch to `QueryBuilder`
- [x] 3.7 Fix the `lot_split_merge_history` query: switch to `QueryBuilder`
- [x] 3.8 Delete the `_build_in_filter()` and `_build_in_clause()` functions
- [x] 3.9 Verify: `grep -r "_build_in_filter\|_build_in_clause" src/` returns 0 results
- [x] 3.10 Update mock paths in the existing query-tool route tests
## Phase 4: query-tool rate limit + cache
- [x] 4.1 In `query_tool_routes.py`, add `configured_rate_limit(bucket='query-tool-resolve', default_max_attempts=10, default_window_seconds=60)` to `/resolve`
- [x] 4.2 Add `configured_rate_limit(bucket='query-tool-history', default_max_attempts=20, default_window_seconds=60)` to `/lot-history`
- [x] 4.3 Add `configured_rate_limit(bucket='query-tool-association', default_max_attempts=20, default_window_seconds=60)` to `/lot-associations`
- [x] 4.4 Add `configured_rate_limit(bucket='query-tool-adjacent', default_max_attempts=20, default_window_seconds=60)` to `/adjacent-lots`
- [x] 4.5 Add `configured_rate_limit(bucket='query-tool-equipment', default_max_attempts=5, default_window_seconds=60)` to `/equipment-period`
- [x] 4.6 Add `configured_rate_limit(bucket='query-tool-export', default_max_attempts=3, default_window_seconds=60)` to `/export-csv`
- [x] 4.7 Add an L2 Redis cache for resolve results (key=`qt:resolve:{input_type}:{values_hash}`, TTL=60s)
## Phase 5: lot_split_merge_history fast/full dual mode
- [x] 5.1 Modify `sql/query_tool/lot_split_merge_history.sql`: add the `{{ TIME_WINDOW }}` and `{{ ROW_LIMIT }}` structural params
- [x] 5.2 In `query_tool_service.py`, select the SQL variant based on the `full_history` param (fast: `AND h.TXNDATE >= ADD_MONTHS(SYSDATE, -6)` + `FETCH FIRST 500 ROWS ONLY`; full: no restriction + `read_sql_df_slow`)
- [x] 5.3 In `query_tool_routes.py`, parse the `full_history` query param on the `/api/query-tool/lot-associations?type=splits` path and pass it through to the split-merge-history query
- [x] 5.4 Route tests: verify the behavioral difference between fast mode (default) and full mode (`full_history=true`)
## Phase 6: EventFetcher module creation
- [x] 6.1 Create `src/mes_dashboard/services/event_fetcher.py`: `fetch_events(container_ids, domain)` + cache-key generation + rate-limit config
- [x] 6.2 Migrate `_fetch_upstream_history()` in `mid_section_defect_service.py` to `EventFetcher.fetch_events(cids, "upstream_history")`
- [x] 6.3 Migrate the query-tool event fetch paths (the DB query portions of `get_lot_history` and `get_lot_associations`) to `EventFetcher`
- [x] 6.4 EventFetcher unit tests: mock the DB and verify cache-key format, rate-limit config, and domain branching
## Phase 7: Cleanup and verification
- [x] 7.1 Confirm `genealogy_records.sql` and `split_chain.sql` have no active references (`grep -r` check); keep the deprecated markers
- [x] 7.2 Confirm all user-input queries use `QueryBuilder` bind params (inspect every `read_sql_df` call site via grep)
- [x] 7.3 Run the full query-tool and mid-section-defect route tests and confirm no regressions