feat(lineage): unified LineageEngine, EventFetcher, and progressive trace API

Introduce a unified Seed→Lineage→Event pipeline replacing per-page Python
BFS with Oracle CONNECT BY NOCYCLE queries, add staged /api/trace/*
endpoints with rate limiting and L2 Redis caching, and wire progressive
frontend loading via useTraceProgress composable.

Key changes:
- Add LineageEngine (split ancestors / merge sources / full genealogy)
  with QueryBuilder bind-param safety and batched IN clauses
- Add EventFetcher with 6-domain support and L2 Redis cache
- Add trace_routes Blueprint (seed-resolve, lineage, events) with
  profile dispatch, rate limiting, and Redis TTL=300s caching
- Refactor query_tool_service to use LineageEngine and QueryBuilder,
  removing raw string interpolation (SQL injection fix)
- Add rate limits and resolve cache to query_tool_routes
- Integrate useTraceProgress into mid-section-defect with skeleton
  placeholders and fade-in transitions
- Add lineageCache and on-demand lot lineage to query-tool
- Add TraceProgressBar shared component
- Remove legacy query-tool.js static script (3k lines)
- Fix MatrixTable package column truncation (.slice(0,15) removed)
- Archive unified-lineage-engine change, add trace-progressive-ui specs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
egg
2026-02-12 16:30:24 +08:00
parent c38b5f646a
commit 519f8ae2f4
52 changed files with 5074 additions and 4047 deletions

@@ -0,0 +1,2 @@
schema: spec-driven
status: proposal

@@ -0,0 +1,202 @@
## Context
Two high-complexity query pages (`/mid-section-defect` and `/query-tool`) each implement their own LOT lineage-tracing logic. mid-section-defect uses Python BFS (`_bfs_split_chain()` + `_fetch_merge_sources()`); query-tool uses `_build_in_filter()` string concatenation. Both read the same underlying tables: `DWH.DW_MES_CONTAINER` (5.2M rows, CONTAINERID UNIQUE index) and `DWH.DW_MES_PJ_COMBINEDASSYLOTS` (1.97M rows, FINISHEDNAME indexed).
Current problems:
- BFS issues one DB round-trip per round (3-16 rounds), plus `genealogy_records.sql` full-scans `HM_LOTMOVEOUT` (48M rows)
- `_build_in_filter()` string concatenation carries SQL injection risk
- query-tool has no rate limit / cache and can exhaust the DB pool (pool_size=10, max_overflow=20)
- The two services are 1200-1300 lines each, with duplicated lineage logic
Existing safety infrastructure:
- `QueryBuilder` (`sql/builder.py`): `add_in_condition()` supports bind params `:p0, :p1, ...`
- `SQLLoader` (`sql/loader.py`): `load_with_params()` supports structural params `{{ PARAM }}`
- `configured_rate_limit()` (`core/rate_limit.py`): per-client rate limiting with `Retry-After` header
- `LayeredCache` (`core/cache.py`): L1 MemoryTTL + L2 Redis
## Goals / Non-Goals
**Goals:**
- Replace Python BFS with `CONNECT BY NOCYCLE`, reducing 3-16 DB round-trips to 1
- Introduce a unified `LineageEngine` module to eliminate duplicated lineage logic
- Eliminate the `_build_in_filter()` SQL injection risk
- Add rate limiting + caching to query-tool, aligned with mid-section-defect
- Add fast/full dual modes to `lot_split_merge_history`
**Non-Goals:**
- No new API endpoints (handled by the follow-up `trace-progressive-ui` change)
- No frontend changes
- No materialized views / no PARALLEL hints
- No changes to other pages (wip-detail, lot-detail, etc.)
## Decisions
### D1: CONNECT BY NOCYCLE as the primary recursive query strategy
**Choice**: Oracle `CONNECT BY NOCYCLE` with `LEVEL <= 20`
**Alternative**: recursive `WITH` (recursive subquery factoring)
**Rationale**:
- `CONNECT BY` is Oracle's native hierarchical syntax, with the most mature execution-plan optimization on Oracle 19c
- `LEVEL <= 20` is equivalent to the existing BFS `bfs_round > 20` guard
- `NOCYCLE` handles cyclic references (`SPLITFROMID` may contain data-error cycles)
- A recursive `WITH` alternative is kept as a comment inside the SQL file for a quick swap if the execution plan misbehaves
**SQL design** (`sql/lineage/split_ancestors.sql`):
```sql
SELECT
    c.CONTAINERID,
    c.SPLITFROMID,
    c.CONTAINERNAME,
    LEVEL AS SPLIT_DEPTH
FROM DWH.DW_MES_CONTAINER c
START WITH {{ CID_FILTER }}
CONNECT BY NOCYCLE PRIOR c.SPLITFROMID = c.CONTAINERID
    AND LEVEL <= 20
```
- `{{ CID_FILTER }}` is generated by `QueryBuilder.get_conditions_sql()`, with bind params injected
- The Oracle IN-clause limit is handled by batching at `ORACLE_IN_BATCH_SIZE=1000` and merging per-batch results
### D2: LineageEngine module structure
```
src/mes_dashboard/services/lineage_engine.py
├── resolve_split_ancestors(container_ids: List[str]) -> Dict
│   └── returns {child_to_parent: {cid: parent_cid}, cid_to_name: {cid: name}}
├── resolve_merge_sources(container_names: List[str]) -> Dict
│   └── returns {finished_name: [{source_cid, source_name}]}
└── resolve_full_genealogy(container_ids: List[str], initial_names: Dict) -> Dict
    └── combines split + merge, returns {cid: Set[ancestor_cids]}
src/mes_dashboard/sql/lineage/
├── split_ancestors.sql (CONNECT BY NOCYCLE)
└── merge_sources.sql (from merge_lookup.sql)
```
**Function signature design**:
- Profile-agnostic: accepts `container_ids: List[str]`, with no page-specific logic
- Returns native Python structures (dict/set), never a DataFrame
- Internally uses `QueryBuilder` + `SQLLoader.load_with_params()` + `read_sql_df()`
- Batching is encapsulated inside the module; callers never deal with `ORACLE_IN_BATCH_SIZE`
### D3: EventFetcher module structure
```
src/mes_dashboard/services/event_fetcher.py
├── fetch_events(container_ids: List[str], domain: str) -> List[Dict]
│   └── supported domains: history, materials, rejects, holds, jobs, upstream_history
├── _cache_key(domain: str, container_ids: List[str]) -> str
│   └── format: evt:{domain}:{sorted_cids_hash}
└── _get_rate_limit_config(domain: str) -> Dict
    └── returns {bucket, max_attempts, window_seconds}
```
**Cache strategy**:
- L2 Redis cache (following the `core/cache.py` pattern), with TTL configured per domain
- Cache keys use `hashlib.md5(",".join(sorted(cids)).encode()).hexdigest()[:12]` to avoid overly long keys
- The existing mid-section-defect `_fetch_upstream_history()` migrates to `fetch_events(cids, "upstream_history")`
### D4: query-tool SQL injection fix strategy
**Fix scope** (6 call sites):
1. `_resolve_by_lot_id()` (line 262): `_build_in_filter(lot_ids, 'CONTAINERNAME')` + `read_sql_df(sql, {})`
2. `_resolve_by_serial_number()` (line ~320): same pattern
3. `_resolve_by_work_order()` (line ~380): same pattern
4. The IN clause inside `get_lot_history()`
5. The IN clause inside `get_lot_associations()`
6. The `lot_split_merge_history` query
**Fix pattern** (uniform):
```python
# Before (unsafe)
in_filter = _build_in_filter(lot_ids, 'CONTAINERNAME')
sql = f"SELECT ... WHERE {in_filter}"
df = read_sql_df(sql, {})
# After (safe)
builder = QueryBuilder()
builder.add_in_condition("CONTAINERNAME", lot_ids)
sql = SQLLoader.load_with_params(
"query_tool/lot_resolve_id",
CONTAINER_FILTER=builder.get_conditions_sql(),
)
df = read_sql_df(sql, builder.params)
```
**`_build_in_filter()` and `_build_in_clause()` are deleted outright** (not deprecated — removed immediately, because they are a security vulnerability)
### D5: query-tool rate limit + cache configuration
**Rate limits** (following the `configured_rate_limit()` pattern):
| Endpoint | Bucket | Max/Window | Env Override |
|----------|--------|------------|-------------|
| `/resolve` | `query-tool-resolve` | 10/60s | `QT_RESOLVE_RATE_*` |
| `/lot-history` | `query-tool-history` | 20/60s | `QT_HISTORY_RATE_*` |
| `/lot-associations` | `query-tool-association` | 20/60s | `QT_ASSOC_RATE_*` |
| `/adjacent-lots` | `query-tool-adjacent` | 20/60s | `QT_ADJACENT_RATE_*` |
| `/equipment-period` | `query-tool-equipment` | 5/60s | `QT_EQUIP_RATE_*` |
| `/export-csv` | `query-tool-export` | 3/60s | `QT_EXPORT_RATE_*` |
**Cache**:
- resolve result: L2 Redis, TTL=60s, key=`qt:resolve:{input_type}:{values_hash}`
- other GET endpoints: no cache for now (results depend on dynamic CONTAINERID params, so hit rates would be low)
### D6: lot_split_merge_history fast/full dual mode
**Fast mode** (default):
```sql
-- extra conditions added to lot_split_merge_history.sql
AND h.TXNDATE >= ADD_MONTHS(SYSDATE, -6)
...
FETCH FIRST 500 ROWS ONLY
```
**Full mode** (`full_history=true`):
- SQL variant without the time window and row limit
- Uses `read_sql_df_slow()` (120s timeout) instead of `read_sql_df()` (55s timeout)
- The route layer decides via `request.args.get('full_history', 'false').lower() == 'true'`
### D7: Refactoring order and regression protection
**Phase 1**: mid-section-defect (safer: protected by cache + distributed lock)
1. Create `lineage_engine.py` + SQL files
2. Replace the three BFS functions in `mid_section_defect_service.py` with `LineageEngine`
3. Golden tests verify that BFS and CONNECT BY produce identical results
4. Retire `genealogy_records.sql` + `split_chain.sql` (mark deprecated)
**Phase 2**: query-tool (higher risk: no existing protections)
1. Replace every `_build_in_filter()` call with `QueryBuilder`
2. Delete `_build_in_filter()` + `_build_in_clause()`
3. Add route-level rate limiting
4. Add the resolve cache
5. Add `lot_split_merge_history` fast/full mode
**Phase 3**: EventFetcher
1. Create `event_fetcher.py`
2. Migrate `_fetch_upstream_history()` → `EventFetcher`
3. Migrate query-tool event fetch paths → `EventFetcher`
## Risks / Trade-offs
| Risk | Mitigation |
|------|-----------|
| CONNECT BY may produce an unexpected execution plan for very large lineage trees (>10000 nodes) | Hard `LEVEL <= 20` cap + a recursive `WITH` alternative kept inside the SQL file for a quick swap |
| Insufficient golden-test coverage lets a regression slip through | Pick ≥5 LOTs with known lineage structures (multi-level split + merge crossings); enforce passing as a CI gate |
| A `_build_in_filter()` call site is missed after deletion | After Phase 2, `grep -r "_build_in_filter\|_build_in_clause" src/` must return 0 results |
| The 6-month fast-mode window may truncate traces that need full history | `full_history=true` switches to full mode; the frontend omits the param by default = fast mode |
| QueryBuilder `add_in_condition()` does not auto-batch >1000 values | LineageEngine encapsulates batching internally (`for i in range(0, len(ids), 1000)`), invisible to callers |
## Migration Plan
1. **Create new modules** (`lineage_engine.py`, `event_fetcher.py`, `sql/lineage/*.sql`) — side-effect free, safe to deploy
2. **Phase 1 cutover**: internal mid-section-defect calls switch to `LineageEngine` — protected by cache/lock; regressions caught via golden tests + manual comparison
3. **Phase 2 cutover**: query-tool fixes + rate limit + cache — re-run the query-tool route tests
4. **Phase 3 cutover**: EventFetcher migration — executed last, smallest blast radius
5. **Cleanup**: delete the deprecated SQL files once confirmed unreferenced
**Rollback**: each phase is independent and individually revertible. `LineageEngine` and `EventFetcher` are new modules that do not affect existing code until each phase's cutover commit.
## Open Questions
- Does `DW_MES_CONTAINER.SPLITFROMID` have an index? If not, `CONNECT BY` with `START WITH` may rely on a full table scan instead of the CONTAINERID index. Verify the Oracle execution plan.
- Does `ORACLE_IN_BATCH_SIZE=1000` behave the same in `CONNECT BY START WITH ... IN (...)` as in a plain `WHERE ... IN (...)`? Verify in the development environment.
- Should EventFetcher cache TTLs differ per domain (e.g. longer for `upstream_history`, shorter for `holds`)? Uniform 300s for now; tune later based on usage patterns.

@@ -0,0 +1,110 @@
## Why
The batch tracing tool (`/query-tool`) and the mid-section defect trace analysis (`/mid-section-defect`) are the two most query-intensive pages in this project. Both need to resolve LOT lineage (split + merge), but each implements its own tracing logic, causing:
1. **Performance bottleneck**: mid-section-defect traces split chains with multi-round Python BFS (`_bfs_split_chain()`, 3-16 DB round-trips per run), plus `genealogy_records.sql` full-scanning the 48M-row `HM_LOTMOVEOUT` (30-120 seconds).
2. **Security risk**: query-tool's `_build_in_filter()` builds IN clauses by string concatenation (`query_tool_service.py:156-174`), and the `_resolve_by_lot_id()` / `_resolve_by_serial_number()` / `_resolve_by_work_order()` family passes empty params via `read_sql_df(sql, {})` — values are embedded directly in the SQL string, an SQL injection risk.
3. **No protection**: query-tool has no rate limit and no cache; under high concurrency it can exhaust the DB connection pool (production pool_size=10, max_overflow=20).
4. **Duplicated code**: both services maintain the same split-chain tracing, merge lookup, and batched IN segmentation logic.
Oracle 19c's `CONNECT BY NOCYCLE` can replace the entire Python BFS with one SQL statement, reducing 3-16 DB round-trips to 1. The fallback is Oracle 19c's recursive `WITH` (recursive subquery factoring), functionally equivalent and more readable. The split/merge data sources (`DW_MES_CONTAINER.SPLITFROMID` + `DW_MES_PJ_COMBINEDASSYLOTS`) never need to touch `HM_LOTMOVEOUT`, eliminating the 48M-row full scan.
**Boundary statement**: this change is a purely backend-internal refactor — no new API endpoints, no frontend changes. The existing API contract stays backward compatible (URLs and request/response formats unchanged); the only addition is the optional `full_history` query param as a backward-compatible extension. Progressive frontend loading and new API endpoints go into the separate follow-up `trace-progressive-ui` change.
## What Changes
- Create a unified `LineageEngine` module (`src/mes_dashboard/services/lineage_engine.py`) as the shared core for LOT lineage resolution:
  - `resolve_split_ancestors()` — replaces Python BFS with a single `CONNECT BY NOCYCLE` SQL query (fallback: recursive `WITH`, noted as a commented alternative inside the SQL file)
  - `resolve_merge_sources()` — looks up merge sources from `DW_MES_PJ_COMBINEDASSYLOTS`
  - `resolve_full_genealogy()` — combines split + merge into the complete lineage graph
  - Designed as profile-agnostic utility functions: future pages (wip-detail, lot-detail) can call them directly, but this change wires up only mid-section-defect and query-tool
- Create a unified `EventFetcher` module providing cached, rate-limited batch event queries, wrapping the existing domain queries (history, materials, rejects, holds, jobs, upstream_history)
- Refactor `mid_section_defect_service.py`: replace `_bfs_split_chain()` + `_fetch_merge_sources()` + `_resolve_full_genealogy()` with `LineageEngine`, and `_fetch_upstream_history()` with `EventFetcher`
- Refactor `query_tool_service.py`: fully replace `_build_in_filter()` string concatenation with `QueryBuilder` bind params; add route-level rate limiting and caching aligned with the existing mid-section-defect pattern
- New SQL files:
  - `sql/lineage/split_ancestors.sql` (CONNECT BY NOCYCLE implementation; the file includes a recursive WITH alternative as an Oracle-compatibility note)
  - `sql/lineage/merge_sources.sql` (migrated from `sql/mid_section_defect/merge_lookup.sql`)
- Deprecated SQL files (marked deprecated, kept for one release before deletion):
  - `sql/mid_section_defect/genealogy_records.sql` (48M-row HM_LOTMOVEOUT full scan no longer needed)
  - `sql/mid_section_defect/split_chain.sql` (superseded by the lineage CONNECT BY)
- Add dual-mode querying to query-tool's `lot_split_merge_history.sql`:
  - **fast mode** (default): `TXNDATE >= ADD_MONTHS(SYSDATE, -6)` + `FETCH FIRST 500 ROWS ONLY` — covers the last six months, responds in <5s
  - **full mode** (when the frontend passes `full_history=true`): no time window, preserving full trace history, running through `read_sql_df_slow` (120s timeout)
- query-tool routes gain a `full_history` boolean query param; the service selects the SQL variant accordingly
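The dual-mode selection can be sketched as a pure function choosing the structural params injected into the SQL file. A sketch under assumptions: the param names `TIME_WINDOW` / `ROW_LIMIT` and the helper name `split_merge_sql_params` are illustrative, following the `{{ PARAM }}` convention used elsewhere:

```python
FAST_TIME_WINDOW = "AND h.TXNDATE >= ADD_MONTHS(SYSDATE, -6)"
FAST_ROW_LIMIT = "FETCH FIRST 500 ROWS ONLY"


def split_merge_sql_params(full_history: bool) -> dict:
    """Pick the structural params for lot_split_merge_history.sql:
    fast mode injects the 6-month window and row cap; full mode leaves
    both slots empty (and the caller switches to read_sql_df_slow)."""
    if full_history:
        return {"TIME_WINDOW": "", "ROW_LIMIT": ""}
    return {"TIME_WINDOW": FAST_TIME_WINDOW, "ROW_LIMIT": FAST_ROW_LIMIT}
```

Keeping mode selection in one place makes it trivial to unit-test both variants without touching Oracle.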
## Capabilities
### New Capabilities
- `lineage-engine-core`: unified LOT lineage resolution engine exposing `resolve_split_ancestors()` (CONNECT BY NOCYCLE, capped at `LEVEL <= 20`), `resolve_merge_sources()`, and `resolve_full_genealogy()`; all use `QueryBuilder` bind params and support batched IN clauses (`ORACLE_IN_BATCH_SIZE=1000`). Function signatures are profile-agnostic: they accept `container_ids: List[str]` and return dict structures, with no page-specific logic.
- `event-fetcher-unified`: unified event query layer wrapping cache-key generation (format: `evt:{domain}:{sorted_cids_hash}`), L1/L2 layered caching (following the `core/cache.py` LayeredCache pattern), and rate-limit bucket configuration (following the `configured_rate_limit()` pattern). Domains: `history`, `materials`, `rejects`, `holds`, `jobs`, `upstream_history`.
- `query-tool-safety-hardening`: fixes the query-tool SQL injection risk — `_build_in_filter()` and `_build_in_clause()` are fully replaced by `QueryBuilder.add_in_condition()`, eliminating the empty-params `read_sql_df(sql, {})` pattern; adds route-level rate limiting (following the `configured_rate_limit()` pattern: resolve 10/min, history 20/min, association 20/min) and response caching (L2 Redis, 60s TTL).
### Modified Capabilities
- `cache-indexed-query-acceleration`: mid-section-defect genealogy queries move from multi-round Python BFS + HM_LOTMOVEOUT full scan to a single CONNECT BY pass + indexed lookups
- `oracle-query-fragment-governance`: `_build_in_filter()` / `_build_in_clause()` are retired, converging on `QueryBuilder.add_in_condition()`; the new `sql/lineage/` directory follows the existing SQLLoader conventions
## Impact
- **Affected code**:
- 新建: `src/mes_dashboard/services/lineage_engine.py`, `src/mes_dashboard/sql/lineage/split_ancestors.sql`, `src/mes_dashboard/sql/lineage/merge_sources.sql`
- 重構: `src/mes_dashboard/services/mid_section_defect_service.py` (1194L), `src/mes_dashboard/services/query_tool_service.py` (1329L), `src/mes_dashboard/routes/query_tool_routes.py`
- 廢棄: `src/mes_dashboard/sql/mid_section_defect/genealogy_records.sql`, `src/mes_dashboard/sql/mid_section_defect/split_chain.sql` ( lineage 模組取代標記 deprecated 保留一版)
- 修改: `src/mes_dashboard/sql/query_tool/lot_split_merge_history.sql` (加時間窗 + row limit)
- **Runtime/deploy**: 無新依賴仍為 Flask/Gunicorn + Oracle + RedisDB query pattern 改變但 connection pool 設定不變
- **APIs/pages**: `/query-tool` `/mid-section-defect` 既有 API contract 向下相容——URL輸入輸出格式HTTP status code 均不變純內部實作替換向下相容的擴展query-tool API 新增 rate limit header`Retry-After`對齊 `rate_limit.py` 既有實作query-tool split-merge history 新增可選 `full_history` query param預設 false = fast mode不傳時行為與舊版等價)。
- **Performance**: 見下方 Verification 章節的量化驗收基準
- **Security**: query-tool IN clause SQL injection 風險消除所有 `_build_in_filter()` / `_build_in_clause()` 呼叫點改為 `QueryBuilder.add_in_condition()`
- **Testing**: 需新增 LineageEngine 單元測試並建立 golden test 比對 BFS vs CONNECT BY 結果一致性既有 mid-section-defect query-tool 測試需更新 mock 路徑
## Verification
Performance acceptance criteria — all metrics must be measured under the following conditions.
**Test data scale**:
- LOT lineage tree: the target seed lot has a split depth of at least 3, ≥50 ancestor nodes, and at least 1 merge path
- mid-section-defect: a date-range query for which TMTT detection yields 10 seed lots
- query-tool: a work order query whose resolve result is 20 lots
**Acceptance metrics** (cold query = cache miss, hot query = L2 Redis hit):
| Metric | Current (P95) | Target (P95) | Conditions |
|------|-----------|-----------|------|
| mid-section-defect genealogy | 30-120s | ≤8s | single CONNECT BY pass, ≥50 ancestor nodes |
| mid-section-defect genealogy | 3-5s (L2 hit) | ≤1s | Redis cache hit |
| query-tool lot_split_merge_history fast mode | unbounded (>120s timeout) | ≤5s | 6-month time window + FETCH FIRST 500 ROWS |
| query-tool lot_split_merge_history full mode | same as above | ≤60s | no time window, via `read_sql_df_slow` (120s timeout) |
| LineageEngine.resolve_split_ancestors | N/A (new module) | ≤3s | ≥50 ancestor nodes, CONNECT BY |
| DB connection hold time | 3-16 round-trips × 0.5-2s each | single pass ≤3s | one CONNECT BY query |
**Security acceptance**:
- Zero references to `_build_in_filter()` and `_build_in_clause()` (confirmed by grep)
- Every query carrying user input (resolve_by_lot_id, resolve_by_serial_number, resolve_by_work_order, etc.) must use `QueryBuilder` bind params, never string concatenation. Purely static SQL (no user input) may pass empty params.
**Result-equivalence acceptance**:
- Golden test: pick ≥5 LOTs with known lineage structures and verify that the BFS and CONNECT BY outputs of `child_to_parent` and `cid_to_name` are exactly identical as result sets.
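The equivalence criterion can be phrased as a reusable assertion for the golden tests; `assert_genealogy_equivalent` is an illustrative test helper, not an existing function:

```python
def assert_genealogy_equivalent(bfs_result: dict, connect_by_result: dict) -> None:
    """Golden-test comparison: both engines must produce identical
    child_to_parent and cid_to_name mappings for the same seed lots.
    On mismatch, report which keys appear in only one side."""
    for key in ("child_to_parent", "cid_to_name"):
        assert bfs_result[key] == connect_by_result[key], (
            f"{key} mismatch: "
            f"only_bfs={set(bfs_result[key]) - set(connect_by_result[key])}, "
            f"only_connect_by={set(connect_by_result[key]) - set(bfs_result[key])}"
        )
```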
## Non-Goals
- Frontend UI changes are out of scope (progressive loading and staged UX go into the follow-up `trace-progressive-ui` change).
- No new API endpoints — the existing API contract stays backward compatible (the only addition is the optional `full_history` query param). New endpoints belong to the follow-up `trace-progressive-ui`.
- No DB schema changes, no materialized views, no PARALLEL hints — all optimization happens at the application layer (SQL rewrites + Python refactoring + Redis cache).
- No query-logic changes to other pages (wip-detail, lot-detail, etc.) — `LineageEngine` is designed to be extensible, but this change wires up only the two target pages.
- No Oracle PARALLEL hints (their behavior is unpredictable under a connection pool; not used as an optimization).
## Dependencies
- No prerequisites; this change can be implemented independently.
- The follow-up `trace-progressive-ui` depends on the `LineageEngine` and `EventFetcher` modules delivered here.
## Risks
| Risk | Mitigation |
|------|------|
| Performance degrades on very large lineage trees (>10000 ancestors) under CONNECT BY | `LEVEL <= 20` cap + `NOCYCLE` cycle guard, equivalent to the current BFS `bfs_round > 20`; if the Oracle 19c plan is poor, the SQL file includes a recursive `WITH` alternative for a quick swap |
| Lineage results diverge from the BFS version (regression) | Golden tests compare BFS vs CONNECT BY outputs for ≥5 known LOTs; a CI gate enforces exact result-set equality |
| The refactor spans two large services (2500+ lines) | Staged rollout: mid-section-defect first (cache+lock protection, lower regression risk), then query-tool |
| A stray reference survives after `genealogy_records.sql` is retired | Global grep confirms no remaining references; the SQL file is marked deprecated and kept for one release before deletion |
| New query-tool rate limits hurt user experience | Lenient defaults (resolve 10/min, history 20/min) aligned with the existing mid-section-defect limits; responses include a `Retry-After` header |
| A call site is missed when replacing `_build_in_filter()` with `QueryBuilder` | grep every `_build_in_filter` / `_build_in_clause` reference, replace each one, and confirm 0 residual references |

@@ -0,0 +1,18 @@
## ADDED Requirements
### Requirement: Mid-section defect genealogy SHALL use CONNECT BY instead of Python BFS
The mid-section-defect genealogy resolution SHALL use `LineageEngine.resolve_full_genealogy()` (CONNECT BY NOCYCLE) instead of the existing `_bfs_split_chain()` Python BFS implementation.
#### Scenario: Genealogy cold query performance
- **WHEN** mid-section-defect analysis executes genealogy resolution with cache miss
- **THEN** `LineageEngine.resolve_split_ancestors()` SHALL be called (single CONNECT BY query)
- **THEN** response time SHALL be ≤8s (P95) for ≥50 ancestor nodes
- **THEN** Python BFS `_bfs_split_chain()` SHALL NOT be called
#### Scenario: Genealogy hot query performance
- **WHEN** mid-section-defect analysis executes genealogy resolution with L2 Redis cache hit
- **THEN** response time SHALL be ≤1s (P95)
#### Scenario: Golden test result equivalence
- **WHEN** golden test runs with ≥5 known LOTs
- **THEN** CONNECT BY output (`child_to_parent`, `cid_to_name`) SHALL be identical to BFS output for the same inputs

@@ -0,0 +1,20 @@
## ADDED Requirements
### Requirement: EventFetcher SHALL provide unified cached event querying across domains
`EventFetcher` SHALL encapsulate batch event queries with L1/L2 layered cache and rate limit bucket configuration, supporting domains: `history`, `materials`, `rejects`, `holds`, `jobs`, `upstream_history`.
#### Scenario: Cache miss for event domain query
- **WHEN** `EventFetcher` is called for a domain with container IDs and no cache exists
- **THEN** the domain query SHALL execute against Oracle via `read_sql_df()`
- **THEN** the result SHALL be stored in L2 Redis cache with key format `evt:{domain}:{sorted_cids_hash}`
- **THEN** L1 memory cache SHALL also be populated (aligned with `core/cache.py` LayeredCache pattern)
#### Scenario: Cache hit for event domain query
- **WHEN** `EventFetcher` is called for a domain and L2 Redis cache contains a valid entry
- **THEN** the cached result SHALL be returned without executing Oracle query
- **THEN** DB connection pool SHALL NOT be consumed
#### Scenario: Rate limit bucket per domain
- **WHEN** `EventFetcher` is used from a route handler
- **THEN** each domain SHALL have a configurable rate limit bucket aligned with `configured_rate_limit()` pattern
- **THEN** rate limit configuration SHALL be overridable via environment variables
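The L1/L2 lookup order in the first two scenarios can be sketched with an in-memory stand-in for Redis; `LayeredCacheSketch` is purely illustrative (the real implementation is `core/cache.py`'s `LayeredCache`):

```python
from typing import Callable, Dict


class LayeredCacheSketch:
    """Minimal L1-memory / L2 stand-in showing the required lookup order:
    L1 hit -> L2 hit (repopulating L1) -> fetch (populating both layers)."""

    def __init__(self, l2: Dict[str, object]):
        self.l1: Dict[str, object] = {}  # per-process memory layer
        self.l2 = l2                     # stands in for shared Redis

    def get_or_fetch(self, key: str, fetch: Callable[[], object]) -> object:
        if key in self.l1:
            return self.l1[key]
        if key in self.l2:
            self.l1[key] = self.l2[key]  # promote L2 hit into L1
            return self.l1[key]
        value = fetch()                  # Oracle query happens only here
        self.l1[key] = value
        self.l2[key] = value
        return value
```

A second process sharing the same L2 serves the entry from cache without re-querying, which is how the "DB connection pool SHALL NOT be consumed" scenario holds.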

@@ -0,0 +1,57 @@
## ADDED Requirements
### Requirement: LineageEngine SHALL provide unified split ancestor resolution via CONNECT BY NOCYCLE
`LineageEngine.resolve_split_ancestors()` SHALL accept a list of container IDs and return the complete split ancestry graph using a single Oracle `CONNECT BY NOCYCLE` query on `DW_MES_CONTAINER.SPLITFROMID`.
#### Scenario: Normal split chain resolution
- **WHEN** `resolve_split_ancestors()` is called with a list of container IDs
- **THEN** a single SQL query using `CONNECT BY NOCYCLE` SHALL be executed against `DW_MES_CONTAINER`
- **THEN** the result SHALL include a `child_to_parent` mapping and a `cid_to_name` mapping for all discovered ancestor nodes
- **THEN** the traversal depth SHALL be limited to `LEVEL <= 20` (equivalent to existing BFS `bfs_round > 20` guard)
#### Scenario: Large input batch exceeding Oracle IN clause limit
- **WHEN** the input `container_ids` list exceeds `ORACLE_IN_BATCH_SIZE` (1000)
- **THEN** `QueryBuilder.add_in_condition()` SHALL batch the IDs and combine results
- **THEN** all bind parameters SHALL use `QueryBuilder.params` (no string concatenation)
#### Scenario: Cyclic split references in data
- **WHEN** `DW_MES_CONTAINER.SPLITFROMID` contains cyclic references
- **THEN** `NOCYCLE` SHALL prevent infinite traversal
- **THEN** the query SHALL return all non-cyclic ancestors up to `LEVEL <= 20`
#### Scenario: CONNECT BY performance regression
- **WHEN** Oracle 19c execution plan for `CONNECT BY NOCYCLE` performs worse than expected
- **THEN** the SQL file SHALL contain a commented-out recursive `WITH` (recursive subquery factoring) alternative that can be swapped in without code changes
### Requirement: LineageEngine SHALL provide unified merge source resolution
`LineageEngine.resolve_merge_sources()` SHALL accept a list of container IDs and return merge source mappings from `DW_MES_PJ_COMBINEDASSYLOTS`.
#### Scenario: Merge source lookup
- **WHEN** `resolve_merge_sources()` is called with container IDs
- **THEN** the result SHALL include `{cid: [merge_source_cid, ...]}` for all containers that have merge sources
- **THEN** all queries SHALL use `QueryBuilder` bind params
### Requirement: LineageEngine SHALL provide combined genealogy resolution
`LineageEngine.resolve_full_genealogy()` SHALL combine split ancestors and merge sources into a complete genealogy graph.
#### Scenario: Full genealogy for a set of seed lots
- **WHEN** `resolve_full_genealogy()` is called with seed container IDs
- **THEN** split ancestors SHALL be resolved first via `resolve_split_ancestors()`
- **THEN** merge sources SHALL be resolved for all discovered ancestor nodes
- **THEN** the combined result SHALL be equivalent to the existing `_resolve_full_genealogy()` output in `mid_section_defect_service.py`
### Requirement: LineageEngine functions SHALL be profile-agnostic
All `LineageEngine` public functions SHALL accept `container_ids: List[str]` and return dictionary structures without binding to any specific page logic.
#### Scenario: Reuse from different pages
- **WHEN** a new page (e.g., wip-detail) needs lineage resolution
- **THEN** it SHALL be able to call `LineageEngine` functions directly without modification
- **THEN** no page-specific logic (profile, TMTT detection, etc.) SHALL exist in `LineageEngine`
### Requirement: LineageEngine SQL files SHALL reside in `sql/lineage/` directory
New SQL files SHALL follow the existing `SQLLoader` convention under `src/mes_dashboard/sql/lineage/`.
#### Scenario: SQL file organization
- **WHEN** `LineageEngine` executes queries
- **THEN** `split_ancestors.sql` and `merge_sources.sql` SHALL be loaded via `SQLLoader.load_with_params("lineage/split_ancestors", ...)`
- **THEN** the SQL files SHALL NOT reference `HM_LOTMOVEOUT` (48M row table no longer needed for genealogy)

@@ -0,0 +1,23 @@
## ADDED Requirements
### Requirement: Lineage SQL fragments SHALL be centralized in `sql/lineage/` directory
Split ancestor and merge source SQL queries SHALL be defined in `sql/lineage/` and shared across services via `SQLLoader`.
#### Scenario: Mid-section-defect lineage query
- **WHEN** `mid_section_defect_service.py` needs split ancestry or merge source data
- **THEN** it SHALL call `LineageEngine` which loads SQL from `sql/lineage/split_ancestors.sql` and `sql/lineage/merge_sources.sql`
- **THEN** it SHALL NOT use `sql/mid_section_defect/split_chain.sql` or `sql/mid_section_defect/genealogy_records.sql`
#### Scenario: Deprecated SQL file handling
- **WHEN** `sql/mid_section_defect/genealogy_records.sql` and `sql/mid_section_defect/split_chain.sql` are deprecated
- **THEN** the files SHALL be marked with a deprecated comment at the top
- **THEN** grep SHALL confirm zero `SQLLoader.load` references to these files
- **THEN** the files SHALL be retained for one version before deletion
### Requirement: All user-input SQL queries SHALL use QueryBuilder bind params
`_build_in_filter()` and `_build_in_clause()` in `query_tool_service.py` SHALL be fully replaced by `QueryBuilder.add_in_condition()`.
#### Scenario: Complete migration to QueryBuilder
- **WHEN** the refactoring is complete
- **THEN** grep for `_build_in_filter` and `_build_in_clause` SHALL return zero results
- **THEN** all queries involving user-supplied values SHALL use `QueryBuilder.params`

@@ -0,0 +1,57 @@
## ADDED Requirements
### Requirement: query-tool resolve functions SHALL use QueryBuilder bind params for all user input
All `resolve_lots()` family functions (`_resolve_by_lot_id`, `_resolve_by_serial_number`, `_resolve_by_work_order`) SHALL use `QueryBuilder.add_in_condition()` with bind parameters instead of `_build_in_filter()` string concatenation.
#### Scenario: Lot resolve with user-supplied values
- **WHEN** a resolve function receives user-supplied lot IDs, serial numbers, or work order names
- **THEN** the SQL query SHALL use `:p0, :p1, ...` bind parameters via `QueryBuilder`
- **THEN** `read_sql_df()` SHALL receive `builder.params` (never an empty `{}` dict for queries with user input)
- **THEN** `_build_in_filter()` and `_build_in_clause()` SHALL NOT be called
#### Scenario: Pure static SQL without user input
- **WHEN** a query contains no user-supplied values (e.g., static lookups)
- **THEN** empty params `{}` is acceptable
- **THEN** no `_build_in_filter()` SHALL be used
#### Scenario: Zero residual references to deprecated functions
- **WHEN** the refactoring is complete
- **THEN** grep for `_build_in_filter` and `_build_in_clause` SHALL return zero results across the entire codebase
### Requirement: query-tool routes SHALL apply rate limiting
All query-tool API endpoints SHALL apply per-client rate limiting using the existing `configured_rate_limit` mechanism.
#### Scenario: Resolve endpoint rate limit exceeded
- **WHEN** a client sends more than 10 requests to query-tool resolve endpoints within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
- **THEN** the resolve service function SHALL NOT be called
#### Scenario: History endpoint rate limit exceeded
- **WHEN** a client sends more than 20 requests to query-tool history endpoints within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
#### Scenario: Association endpoint rate limit exceeded
- **WHEN** a client sends more than 20 requests to query-tool association endpoints within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
### Requirement: query-tool routes SHALL apply response caching
High-cost query-tool endpoints SHALL cache responses in L2 Redis.
#### Scenario: Resolve result caching
- **WHEN** a resolve request succeeds
- **THEN** the response SHALL be cached in L2 Redis with TTL = 60s
- **THEN** subsequent identical requests within TTL SHALL return cached result without Oracle query
### Requirement: lot_split_merge_history SHALL support fast and full query modes
The `lot_split_merge_history.sql` query SHALL support two modes to balance traceability completeness vs performance.
#### Scenario: Fast mode (default)
- **WHEN** `full_history` query parameter is absent or `false`
- **THEN** the SQL SHALL include `TXNDATE >= ADD_MONTHS(SYSDATE, -6)` time window and `FETCH FIRST 500 ROWS ONLY`
- **THEN** query response time SHALL be ≤5s (P95)
#### Scenario: Full mode
- **WHEN** `full_history=true` query parameter is provided
- **THEN** the SQL SHALL NOT include time window restriction
- **THEN** the query SHALL use `read_sql_df_slow` (120s timeout)
- **THEN** query response time SHALL be ≤60s (P95)

@@ -0,0 +1,57 @@
## Phase 1: LineageEngine 模組建立
- [x] 1.1 建立 `src/mes_dashboard/sql/lineage/split_ancestors.sql`CONNECT BY NOCYCLE含 recursive WITH 註解替代方案)
- [x] 1.2 建立 `src/mes_dashboard/sql/lineage/merge_sources.sql`(從 `mid_section_defect/merge_lookup.sql` 遷移,改用 `{{ FINISHED_NAME_FILTER }}` 結構參數)
- [x] 1.3 建立 `src/mes_dashboard/services/lineage_engine.py``resolve_split_ancestors()``resolve_merge_sources()``resolve_full_genealogy()` 三個公用函數,使用 `QueryBuilder` bind params + `ORACLE_IN_BATCH_SIZE=1000` 分批
- [x] 1.4 LineageEngine 單元測試mock `read_sql_df` 驗證 batch 分割、dict 回傳結構、LEVEL <= 20 防護
## Phase 2: mid-section-defect 切換到 LineageEngine
- [x] 2.1 在 `mid_section_defect_service.py` 中以 `LineageEngine.resolve_split_ancestors()` 取代 `_bfs_split_chain()`
- [x] 2.2 以 `LineageEngine.resolve_merge_sources()` 取代 `_fetch_merge_sources()`
- [x] 2.3 以 `LineageEngine.resolve_full_genealogy()` 取代 `_resolve_full_genealogy()`
- [x] 2.4 Golden test選取 ≥5 個已知血緣結構 LOT比對 BFS vs CONNECT BY 輸出的 `child_to_parent``cid_to_name` 結果集合完全一致
- [x] 2.5 標記 `sql/mid_section_defect/genealogy_records.sql``sql/mid_section_defect/split_chain.sql` 為 deprecated檔案頂部加 `-- DEPRECATED: replaced by sql/lineage/split_ancestors.sql`
## Phase 3: query-tool SQL injection 修復
- [x] 3.1 建立 `sql/query_tool/lot_resolve_id.sql``lot_resolve_serial.sql``lot_resolve_work_order.sql` SQL 檔案(從 inline SQL 遷移到 SQLLoader 管理)
- [x] 3.2 修復 `_resolve_by_lot_id()``_build_in_filter()``QueryBuilder.add_in_condition()` + `SQLLoader.load_with_params()` + `read_sql_df(sql, builder.params)`
- [x] 3.3 修復 `_resolve_by_serial_number()`:同上模式
- [x] 3.4 修復 `_resolve_by_work_order()`:同上模式
- [x] 3.5 修復 `get_lot_history()` 內部 IN 子句:改用 `QueryBuilder`
- [x] 3.6 修復 lot-associations 查詢路徑(`get_lot_materials()` / `get_lot_rejects()` / `get_lot_holds()` / `get_lot_splits()` / `get_lot_jobs()`)中涉及使用者輸入的 IN 子句:改用 `QueryBuilder`
- [x] 3.7 修復 `lot_split_merge_history` 查詢:改用 `QueryBuilder`
- [x] 3.8 刪除 `_build_in_filter()``_build_in_clause()` 函數
- [x] 3.9 驗證:`grep -r "_build_in_filter\|_build_in_clause" src/` 回傳 0 結果
- [x] 3.10 更新既有 query-tool 路由測試的 mock 路徑
## Phase 4: query-tool rate limit + cache
- [x] 4.1 在 `query_tool_routes.py``/resolve` 加入 `configured_rate_limit(bucket='query-tool-resolve', default_max_attempts=10, default_window_seconds=60)`
- [x] 4.2 為 `/lot-history` 加入 `configured_rate_limit(bucket='query-tool-history', default_max_attempts=20, default_window_seconds=60)`
- [x] 4.3 為 `/lot-associations` 加入 `configured_rate_limit(bucket='query-tool-association', default_max_attempts=20, default_window_seconds=60)`
- [x] 4.4 為 `/adjacent-lots` 加入 `configured_rate_limit(bucket='query-tool-adjacent', default_max_attempts=20, default_window_seconds=60)`
- [x] 4.5 為 `/equipment-period` 加入 `configured_rate_limit(bucket='query-tool-equipment', default_max_attempts=5, default_window_seconds=60)`
- [x] 4.6 為 `/export-csv` 加入 `configured_rate_limit(bucket='query-tool-export', default_max_attempts=3, default_window_seconds=60)`
- [x] 4.7 為 resolve 結果加入 L2 Redis cachekey=`qt:resolve:{input_type}:{values_hash}`, TTL=60s
## Phase 5: lot_split_merge_history fast/full 雙模式
- [x] 5.1 修改 `sql/query_tool/lot_split_merge_history.sql`:加入 `{{ TIME_WINDOW }}``{{ ROW_LIMIT }}` 結構參數
- [x] 5.2 在 `query_tool_service.py` 中根據 `full_history` 參數選擇 SQL variantfast: `AND h.TXNDATE >= ADD_MONTHS(SYSDATE, -6)` + `FETCH FIRST 500 ROWS ONLY`full: 無限制 + `read_sql_df_slow`
- [x] 5.3 在 `query_tool_routes.py``/api/query-tool/lot-associations?type=splits` 路徑加入 `full_history` query param 解析,並傳遞到 split-merge-history 查詢
- [x] 5.4 路由測試:驗證 fast mode預設和 full mode`full_history=true`)的行為差異
## Phase 6: EventFetcher module
- [x] 6.1 Create `src/mes_dashboard/services/event_fetcher.py`: `fetch_events(container_ids, domain)` + cache key generation + rate limit config
- [x] 6.2 Migrate `_fetch_upstream_history()` in `mid_section_defect_service.py` to `EventFetcher.fetch_events(cids, "upstream_history")`
- [x] 6.3 Migrate the query-tool event fetch paths (the DB query portions of `get_lot_history` and `get_lot_associations`) to `EventFetcher`
- [x] 6.4 EventFetcher unit tests: with a mocked DB, verify cache key format, rate limit config, and domain branching
## Phase 7: Cleanup and verification
- [x] 7.1 Confirm `genealogy_records.sql` and `split_chain.sql` have no active references (verify via `grep -r`); keep the deprecated markers
- [x] 7.2 Confirm every query containing user input uses `QueryBuilder` bind params (audit each `read_sql_df` call site)
- [x] 7.3 Run the full query-tool and mid-section-defect route test suites and confirm no regressions


@@ -0,0 +1,2 @@
schema: spec-driven
status: proposal


@@ -0,0 +1,446 @@
## Context
With `unified-lineage-engine` complete, the backend trace pipeline dropped from 30-120 s to 3-8 s. But the UX is still a black-box wait: the mid-section-defect `/analysis` GET returns everything at once (KPI + charts + trend + genealogy_status), and while query-tool already runs a manual sequence (resolve → history → association), lineage queries still load in one batch.
Existing frontend architecture:
- mid-section-defect: `App.vue` runs `Promise.all([apiGet('/analysis'), loadDetail(1)])` in parallel; a single `loading.querying` boolean drives the whole-page loading state
- query-tool: the `useQueryToolData.js` composable manages `loading.resolving / .history / .association / .equipment`, each independent but with no staged progress
- shared: `useAutoRefresh` (jittered interval + abort signal), `usePaginationState`, `apiGet/apiPost` (timeout + abort), `useQueryState` (URL sync)
- API pattern: `apiGet/apiPost` accept `signal: AbortSignal` + `timeout`; error objects carry `error.retryAfterSeconds`
## Goals / Non-Goals
**Goals:**
- Add a three-stage `/api/trace/*` API (seed-resolve → lineage → events), with per-page behavior selected via a `profile` parameter
- Build a `useTraceProgress` composable encapsulating the three-stage sequential fetch + reactive state
- mid-section-defect progressive rendering: seed lots first → lineage → KPI/charts fade in
- Switch the query-tool lineage tab to on-demand (lineage is queried only after the user clicks a single lot)
- Keep the `/api/mid-section-defect/analysis` GET endpoint backward compatible
- Delete the pre-Vite dead code `static/js/query-tool.js`
**Non-Goals:**
- No SSE / WebSocket (gunicorn sync worker constraint)
- No Celery/RQ task queue
- No changes to trace computation logic (owned by `unified-lineage-engine`)
- No changes to the defect attribution algorithm
- No changes to the equipment-period query
## Decisions
### D1: trace_routes.py Blueprint architecture
**Choice**: a single `trace_bp` Blueprint with three route handlers + profile dispatch
**Alternative**: one Blueprint per profile (`trace_msd_bp`, `trace_qt_bp`)
**Rationale**:
- The three endpoints share a uniform request/response structure; only the internal call logic branches by profile
- Separate Blueprints would duplicate rate limit / cache / error handling boilerplate
- Profile validation lives in one place (`_validate_profile()`); adding a profile only requires a new if branch
**Route design**:
```python
trace_bp = Blueprint('trace', __name__, url_prefix='/api/trace')

@trace_bp.route('/seed-resolve', methods=['POST'])
@configured_rate_limit(bucket='trace-seed', default_max_attempts=10, default_window_seconds=60)
def seed_resolve():
    body = request.get_json() or {}
    profile = body.get('profile')
    params = body.get('params', {})
    # profile dispatch → _seed_resolve_query_tool(params) or _seed_resolve_msd(params)
    # return jsonify({"stage": "seed-resolve", "seeds": [...], "seed_count": N, "cache_key": "trace:{hash}"})
    ...

@trace_bp.route('/lineage', methods=['POST'])
@configured_rate_limit(bucket='trace-lineage', default_max_attempts=10, default_window_seconds=60)
def lineage():
    body = request.get_json() or {}
    container_ids = body.get('container_ids', [])
    # call LineageEngine.resolve_full_genealogy(container_ids)
    # return jsonify({"stage": "lineage", "ancestors": {...}, "merges": {...}, "total_nodes": N})
    ...

@trace_bp.route('/events', methods=['POST'])
@configured_rate_limit(bucket='trace-events', default_max_attempts=15, default_window_seconds=60)
def events():
    body = request.get_json() or {}
    container_ids = body.get('container_ids', [])
    domains = body.get('domains', [])
    profile = body.get('profile')
    # call EventFetcher for each domain
    # if profile == 'mid_section_defect': run aggregation
    # return jsonify({"stage": "events", "results": {...}, "aggregation": {...} | null})
    ...
```
**Profile dispatch internals**:
```
_seed_resolve_query_tool(params) → calls the existing query_tool_service resolve logic
_seed_resolve_msd(params)        → calls the mid_section_defect_service TMTT detection logic
_aggregate_msd(events_data)      → mid-section-defect-specific aggregation (KPI, charts, trend)
```
**Cache strategy**:
- seed-resolve: `trace:seed:{profile}:{params_hash}`, TTL=300s
- lineage: `trace:lineage:{sorted_cids_hash}`, TTL=300s (profile-agnostic, since lineage does not depend on the profile)
- events: `trace:evt:{profile}:{domains_hash}:{sorted_cids_hash}`, TTL=300s
- Uses `LayeredCache` L2 Redis (matching the existing `core/cache.py` pattern)
- cache key hash: `hashlib.md5(",".join(sorted(values)).encode()).hexdigest()[:12]`
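A minimal sketch of the key scheme above. `_hash_part` and the three `*_key` helpers are illustrative names, but the key formats and the truncated-md5 scheme follow the bullets (note the values must be joined into one string before hashing, since a list has no `.encode()`):

```python
import hashlib

def _hash_part(values) -> str:
    # Sort first so the hash is order-independent, then join into one string.
    joined = ",".join(sorted(str(v) for v in values))
    return hashlib.md5(joined.encode()).hexdigest()[:12]

def seed_key(profile: str, params: dict) -> str:
    return f"trace:seed:{profile}:{_hash_part(f'{k}={v}' for k, v in params.items())}"

def lineage_key(container_ids) -> str:
    return f"trace:lineage:{_hash_part(container_ids)}"

def events_key(profile: str, domains, container_ids) -> str:
    return f"trace:evt:{profile}:{_hash_part(domains)}:{_hash_part(container_ids)}"
```

Sorting before hashing is what makes the lineage key idempotent for the same `container_ids` set regardless of request order.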
**Unified error handling**:
```python
def _make_stage_error(stage, code, message, status=400):
    return jsonify({"error": message, "code": code}), status

# Timeout handling: each stage internally relies on read_sql_df()'s 55 s call_timeout.
# On timeout: return _make_stage_error(stage, f"{STAGE}_TIMEOUT", "...", 504)
```
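The `{STAGE}_TIMEOUT` convention maps mechanically from stage names to the error codes defined elsewhere in this change; a small sketch (the dict and `timeout_payload` are illustrative, the codes and the 504 status are the spec's):

```python
STAGE_TIMEOUT_CODES = {
    "seed-resolve": "SEED_RESOLVE_TIMEOUT",
    "lineage": "LINEAGE_TIMEOUT",
    "events": "EVENTS_TIMEOUT",
}

def timeout_payload(stage: str):
    """Body + HTTP status for a stage that exceeded its time budget."""
    code = STAGE_TIMEOUT_CODES[stage]
    return {"error": f"{stage} stage timed out", "code": code}, 504
```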
### D2: useTraceProgress composable design
**Choice**: a new `frontend/src/shared-composables/useTraceProgress.js` encapsulating the sequential fetch + reactive stage state
**Alternative**: implement the staged fetch directly in each page's App.vue
**Rationale**:
- Both pages share the same three-stage fetch logic
- Extracting stage-state management lets pages focus purely on rendering
- Matches the existing `shared-composables/` directory structure
**Composable signature**:
```javascript
export function useTraceProgress({ profile, buildParams }) {
  // --- Reactive State ---
  const current_stage = ref(null)  // 'seed-resolve' | 'lineage' | 'events' | null
  const completed_stages = ref([]) // ['seed-resolve', 'lineage']
  const stage_results = reactive({
    seed: null,    // { seeds: [], seed_count: N, cache_key: '...' }
    lineage: null, // { ancestors: {...}, merges: {...}, total_nodes: N }
    events: null,  // { results: {...}, aggregation: {...} }
  })
  const stage_errors = reactive({
    seed: null,    // { code: '...', message: '...' }
    lineage: null,
    events: null,
  })
  const is_running = ref(false)

  // --- Methods ---
  async function execute(params) // run the three-stage fetch
  function reset()               // clear all state
  function abort()               // abort any in-flight fetch

  return {
    current_stage,
    completed_stages,
    stage_results,
    stage_errors,
    is_running,
    execute,
    reset,
    abort,
  }
}
```
**Sequential fetch logic**:
```javascript
let abortCtrl = null // module-scoped so abort() can cancel an in-flight run

async function execute(params) {
  reset()
  is_running.value = true
  abortCtrl = new AbortController()
  try {
    // Stage 1: seed-resolve
    current_stage.value = 'seed-resolve'
    const seedResult = await apiPost('/api/trace/seed-resolve', {
      profile,
      params,
    }, { timeout: 60000, signal: abortCtrl.signal })
    stage_results.seed = seedResult.data
    completed_stages.value.push('seed-resolve')
    if (!seedResult.data?.seeds?.length) return // no seeds; stop here

    // Stage 2: lineage
    current_stage.value = 'lineage'
    const cids = seedResult.data.seeds.map(s => s.container_id)
    const lineageResult = await apiPost('/api/trace/lineage', {
      profile,
      container_ids: cids,
      cache_key: seedResult.data.cache_key,
    }, { timeout: 60000, signal: abortCtrl.signal })
    stage_results.lineage = lineageResult.data
    completed_stages.value.push('lineage')

    // Stage 3: events
    current_stage.value = 'events'
    const allCids = _collectAllCids(cids, lineageResult.data)
    const eventsResult = await apiPost('/api/trace/events', {
      profile,
      container_ids: allCids,
      domains: _getDomainsForProfile(profile),
      cache_key: seedResult.data.cache_key,
    }, { timeout: 60000, signal: abortCtrl.signal })
    stage_results.events = eventsResult.data
    completed_stages.value.push('events')
  } catch (err) {
    if (err?.name === 'AbortError') return
    // record the error against the stage that was running
    const stage = current_stage.value
    if (stage) stage_errors[_stageKey(stage)] = { code: err.errorCode, message: err.message }
  } finally {
    current_stage.value = null
    is_running.value = false
  }
}
```
**Design notes**:
- `stage_results` is a reactive object; each stage's result is assigned as soon as it completes, triggering updates in the UI that depends on that stage
- Errors are not thrown up to the page; they are recorded in `stage_errors`, and results from already-completed stages are preserved
- `abort()` lets `useAutoRefresh` cancel the previous round before starting a new refresh
- `profile` is injected at construction time (immutable); `params` is passed per execution (varies per query)
- `cache_key` is threaded between stages for logging correlation
### D3: mid-section-defect progressive rendering strategy
**Choice**: staged rendering + skeleton placeholders + CSS fade-in transition
**Alternative**: keep one-shot rendering (wait for all stages to finish)
**Rationale**:
- The seed stage (≤3s) can show the seed lot count and basic info right away
- KPI/charts fill in after lineage + events complete, so users perceive progress
- Skeleton placeholders avoid layout shift (chart containers reserve fixed heights)
**App.vue query flow rework**:
```javascript
// Before (current)
async function loadAnalysis() {
  loading.querying = true
  const [summaryResult] = await Promise.all([
    apiGet('/api/mid-section-defect/analysis', { params, timeout: 120000, signal }),
    loadDetail(1, signal),
  ])
  analysisData.value = summaryResult.data // everything updates at once
  loading.querying = false
}

// After (progressive)
const trace = useTraceProgress({ profile: 'mid_section_defect' })

async function loadAnalysis() {
  const params = buildFilterParams()
  // staged fetch: seed → lineage → events+aggregation
  await trace.execute(params)
  // Detail still uses the old paginated endpoint (not the staged API)
  await loadDetail(1)
}
```
**Render-layer mapping**:
```
trace.completed_stages contains 'seed-resolve'
  → show seed lot count badge + basic filter feedback
  → KPI cards / charts / trend show skeletons
trace.completed_stages contains 'lineage'
  → show genealogy_status (ancestor count)
  → KPI/charts remain skeletons
trace.completed_stages contains 'events'
  → trace.stage_results.events.aggregation is non-null
  → KPI cards fade in with values
  → Pareto charts render with fade-in
  → Trend chart renders with fade-in
```
**Skeleton placeholder spec**:
- KpiCards: 6 fixed-height card frames (`min-height: 100px`), gray pulse animation
- ParetoChart: 6 fixed-height chart frames (`min-height: 300px`), gray pulse animation
- TrendChart: 1 fixed-height frame (`min-height: 300px`)
- fade-in: CSS transition `opacity 0→1, 300ms ease-in`
**Auto-refresh integration**:
- `useAutoRefresh.onRefresh` → `trace.abort()` + `trace.execute(committedFilters)`
- Keep the current 5-minute jittered interval
**Detail pagination unchanged**:
- The `/api/mid-section-defect/analysis/detail` GET endpoint stays as-is
- It does not go through the staged API (detail is a paginated query, independent of the trace pipeline)
### D4: query-tool on-demand lineage strategy
**Choice**: per-lot on-demand fetch (lineage is queried only when the user clicks a lot card)
**Alternative**: batch-load lineage for all lots at resolve time
**Rationale**:
- A resolve may return 20+ lots; batch-querying lineage for all of them adds unnecessary DB load
- Most users only care about the lineage of a few specific lots
- A per-lot fetch stays within ≤3s, an acceptable user experience
**useQueryToolData.js rework**:
```javascript
// new lineage state
const lineageCache = reactive({}) // { [containerId]: { ancestors, merges, loading, error } }

async function loadLotLineage(containerId) {
  if (lineageCache[containerId]?.ancestors) return // already cached
  lineageCache[containerId] = { ancestors: null, merges: null, loading: true, error: null }
  try {
    const result = await apiPost('/api/trace/lineage', {
      profile: 'query_tool',
      container_ids: [containerId],
    }, { timeout: 60000 })
    lineageCache[containerId] = {
      ancestors: result.data.ancestors,
      merges: result.data.merges,
      loading: false,
      error: null,
    }
  } catch (err) {
    lineageCache[containerId] = {
      ancestors: null,
      merges: null,
      loading: false,
      error: err.message,
    }
  }
}
```
**UI behavior**:
- Each lot in the lot list gets an expand button (or accordion)
- Click to expand → call `loadLotLineage(containerId)` → show loading → show the lineage tree
- Clicking an expanded lot again collapses it (no re-fetch)
- `lineageCache` is cleared on each new `resolveLots()` round
**query-tool main flow unchanged**:
- The existing resolve → lot-history → lot-associations flow is untouched
- Lineage is an added on-demand feature; it does not replace existing functionality
- query-tool does not adopt `useTraceProgress` for now (its flow is user-driven and interactive, not an automatic sequence)
### D5: progress indicator component design
**Choice**: a shared, props-driven `TraceProgressBar.vue` component
**Alternative**: each page implements its own progress display
**Rationale**:
- Both pages show the same stage progress (seed → lineage → events)
- Unified visual language
**Component design**:
```javascript
// frontend/src/shared-composables/TraceProgressBar.vue
// (lives in shared-composables; a .vue file, but used in tandem with the composable)
props: {
  current_stage: { type: String, default: null },       // 'seed-resolve' | 'lineage' | 'events'
  completed_stages: { type: Array, default: () => [] }, // ['seed-resolve', 'lineage']
  stage_errors: { type: Object, default: () => ({}) },  // { seed: null, lineage: { code, message } }
}
// Three step indicators:
//   [●] Seed  →  [●] Lineage  →  [○] Events
//    done (green)  active (blue pulse)  pending (gray)
//    error (red)
```
**Stage display names**:
| Stage ID | Chinese label | English label |
|----------|---------------|---------------|
| seed-resolve | 批次解析 | Resolving |
| lineage | 血緣追溯 | Lineage |
| events | 事件查詢 | Events |
**Replacing the loading spinner**:
- mid-section-defect: `loading.querying` previously drove a single spinner; it is replaced by `TraceProgressBar`
- The progress indicator sits below the filter bar and above the results area
### D6: `/analysis` GET endpoint backward-compatibility bridge
**Choice**: keep the original handler; internally it calls the staged pipeline and merges the results
**Alternative**: change the original handler without going through the staged pipeline
**Rationale**:
- The staged pipeline (LineageEngine + EventFetcher) has been the standard path since `unified-lineage-engine` landed
- Keeping the original handler preserves the non-portal-shell route fallback
- Golden tests verify result equivalence
**Bridge logic**:
```python
# mid_section_defect_routes.py: internal rework of the /analysis handler
@mid_section_defect_bp.route('/analysis', methods=['GET'])
@configured_rate_limit(bucket='msd-analysis', ...)
def api_analysis():
    # current: result = mid_section_defect_service.query_analysis(start_date, end_date, loss_reasons)
    # new: call the service-layer pipeline function (the service already uses LineageEngine + EventFetcher)
    # response format is completely unchanged
    result = mid_section_defect_service.query_analysis(start_date, end_date, loss_reasons)
    return jsonify({"success": True, "data": result})
```
**In fact the `/analysis` handler needs no change**: `unified-lineage-engine` Phase 1 already switched the service internals to `LineageEngine`. This change only needs to confirm that `/analysis` output is identical to the pre-refactor result (golden test verification); no extra bridging code is required.
**Golden test strategy**:
- Pick ≥3 sets of known query parameters (different date ranges, different loss_reasons combinations)
- Compare the `/analysis` JSON response structure and values before and after the refactor
- Allow floating-point tolerance (±0.01% on percentage fields such as defect_rate)
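A possible comparator for that golden test; `responses_equivalent` is a hypothetical test helper, and only the percentage fields named in `pct_fields` get the ±0.01 tolerance, while everything else is compared strictly:

```python
def responses_equivalent(before, after, pct_fields=frozenset({"defect_rate"}), tol=0.01):
    """Recursively compare two /analysis JSON payloads.

    Percentage fields get +/-tol tolerance; all other values must match exactly.
    """
    if isinstance(before, dict) and isinstance(after, dict):
        if before.keys() != after.keys():
            return False
        return all(
            abs(before[k] - after[k]) <= tol if k in pct_fields
            else responses_equivalent(before[k], after[k], pct_fields, tol)
            for k in before
        )
    if isinstance(before, list) and isinstance(after, list):
        return len(before) == len(after) and all(
            responses_equivalent(b, a, pct_fields, tol) for b, a in zip(before, after)
        )
    return before == after
```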
### D7: Legacy static JS cleanup
**Choice**: delete `src/mes_dashboard/static/js/query-tool.js` outright
**Rationale**:
- The file (3056 lines / 126 KB) is static JS from the pre-Vite era
- The `query_tool.html` template loads the Vite build artifact via `frontend_asset('query-tool.js')`, not this static file
- The Vite config confirms the entry point: `'query-tool': resolve(__dirname, 'src/query-tool/main.js')`
- `frontend_asset()` resolves through the Vite manifest and never points into `static/js/`
- grep confirms no other references
**Verification steps**:
1. `grep -r "static/js/query-tool.js" src/ frontend/ templates/` → 0 results
2. Confirm `frontend_asset('query-tool.js')` resolves to the hashed filename in the Vite manifest
3. Confirm `frontend/src/query-tool/main.js` is the active entry (matches the Vite config `input`)
### D8: Implementation order
**Phase 1**: backend trace_routes.py (no frontend changes)
1. Create `trace_routes.py` + the three route handlers
2. Register the `trace_bp` Blueprint in `app.py`
3. Profile dispatch functions (calling existing service logic)
4. Rate limit + cache configuration
5. Align error codes + HTTP statuses with the spec
6. API contract tests (request/response schema validation)
**Phase 2**: shared frontend pieces
1. Create the `useTraceProgress.js` composable
2. Create the `TraceProgressBar.vue` progress indicator
3. Unit tests (mocked API calls, verifying stage-state transitions)
**Phase 3**: mid-section-defect progressive rendering
1. Switch the `App.vue` query flow to `useTraceProgress`
2. Add skeleton placeholders + fade-in transitions
3. Replace the loading spinner with `TraceProgressBar`
4. Verify auto-refresh integration
5. Golden test: `/analysis` results unchanged
**Phase 4**: query-tool on-demand lineage
1. Add `lineageCache` + `loadLotLineage()` to `useQueryToolData.js`
2. Add lineage expansion UI to the lot list
3. Verify the existing resolve → history → association flow is unaffected
**Phase 5**: legacy cleanup
1. Delete `src/mes_dashboard/static/js/query-tool.js`
2. grep to confirm zero references
3. Confirm `frontend_asset()` resolution still works
## Risks / Trade-offs
| Risk | Mitigation |
|------|-----------|
| Staged API adds frontend complexity (3 fetches + state management) | Encapsulated in the `useTraceProgress` composable; pages only call `execute(params)` and watch `stage_results` |
| `/analysis` golden test fails on floating-point precision | Allow ±0.01% tolerance on percentage fields such as defect_rate; strict comparison for integer fields |
| mid-section-defect skeleton → chart render flicker | Fixed-height placeholders + 300ms fade-in transition; chart containers never use height auto |
| `useTraceProgress` abort conflicts with `useAutoRefresh` | Auto-refresh calls `trace.abort()` before triggering, ensuring the previous round is fully cancelled |
| Frequent per-lot lineage expansion in query-tool pressures the DB | `lineageCache` prevents duplicate fetches + the trace-lineage rate limit (10/60s) protects the backend |
| Deleting `static/js/query-tool.js` breaks an unknown path | Global grep confirms 0 references + `frontend_asset()` confirmed to resolve correctly via the Vite manifest |
| cache_key chain breaks (frontend forgets to pass cache_key) | cache_key is optional and used only for logging correlation; omitting it does not affect functionality |
## Open Questions
- Should `useTraceProgress` support retry (re-running a failed stage instead of the whole sequence)? Not for now: on failure, the user simply clicks the query button again.
- Does the mid-section-defect aggregation logic (KPI, charts, trend computation) live in the `mid_section_defect` profile branch of `/api/trace/events`, or does the frontend compute it from raw events? **Decision: backend, in the `aggregation` field of `/api/trace/events`**: the frontend should not own defect attribution computation, and the logic is already mature in the service layer.
- Does `TraceProgressBar.vue` live in `shared-composables/` or a separate `shared-components/` directory? For now, `shared-composables/` (it is used in tandem with the composable); revisit the split if shared components multiply.


@@ -0,0 +1,148 @@
## Why
With `unified-lineage-engine` complete, the backend trace pipeline dropped from 30-120 s to 3-8 s, but large queries (long date spans, many LOTs) can still take 5-15 s. The current UX pattern is "user clicks query → black-box wait → all results appear at once": even with a faster backend, users perceive no progress, only a loading spinner.
The two pages load differently today:
- **mid-section-defect**: one API call (`/analysis`) returns everything (KPI + charts + detail); the backend completes all 4 stages before responding.
- **query-tool**: the Vue 3 version (`frontend/src/query-tool/`) already has a manual sequence (resolve → history → association), but parts of the flow can still be improved toward progressive loading.
We need a unified "staged loading + visible progress" UX across both pages so users see incremental trace results instead of waiting on a black box.
**Scope boundary**: this change owns the staged API endpoints (`/api/trace/*`) and the progressive-loading frontend UX. The trace core (`LineageEngine`, `EventFetcher`) is provided by the prerequisite `unified-lineage-engine` change; this change is only an API routing layer over those modules.
## What Changes
### Backend: new staged API endpoints
Add a `trace_routes.py` Blueprint (`/api/trace/`) that exposes each stage of the trace pipeline as its own endpoint. Per-page behavior is selected via the `profile` parameter:
**POST `/api/trace/seed-resolve`**
- Request: `{ "profile": "query_tool" | "mid_section_defect", "params": { ... } }`
- `query_tool` params: `{ "resolve_type": "lot_id" | "serial_number" | "work_order", "values": [...] }`
- `mid_section_defect` params: `{ "date_range": [...], "workcenter": "...", ... }` (TMTT detection parameters)
- Response: `{ "stage": "seed-resolve", "seeds": [{ "container_id": "...", "container_name": "...", "lot_id": "..." }], "seed_count": N, "cache_key": "trace:{hash}" }`
- Error: `{ "error": "...", "code": "SEED_RESOLVE_EMPTY" | "SEED_RESOLVE_TIMEOUT" | "INVALID_PROFILE" }`
- Rate limit: `configured_rate_limit(bucket="trace-seed", default_max_attempts=10, default_window_seconds=60)`
- Cache: L2 Redis, key = `trace:seed:{profile}:{params_hash}`, TTL = 300s
**POST `/api/trace/lineage`**
- Request: `{ "profile": "query_tool" | "mid_section_defect", "container_ids": [...], "cache_key": "trace:{hash}" }`
- Response: `{ "stage": "lineage", "ancestors": { "{cid}": ["{ancestor_cid}", ...] }, "merges": { "{cid}": ["{merge_source_cid}", ...] }, "total_nodes": N }`
- Error: `{ "error": "...", "code": "LINEAGE_TIMEOUT" | "LINEAGE_TOO_LARGE" }`
- Rate limit: `configured_rate_limit(bucket="trace-lineage", default_max_attempts=10, default_window_seconds=60)`
- Cache: L2 Redis, key = `trace:lineage:{sorted_cids_hash}`, TTL = 300s
- Idempotency: the same `container_ids` set (hashed after sorting) returns the cached result
**POST `/api/trace/events`**
- Request: `{ "profile": "query_tool" | "mid_section_defect", "container_ids": [...], "domains": ["history", "materials", ...], "cache_key": "trace:{hash}" }`
- `mid_section_defect` additionally supports `"domains": ["upstream_history"]` and automatically chains aggregation
- Response: `{ "stage": "events", "results": { "{domain}": { "data": [...], "count": N } }, "aggregation": { ... } | null }`
- Error: `{ "error": "...", "code": "EVENTS_TIMEOUT" | "EVENTS_PARTIAL_FAILURE" }`
- `EVENTS_PARTIAL_FAILURE`: when some domain queries fail, results that did succeed are still returned, and `failed_domains` lists the failures
- Rate limit: `configured_rate_limit(bucket="trace-events", default_max_attempts=15, default_window_seconds=60)`
- Cache: L2 Redis, key = `trace:evt:{profile}:{domains_hash}:{sorted_cids_hash}`, TTL = 300s
**Rules common to all endpoints**:
- HTTP status: 200 (success), 400 (invalid params/profile), 429 (rate limited), 504 (stage timeout >10s)
- Rate limit headers: `Retry-After` (matching the existing `rate_limit.py` implementation; the response body carries a `retry_after_seconds` field)
- `cache_key` is optional; the frontend may pass the previous stage's cache_key as a trace-chain identifier (used for logging correlation, with no effect on cache hits)
- Each stage is independently callable: the frontend composes them as needed, with no strict ordering enforced (though lineage needs container_ids from the seed result, and events needs container_ids from the lineage result)
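The seed→lineage→events hand-off therefore reduces to a set union over the lineage response; a sketch against the response shapes defined above (`collect_event_cids` is an illustrative name, not part of the API):

```python
def collect_event_cids(seed_cids, lineage_response):
    """Union of seeds, their ancestors, and merge sources, as input to the events stage."""
    cids = set(seed_cids)
    for ancestor_list in lineage_response.get("ancestors", {}).values():
        cids.update(ancestor_list)
    for merge_list in lineage_response.get("merges", {}).values():
        cids.update(merge_list)
    return sorted(cids)
```

Sorting the result keeps the downstream `sorted_cids_hash` cache key stable across calls.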
### Legacy endpoint compatibility
- `/api/mid-section-defect/analysis` is kept; internally it calls the staged pipeline (seed-resolve → lineage → events+aggregation) and merges the results. Behavior is equivalent and the API contract is unchanged.
- `/api/query-tool/*` endpoints are kept unchanged; the frontend can migrate to the new API incrementally.
### Frontend: progressive loading
- New `frontend/src/shared-composables/useTraceProgress.js` composable encapsulating:
- Three-stage sequential fetch (seed → lineage → events)
- Reactive state updated after each stage (`current_stage`, `completed_stages`, `stage_results`)
- Error handling: per-stage and independent; a failed stage does not hide already-completed results
- profile parameter injection
- **mid-section-defect** (`App.vue`): the query flow becomes staged fetch + progressive rendering:
- The seed lots list appears first after a query (skeleton UI → filled with seed results)
- The lineage tree structure expands incrementally
- KPI/charts fill in progressively with skeleton placeholders + fade-in animation, avoiding layout shift
- The detail table still uses the paginated detail endpoint
- **query-tool** (`useQueryToolData.js`): the lineage tab becomes on-demand expansion (lineage is queried only after the user clicks a lot), mainly improving the progressive-loading experience.
- Both pages gain a progress indicator component showing the running stage (seed → lineage → events → aggregation) and the completed stages.
### Legacy file handling
- **Deprecated**: `src/mes_dashboard/static/js/query-tool.js` (3056 lines, 126 KB): static JS from the pre-Vite era, no longer loaded by any template (`query_tool.html` loads the Vite build artifact via `frontend_asset('query-tool.js')`, not this static file). Dead code; safe to delete.
- **Kept**: `frontend/src/query-tool/main.js` (3139 lines): the Vue 3 Vite entry point; the Vite config confirms `'query-tool': resolve(__dirname, 'src/query-tool/main.js')`. Actively maintained.
- **Kept**: `src/mes_dashboard/templates/query_tool.html`: a Jinja2 template; line 1264 `{% set query_tool_js = frontend_asset('query-tool.js') %}` loads the Vite build artifact. The portal-shell route is already live (`/portal-shell/query-tool` uses Vue 3); this template remains the fallback for non-portal-shell routes, so it is not deleted yet.
## Capabilities
### New Capabilities
- `trace-staged-api`: a unified staged trace API layer (`/api/trace/seed-resolve`, `/api/trace/lineage`, `/api/trace/events`). Per-page behavior is configured via the `profile` parameter. Each stage is independently cacheable (L2 Redis) and rate-limited (`configured_rate_limit()`); the frontend composes stages as needed. The API contract is defined in the What Changes section of this proposal.
- `progressive-trace-ux`: progressive loading UX for both pages. The `useTraceProgress` composable encapsulates the three-stage sequential fetch + reactive state. Includes:
- A progress indicator component (showing the state of each stage: seed → lineage → events → aggregation)
- mid-section-defect: seed lots first → lineage structure → KPI/charts fill in progressively (skeleton + fade-in)
- query-tool: the lineage tab becomes on-demand expansion (lineage queried only after the user clicks a lot)
### Modified Capabilities
- `trace-staged-api` supersedes mid-section-defect's single `/analysis` endpoint logic (the old endpoint is kept for compatibility; internally it calls the staged pipeline and merges results, with equivalent behavior).
- query-tool's existing `useQueryToolData.js` composable switches to the staged API.
## Impact
- **Affected code**:
- New: `src/mes_dashboard/routes/trace_routes.py`, `frontend/src/shared-composables/useTraceProgress.js`
- Refactored: `frontend/src/mid-section-defect/App.vue` (query flow becomes staged fetch)
- Refactored: `frontend/src/query-tool/composables/useQueryToolData.js` (lineage becomes staged)
- Modified: `src/mes_dashboard/routes/mid_section_defect_routes.py` (`/analysis` internally uses the staged pipeline)
- Deleted: `src/mes_dashboard/static/js/query-tool.js` (pre-Vite dead code, 3056 lines, 126 KB, zero references)
- **Runtime/deploy**: no new dependencies. Three new API endpoints (`/api/trace/*`); existing endpoints stay compatible.
- **APIs/pages**: adds the `/api/trace/seed-resolve`, `/api/trace/lineage`, and `/api/trace/events` endpoints (contract defined in the What Changes section). The existing `/api/mid-section-defect/analysis` and `/api/query-tool/*` stay compatible, but `/analysis` internally calls the staged pipeline.
- **UX**: the query experience moves from "black-box wait" to "progressively visible". mid-section-defect users see seed lots and initial data while lineage resolution is still running.
## Verification
**Progressive-loading acceptance (frontend)**:
| Metric | Current | Target | Condition |
|------|------|------|------|
| mid-section-defect first visible content (seed lots) | shown all at once after completion (30-120 s; 3-8 s after unified-lineage-engine) | shown when the seed stage completes (≤3 s) | queries with ≥10 seed lots |
| mid-section-defect full KPI/chart display | same as above | shown after lineage + events complete (≤8 s) | skeleton → fade-in, no layout shift |
| query-tool lineage tab | loads lineage for all lots at once | loads a single lot's lineage after a click (≤3 s) | on-demand, ≥20 lots resolved |
| progress indicator | none (loading spinner) | progress text updates on each stage transition | seed → lineage → events all visible |
**API contract acceptance**:
- Every `/api/trace/*` endpoint returns JSON matching the schema defined in the What Changes section
- 400 (invalid params) / 429 (rate limited) / 504 (timeout) status codes are returned correctly
- The `Retry-After` rate-limit header is present (matching the existing `rate_limit.py` implementation)
- `/api/mid-section-defect/analysis` compatibility: results identical to pre-refactor output (golden test comparison)
**Legacy cleanup acceptance**:
- `src/mes_dashboard/static/js/query-tool.js` is deleted
- grep confirms no code references `static/js/query-tool.js`
- `frontend_asset('query-tool.js')` in `query_tool.html` still resolves to the Vite build artifact
## Dependencies
- **Prerequisite**: the `unified-lineage-engine` change must land first. This change depends on `LineageEngine` and `EventFetcher` as the backend implementation behind the staged API.
## Non-Goals
- No SSE (Server-Sent Events) or WebSocket push: given the gunicorn sync worker constraint, use the staged API + frontend sequential fetch pattern instead.
- No changes to backend trace logic: the staged API merely exposes each `LineageEngine` / `EventFetcher` stage as an HTTP endpoint without changing the computation.
- No task queue (Celery/RQ): keep the synchronous request-response model, with each stage kept under a 10 s response time.
- No changes to the mid-section-defect defect attribution algorithm.
- No changes to the query-tool equipment-period query (already handled by `read_sql_df_slow` with a 120 s timeout).
- No DB schema changes and no materialized views: all optimization stays in the application layer.
## Risks
| Risk | Mitigation |
|------|------|
| Staged API adds frontend complexity (multiple fetches + state management) | Encapsulated in the `useTraceProgress` composable; pages provide only profile + params, and the composable handles sequential fetch + errors + state |
| Frontend/backend staged contract mismatch | API contract fully defined in the What Changes section (request/response schemas, error codes, cache key formats); CI contract tests verify it |
| `/analysis` endpoint must stay compatible | Old endpoint kept; internally it calls the staged pipeline and merges results; golden tests compare pre/post-refactor output |
| Deleting `static/js/query-tool.js` breaks functionality | Confirmed pre-Vite dead code (`query_tool.html` loads the Vite build artifact via `frontend_asset('query-tool.js')`, not this static file); grep confirms no other references |
| Staged rendering causes chart flicker on mid-section-defect | Skeleton placeholders + fade-in animation avoid layout shift; chart containers reserve fixed heights |
| `cache_key` abused across stages to bypass rate limits | cache_key is used only for logging correlation and never affects cache hits or rate limiting; each stage computes its own cache key independently |


@@ -0,0 +1,25 @@
## MODIFIED Requirements
### Requirement: Staged trace API endpoints SHALL apply rate limiting
The `/api/trace/seed-resolve`, `/api/trace/lineage`, and `/api/trace/events` endpoints SHALL apply per-client rate limiting using the existing `configured_rate_limit` mechanism.
#### Scenario: Seed-resolve rate limit exceeded
- **WHEN** a client sends more than 10 requests to `/api/trace/seed-resolve` within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
#### Scenario: Lineage rate limit exceeded
- **WHEN** a client sends more than 10 requests to `/api/trace/lineage` within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
#### Scenario: Events rate limit exceeded
- **WHEN** a client sends more than 15 requests to `/api/trace/events` within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
### Requirement: Mid-section defect analysis endpoint SHALL internally use staged pipeline
The existing `/api/mid-section-defect/analysis` endpoint SHALL internally delegate to the staged trace pipeline while maintaining full backward compatibility.
#### Scenario: Analysis endpoint backward compatibility
- **WHEN** a client calls `GET /api/mid-section-defect/analysis` with existing query parameters
- **THEN** the response JSON structure SHALL be identical to pre-refactoring output
- **THEN** existing rate limiting (6/min analysis, 15/min detail, 3/min export) SHALL remain unchanged
- **THEN** existing distributed lock behavior SHALL remain unchanged


@@ -0,0 +1,64 @@
## ADDED Requirements
### Requirement: useTraceProgress composable SHALL orchestrate staged fetching with reactive state
`useTraceProgress` SHALL provide a shared composable for sequential stage fetching with per-stage reactive state updates.
#### Scenario: Normal three-stage fetch sequence
- **WHEN** `useTraceProgress` is invoked with profile and params
- **THEN** it SHALL execute seed-resolve → lineage → events sequentially
- **THEN** after each stage completes, `current_stage` and `completed_stages` reactive refs SHALL update immediately
- **THEN** `stage_results` SHALL accumulate results from completed stages
#### Scenario: Stage failure does not block completed results
- **WHEN** the lineage stage fails after seed-resolve has completed
- **THEN** seed-resolve results SHALL remain visible and accessible
- **THEN** the error SHALL be captured in stage-specific error state
- **THEN** subsequent stages (events) SHALL NOT execute
### Requirement: mid-section-defect SHALL render progressively as stages complete
The mid-section-defect page SHALL display partial results as each trace stage completes.
#### Scenario: Seed lots visible before lineage completes
- **WHEN** seed-resolve stage completes (≤3s for ≥10 seed lots)
- **THEN** the seed lots list SHALL be rendered immediately
- **THEN** lineage and events sections SHALL show skeleton placeholders
#### Scenario: KPI/charts visible after events complete
- **WHEN** lineage and events stages complete
- **THEN** KPI cards and charts SHALL render with fade-in animation
- **THEN** no layout shift SHALL occur (skeleton placeholders SHALL have matching dimensions)
#### Scenario: Detail table pagination unchanged
- **WHEN** the user requests detail data
- **THEN** the existing detail endpoint with pagination SHALL be used (not the staged API)
### Requirement: query-tool lineage tab SHALL load on-demand
The query-tool lineage tab SHALL load lineage data for individual lots on user interaction, not batch-load all lots.
#### Scenario: User clicks a lot to view lineage
- **WHEN** the user clicks a lot card to expand lineage information
- **THEN** lineage SHALL be fetched via `/api/trace/lineage` for that single lot's container IDs
- **THEN** response time SHALL be ≤3s for the individual lot
#### Scenario: Multiple lots expanded
- **WHEN** the user expands lineage for multiple lots
- **THEN** each lot's lineage SHALL be fetched independently (not batch)
- **THEN** already-fetched lineage data SHALL be preserved (not re-fetched)
### Requirement: Both pages SHALL display a stage progress indicator
Both mid-section-defect and query-tool SHALL display a progress indicator showing the current trace stage.
#### Scenario: Progress indicator during staged fetch
- **WHEN** a trace query is in progress
- **THEN** a progress indicator SHALL display the current stage (seed → lineage → events)
- **THEN** completed stages SHALL be visually distinct from pending/active stages
- **THEN** the indicator SHALL replace the existing single loading spinner
### Requirement: Legacy static query-tool.js SHALL be removed
The pre-Vite static file `src/mes_dashboard/static/js/query-tool.js` (3056L, 126KB) SHALL be deleted as dead code.
#### Scenario: Dead code removal verification
- **WHEN** `static/js/query-tool.js` is deleted
- **THEN** grep for `static/js/query-tool.js` SHALL return zero results across the codebase
- **THEN** `query_tool.html` template SHALL continue to function via `frontend_asset('query-tool.js')` which resolves to the Vite-built bundle
- **THEN** `frontend/src/query-tool/main.js` (Vue 3 Vite entry) SHALL remain unaffected


@@ -0,0 +1,89 @@
## ADDED Requirements
### Requirement: Staged trace API SHALL expose seed-resolve endpoint
`POST /api/trace/seed-resolve` SHALL resolve seed lots based on the provided profile and parameters.
#### Scenario: query_tool profile seed resolve
- **WHEN** request body contains `{ "profile": "query_tool", "params": { "resolve_type": "lot_id", "values": [...] } }`
- **THEN** the endpoint SHALL call existing lot resolve logic and return `{ "stage": "seed-resolve", "seeds": [...], "seed_count": N, "cache_key": "trace:{hash}" }`
- **THEN** each seed object SHALL contain `container_id`, `container_name`, and `lot_id`
#### Scenario: mid_section_defect profile seed resolve
- **WHEN** request body contains `{ "profile": "mid_section_defect", "params": { "date_range": [...], "workcenter": "..." } }`
- **THEN** the endpoint SHALL call TMTT detection logic and return seed lots in the same response format
#### Scenario: Empty seed result
- **WHEN** seed resolution finds no matching lots
- **THEN** the endpoint SHALL return HTTP 200 with `{ "stage": "seed-resolve", "seeds": [], "seed_count": 0, "cache_key": "trace:{hash}" }`
- **THEN** the error code `SEED_RESOLVE_EMPTY` SHALL NOT be used for empty results (reserved for resolution failures)
#### Scenario: Invalid profile
- **WHEN** request body contains an unrecognized `profile` value
- **THEN** the endpoint SHALL return HTTP 400 with `{ "error": "...", "code": "INVALID_PROFILE" }`
### Requirement: Staged trace API SHALL expose lineage endpoint
`POST /api/trace/lineage` SHALL resolve lineage graph for provided container IDs using `LineageEngine`.
#### Scenario: Normal lineage resolution
- **WHEN** request body contains `{ "profile": "query_tool", "container_ids": [...] }`
- **THEN** the endpoint SHALL call `LineageEngine.resolve_full_genealogy()` and return `{ "stage": "lineage", "ancestors": {...}, "merges": {...}, "total_nodes": N }`
#### Scenario: Lineage result caching with idempotency
- **WHEN** two requests with the same `container_ids` set (regardless of order) arrive
- **THEN** the cache key SHALL be computed as `trace:lineage:{sorted_cids_hash}`
- **THEN** the second request SHALL return cached result from L2 Redis (TTL = 300s)
#### Scenario: Lineage timeout
- **WHEN** lineage resolution exceeds 10 seconds
- **THEN** the endpoint SHALL return HTTP 504 with `{ "error": "...", "code": "LINEAGE_TIMEOUT" }`
### Requirement: Staged trace API SHALL expose events endpoint
`POST /api/trace/events` SHALL query events for specified domains using `EventFetcher`.
#### Scenario: Normal events query
- **WHEN** request body contains `{ "profile": "query_tool", "container_ids": [...], "domains": ["history", "materials"] }`
- **THEN** the endpoint SHALL return `{ "stage": "events", "results": { "history": { "data": [...], "count": N }, "materials": { "data": [...], "count": N } }, "aggregation": null }`
#### Scenario: mid_section_defect profile with aggregation
- **WHEN** request body contains `{ "profile": "mid_section_defect", "container_ids": [...], "domains": ["upstream_history"] }`
- **THEN** the endpoint SHALL automatically run aggregation logic after event fetching
- **THEN** the response `aggregation` field SHALL contain the aggregated results (not null)
#### Scenario: Partial domain failure
- **WHEN** one domain query fails while others succeed
- **THEN** the endpoint SHALL return HTTP 200 with `{ "error": "...", "code": "EVENTS_PARTIAL_FAILURE" }`
- **THEN** the response SHALL include successfully fetched domains in `results` and list failed domains in `failed_domains`
### Requirement: All staged trace endpoints SHALL apply rate limiting and caching
Every `/api/trace/*` endpoint SHALL use `configured_rate_limit()` and L2 Redis caching.
#### Scenario: Rate limit exceeded on any trace endpoint
- **WHEN** a client exceeds the configured request budget for a trace endpoint
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
- **THEN** the body SHALL contain `{ "error": "...", "meta": { "retry_after_seconds": N } }`
#### Scenario: Cache hit on trace endpoint
- **WHEN** a request matches a cached result in L2 Redis (TTL = 300s)
- **THEN** the cached result SHALL be returned without executing backend logic
- **THEN** Oracle DB connection pool SHALL NOT be consumed
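The 429 / `Retry-After` contract above can be illustrated with a minimal fixed-window counter. This is a sketch only: the project uses its existing `configured_rate_limit()`, whose algorithm may differ.

```python
import math
import time

class FixedWindowLimiter:
    """Per-client fixed-window request counter (illustrative only)."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.windows = {}  # client_id -> (window_start, request_count)

    def check(self, client_id):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        start, count = self.windows.get(client_id, (now, 0))
        if now - start >= self.window_s:   # window expired: reset
            start, count = now, 0
        if count >= self.limit:            # budget exhausted -> HTTP 429
            return False, math.ceil(self.window_s - (now - start))
        self.windows[client_id] = (start, count + 1)
        return True, 0
```

The second element of the tuple maps to `meta.retry_after_seconds` and the `Retry-After` header in the scenario above.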
### Requirement: cache_key parameter SHALL be used for logging correlation only
The optional `cache_key` field in request bodies SHALL be used solely for logging and tracing correlation.
#### Scenario: cache_key provided in request
- **WHEN** a request includes `cache_key` from a previous stage response
- **THEN** the value SHALL be logged for correlation purposes
- **THEN** the value SHALL NOT influence cache lookup or rate limiting logic
#### Scenario: cache_key omitted in request
- **WHEN** a request omits the `cache_key` field
- **THEN** the endpoint SHALL function normally without any degradation
### Requirement: Existing `GET /api/mid-section-defect/analysis` SHALL remain compatible
The existing analysis endpoint (GET method) SHALL internally delegate to the staged pipeline while maintaining identical external behavior.
#### Scenario: Legacy analysis endpoint invocation
- **WHEN** a client calls `GET /api/mid-section-defect/analysis` with existing query parameters
- **THEN** the endpoint SHALL internally execute seed-resolve → lineage → events + aggregation
- **THEN** the response format SHALL be identical to the pre-refactoring output
- **THEN** a golden test SHALL verify output equivalence


@@ -0,0 +1,41 @@
## Phase 1: Backend trace_routes.py Blueprint
- [x] 1.1 Create `src/mes_dashboard/routes/trace_routes.py` with a `trace_bp` Blueprint (`url_prefix='/api/trace'`)
- [x] 1.2 Implement the `POST /api/trace/seed-resolve` handler: request-body validation, profile dispatch (`_seed_resolve_query_tool` / `_seed_resolve_msd`), response format
- [x] 1.3 Implement the `POST /api/trace/lineage` handler: call `LineageEngine.resolve_full_genealogy()`, response format, 504 timeout handling
- [x] 1.4 Implement the `POST /api/trace/events` handler: call `EventFetcher.fetch_events()`, run aggregation automatically for the mid_section_defect profile, handle `EVENTS_PARTIAL_FAILURE`
- [x] 1.5 Add `configured_rate_limit()` to all three endpoints (seed: 10/60s, lineage: 10/60s, events: 15/60s)
- [x] 1.6 Add L2 Redis caching to all three endpoints (seed: `trace:seed:{profile}:{params_hash}`, lineage: `trace:lineage:{sorted_cids_hash}`, events: `trace:evt:{profile}:{domains_hash}:{sorted_cids_hash}`), TTL=300s
- [x] 1.7 Import and register `trace_bp` in `src/mes_dashboard/routes/__init__.py` (preserving the project's single route-registration entry point)
- [x] 1.8 API contract tests: verify 200/400/429/504 status codes, the `Retry-After` header, error-code format, and snake_case field names
## Phase 2: Shared frontend components
- [x] 2.1 Create `frontend/src/shared-composables/useTraceProgress.js`: reactive state (`current_stage`, `completed_stages`, `stage_results`, `stage_errors`, `is_running`) plus `execute()` / `reset()` / `abort()` methods
- [x] 2.2 Implement sequential fetch logic (seed-resolve → lineage → events); update reactive state immediately after each stage completes; record errors in `stage_errors` instead of throwing
- [x] 2.3 Create `frontend/src/shared-composables/TraceProgressBar.vue`: three-stage progress indicator (props: `current_stage`, `completed_stages`, `stage_errors`); completed = green, in progress = blue pulse, pending = gray, error = red
## Phase 3: mid-section-defect progressive rendering
- [x] 3.1 Import `useTraceProgress({ profile: 'mid_section_defect' })` in `frontend/src/mid-section-defect/App.vue`
- [x] 3.2 Rework the `loadAnalysis()` flow: replace the single `apiGet('/analysis')` call with staged fetches via `trace.execute(params)`
- [x] 3.3 Add skeleton placeholders: KpiCards (6 cards, min-height 100px), ParetoChart (6 charts, min-height 300px), TrendChart (min-height 300px), gray pulsing animation
- [x] 3.4 Add fade-in transitions: once `stage_results.events` completes, fill in KPI/charts with `opacity 0→1, 300ms ease-in`
- [x] 3.5 Replace the loading spinner below the filter bar with `TraceProgressBar`
- [x] 3.6 Integrate `useAutoRefresh`: `onRefresh` calls `trace.abort()` + `trace.execute(committedFilters)`
- [x] 3.7 Verify the detail tab is unaffected (it still uses the `/api/mid-section-defect/analysis/detail` GET endpoint)
- [x] 3.8 Golden test: the `/api/mid-section-defect/analysis` GET endpoint returns results identical to the pre-refactoring output (floating-point tolerance ±0.01%)
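The golden-test comparison in task 3.8 can be sketched as a recursive payload diff with a relative float tolerance of ±0.01% (`rel_tol=1e-4`). This is an assumed helper shape; the real test may walk the JSON response differently:

```python
import math

def golden_equal(old, new, rel_tol=1e-4):
    """Recursively compare pre- and post-refactoring payloads."""
    if isinstance(old, dict) and isinstance(new, dict):
        return old.keys() == new.keys() and all(
            golden_equal(old[k], new[k], rel_tol) for k in old
        )
    if isinstance(old, list) and isinstance(new, list):
        return len(old) == len(new) and all(
            golden_equal(a, b, rel_tol) for a, b in zip(old, new)
        )
    if isinstance(old, float) or isinstance(new, float):
        # ±0.01% relative tolerance per task 3.8
        return math.isclose(old, new, rel_tol=rel_tol, abs_tol=1e-12)
    return old == new
```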
## Phase 4: query-tool on-demand lineage
- [x] 4.1 Add a `lineageCache` reactive object and a `loadLotLineage(containerId)` function to `useQueryToolData.js`
- [x] 4.2 `loadLotLineage` calls `POST /api/trace/lineage` (`profile: 'query_tool'`, `container_ids: [containerId]`) and stores the result in `lineageCache`
- [x] 4.3 Add a lineage expand button to the lot list UI (accordion pattern); clicking triggers `loadLotLineage`; cached entries are not re-fetched
- [x] 4.4 Clear `lineageCache` on `resolveLots()` (start of a new query round)
- [x] 4.5 Verify the existing resolve → lot-history → lot-associations flow is unaffected
## Phase 5: Legacy cleanup
- [x] 5.1 Delete `src/mes_dashboard/static/js/query-tool.js` (3056 lines, 126KB of pre-Vite dead code)
- [x] 5.2 Confirm `grep -r "static/js/query-tool.js" src/ frontend/ templates/` returns zero results
- [x] 5.3 Confirm `frontend_asset('query-tool.js')` resolves to the hashed filename in the Vite manifest


@@ -24,3 +24,20 @@ The system SHALL continue to maintain full-table cache behavior for `resource` a
- **WHEN** cache update runs for `resource` or `wip`
- **THEN** the updater MUST retain full-table snapshot semantics and MUST NOT switch these domains to partial-only cache mode
### Requirement: Mid-section defect genealogy SHALL use CONNECT BY instead of Python BFS
The mid-section-defect genealogy resolution SHALL use `LineageEngine.resolve_full_genealogy()` (CONNECT BY NOCYCLE) instead of the existing `_bfs_split_chain()` Python BFS implementation.
#### Scenario: Genealogy cold query performance
- **WHEN** mid-section-defect analysis executes genealogy resolution with cache miss
- **THEN** `LineageEngine.resolve_split_ancestors()` SHALL be called (single CONNECT BY query)
- **THEN** response time SHALL be ≤8s (P95) for ≥50 ancestor nodes
- **THEN** Python BFS `_bfs_split_chain()` SHALL NOT be called
#### Scenario: Genealogy hot query performance
- **WHEN** mid-section-defect analysis executes genealogy resolution with L2 Redis cache hit
- **THEN** response time SHALL be ≤1s (P95)
#### Scenario: Golden test result equivalence
- **WHEN** golden test runs with ≥5 known LOTs
- **THEN** CONNECT BY output (`child_to_parent`, `cid_to_name`) SHALL be identical to BFS output for the same inputs
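For the equivalence scenario, the CONNECT BY rows must be folded into the same `child_to_parent` / `cid_to_name` shape the BFS produced. A sketch, assuming `(containerid, splitfromid, containername)` row tuples (the real column order comes from `split_ancestors.sql`):

```python
def rows_to_genealogy(rows):
    """Fold hierarchical-query rows into the BFS output shape."""
    child_to_parent = {}
    cid_to_name = {}
    for cid, parent_cid, name in rows:
        cid_to_name[cid] = name
        if parent_cid is not None:  # root nodes have no SPLITFROMID
            child_to_parent[cid] = parent_cid
    return {"child_to_parent": child_to_parent, "cid_to_name": cid_to_name}
```

Since both implementations return plain dicts, the golden test can compare them with ordinary equality, which is key-order insensitive.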


@@ -0,0 +1,24 @@
# event-fetcher-unified Specification
## Purpose
TBD - created by archiving change unified-lineage-engine. Update Purpose after archive.
## Requirements
### Requirement: EventFetcher SHALL provide unified cached event querying across domains
`EventFetcher` SHALL encapsulate batch event queries with L1/L2 layered cache and rate limit bucket configuration, supporting domains: `history`, `materials`, `rejects`, `holds`, `jobs`, `upstream_history`.
#### Scenario: Cache miss for event domain query
- **WHEN** `EventFetcher` is called for a domain with container IDs and no cache exists
- **THEN** the domain query SHALL execute against Oracle via `read_sql_df()`
- **THEN** the result SHALL be stored in L2 Redis cache with key format `evt:{domain}:{sorted_cids_hash}`
- **THEN** L1 memory cache SHALL also be populated (aligned with `core/cache.py` LayeredCache pattern)
#### Scenario: Cache hit for event domain query
- **WHEN** `EventFetcher` is called for a domain and L2 Redis cache contains a valid entry
- **THEN** the cached result SHALL be returned without executing Oracle query
- **THEN** DB connection pool SHALL NOT be consumed
#### Scenario: Rate limit bucket per domain
- **WHEN** `EventFetcher` is used from a route handler
- **THEN** each domain SHALL have a configurable rate limit bucket aligned with `configured_rate_limit()` pattern
- **THEN** rate limit configuration SHALL be overridable via environment variables
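The L1/L2 lookup order above can be sketched as follows. This is illustrative only: the real `EventFetcher` wraps Redis and `read_sql_df()`; a plain dict stands in for the L2 store here so the sketch is self-contained.

```python
import hashlib

class LayeredEventCache:
    """L1 in-process dict over an injected L2 store (Redis in production)."""

    def __init__(self, l2_store):
        self.l1 = {}
        self.l2 = l2_store

    @staticmethod
    def key(domain, container_ids):
        # Key format per the spec: evt:{domain}:{sorted_cids_hash}
        digest = hashlib.sha256(
            ",".join(sorted(container_ids)).encode("utf-8")
        ).hexdigest()[:16]
        return f"evt:{domain}:{digest}"

    def get_or_fetch(self, domain, container_ids, fetch):
        k = self.key(domain, container_ids)
        if k in self.l1:          # L1 hit: no Redis, no Oracle
            return self.l1[k]
        if k in self.l2:          # L2 hit: populate L1, no Oracle
            self.l1[k] = self.l2[k]
            return self.l1[k]
        result = fetch()          # miss: query Oracle (read_sql_df in the real code)
        self.l2[k] = result
        self.l1[k] = result
        return result
```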


@@ -0,0 +1,61 @@
# lineage-engine-core Specification
## Purpose
TBD - created by archiving change unified-lineage-engine. Update Purpose after archive.
## Requirements
### Requirement: LineageEngine SHALL provide unified split ancestor resolution via CONNECT BY NOCYCLE
`LineageEngine.resolve_split_ancestors()` SHALL accept a list of container IDs and return the complete split ancestry graph using a single Oracle `CONNECT BY NOCYCLE` query on `DW_MES_CONTAINER.SPLITFROMID`.
#### Scenario: Normal split chain resolution
- **WHEN** `resolve_split_ancestors()` is called with a list of container IDs
- **THEN** a single SQL query using `CONNECT BY NOCYCLE` SHALL be executed against `DW_MES_CONTAINER`
- **THEN** the result SHALL include a `child_to_parent` mapping and a `cid_to_name` mapping for all discovered ancestor nodes
- **THEN** the traversal depth SHALL be limited to `LEVEL <= 20` (equivalent to existing BFS `bfs_round > 20` guard)
#### Scenario: Large input batch exceeding Oracle IN clause limit
- **WHEN** the input `container_ids` list exceeds `ORACLE_IN_BATCH_SIZE` (1000)
- **THEN** `QueryBuilder.add_in_condition()` SHALL batch the IDs and combine results
- **THEN** all bind parameters SHALL use `QueryBuilder.params` (no string concatenation)
#### Scenario: Cyclic split references in data
- **WHEN** `DW_MES_CONTAINER.SPLITFROMID` contains cyclic references
- **THEN** `NOCYCLE` SHALL prevent infinite traversal
- **THEN** the query SHALL return all non-cyclic ancestors up to `LEVEL <= 20`
#### Scenario: CONNECT BY performance regression
- **WHEN** Oracle 19c execution plan for `CONNECT BY NOCYCLE` performs worse than expected
- **THEN** the SQL file SHALL contain a commented-out recursive `WITH` (recursive subquery factoring) alternative that can be swapped in without code changes
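The general shape of the hierarchical query can be sketched as below. The canonical text lives in `sql/lineage/split_ancestors.sql`; the column list and the direction of the `PRIOR` condition here are assumptions (walking child → parent via `SPLITFROMID`):

```python
def split_ancestors_sql(bind_names):
    """Assemble the CONNECT BY NOCYCLE query with :pN bind placeholders."""
    placeholders = ", ".join(f":{n}" for n in bind_names)
    return "\n".join([
        "SELECT CONTAINERID, SPLITFROMID, CONTAINERNAME, LEVEL AS DEPTH",
        "FROM DWH.DW_MES_CONTAINER",
        f"START WITH CONTAINERID IN ({placeholders})",
        "CONNECT BY NOCYCLE PRIOR SPLITFROMID = CONTAINERID",
        "       AND LEVEL <= 20",  # depth guard, per the BFS-equivalent limit
    ])
```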
### Requirement: LineageEngine SHALL provide unified merge source resolution
`LineageEngine.resolve_merge_sources()` SHALL accept a list of container IDs and return merge source mappings from `DW_MES_PJ_COMBINEDASSYLOTS`.
#### Scenario: Merge source lookup
- **WHEN** `resolve_merge_sources()` is called with container IDs
- **THEN** the result SHALL include `{cid: [merge_source_cid, ...]}` for all containers that have merge sources
- **THEN** all queries SHALL use `QueryBuilder` bind params
### Requirement: LineageEngine SHALL provide combined genealogy resolution
`LineageEngine.resolve_full_genealogy()` SHALL combine split ancestors and merge sources into a complete genealogy graph.
#### Scenario: Full genealogy for a set of seed lots
- **WHEN** `resolve_full_genealogy()` is called with seed container IDs
- **THEN** split ancestors SHALL be resolved first via `resolve_split_ancestors()`
- **THEN** merge sources SHALL be resolved for all discovered ancestor nodes
- **THEN** the combined result SHALL be equivalent to the existing `_resolve_full_genealogy()` output in `mid_section_defect_service.py`
### Requirement: LineageEngine functions SHALL be profile-agnostic
All `LineageEngine` public functions SHALL accept `container_ids: List[str]` and return dictionary structures without binding to any specific page logic.
#### Scenario: Reuse from different pages
- **WHEN** a new page (e.g., wip-detail) needs lineage resolution
- **THEN** it SHALL be able to call `LineageEngine` functions directly without modification
- **THEN** no page-specific logic (profile, TMTT detection, etc.) SHALL exist in `LineageEngine`
### Requirement: LineageEngine SQL files SHALL reside in `sql/lineage/` directory
New SQL files SHALL follow the existing `SQLLoader` convention under `src/mes_dashboard/sql/lineage/`.
#### Scenario: SQL file organization
- **WHEN** `LineageEngine` executes queries
- **THEN** `split_ancestors.sql` and `merge_sources.sql` SHALL be loaded via `SQLLoader.load_with_params("lineage/split_ancestors", ...)`
- **THEN** the SQL files SHALL NOT reference `HM_LOTMOVEOUT` (48M row table no longer needed for genealogy)


@@ -17,3 +17,25 @@ Services consuming shared Oracle query fragments SHALL preserve existing selecte
- **WHEN** cache services execute queries via shared fragments
- **THEN** resulting payload structure MUST remain compatible with existing aggregation and API contracts
### Requirement: Lineage SQL fragments SHALL be centralized in `sql/lineage/` directory
Split ancestor and merge source SQL queries SHALL be defined in `sql/lineage/` and shared across services via `SQLLoader`.
#### Scenario: Mid-section-defect lineage query
- **WHEN** `mid_section_defect_service.py` needs split ancestry or merge source data
- **THEN** it SHALL call `LineageEngine` which loads SQL from `sql/lineage/split_ancestors.sql` and `sql/lineage/merge_sources.sql`
- **THEN** it SHALL NOT use `sql/mid_section_defect/split_chain.sql` or `sql/mid_section_defect/genealogy_records.sql`
#### Scenario: Deprecated SQL file handling
- **WHEN** `sql/mid_section_defect/genealogy_records.sql` and `sql/mid_section_defect/split_chain.sql` are deprecated
- **THEN** the files SHALL be marked with a deprecated comment at the top
- **THEN** grep SHALL confirm zero `SQLLoader.load` references to these files
- **THEN** the files SHALL be retained for one version before deletion
### Requirement: All user-input SQL queries SHALL use QueryBuilder bind params
`_build_in_filter()` and `_build_in_clause()` in `query_tool_service.py` SHALL be fully replaced by `QueryBuilder.add_in_condition()`.
#### Scenario: Complete migration to QueryBuilder
- **WHEN** the refactoring is complete
- **THEN** grep for `_build_in_filter` and `_build_in_clause` SHALL return zero results
- **THEN** all queries involving user-supplied values SHALL use `QueryBuilder.params`
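A minimal sketch of the batched, bind-parameter IN clause that replaces the string concatenation (a hypothetical stand-in for `QueryBuilder.add_in_condition()`; the real method's signature may differ):

```python
def add_in_condition(column, values, params, batch_size=1000):
    """Emit OR-joined batched IN clauses with :pN binds, mutating params."""
    clauses = []
    for start in range(0, len(values), batch_size):
        names = []
        for value in values[start:start + batch_size]:
            name = f"p{len(params)}"   # :p0, :p1, ... bind convention
            params[name] = value
            names.append(f":{name}")
        clauses.append(f"{column} IN ({', '.join(names)})")
    return "(" + " OR ".join(clauses) + ")"
```

User-supplied values never enter the SQL text, only the `params` dict, which closes the injection path that `_build_in_filter()` left open.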


@@ -0,0 +1,61 @@
# query-tool-safety-hardening Specification
## Purpose
TBD - created by archiving change unified-lineage-engine. Update Purpose after archive.
## Requirements
### Requirement: query-tool resolve functions SHALL use QueryBuilder bind params for all user input
All `resolve_lots()` family functions (`_resolve_by_lot_id`, `_resolve_by_serial_number`, `_resolve_by_work_order`) SHALL use `QueryBuilder.add_in_condition()` with bind parameters instead of `_build_in_filter()` string concatenation.
#### Scenario: Lot resolve with user-supplied values
- **WHEN** a resolve function receives user-supplied lot IDs, serial numbers, or work order names
- **THEN** the SQL query SHALL use `:p0, :p1, ...` bind parameters via `QueryBuilder`
- **THEN** `read_sql_df()` SHALL receive `builder.params` (never an empty `{}` dict for queries with user input)
- **THEN** `_build_in_filter()` and `_build_in_clause()` SHALL NOT be called
#### Scenario: Pure static SQL without user input
- **WHEN** a query contains no user-supplied values (e.g., static lookups)
- **THEN** empty params `{}` is acceptable
- **THEN** no `_build_in_filter()` SHALL be used
#### Scenario: Zero residual references to deprecated functions
- **WHEN** the refactoring is complete
- **THEN** grep for `_build_in_filter` and `_build_in_clause` SHALL return zero results across the entire codebase
### Requirement: query-tool routes SHALL apply rate limiting
All query-tool API endpoints SHALL apply per-client rate limiting using the existing `configured_rate_limit` mechanism.
#### Scenario: Resolve endpoint rate limit exceeded
- **WHEN** a client sends more than 10 requests to query-tool resolve endpoints within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
- **THEN** the resolve service function SHALL NOT be called
#### Scenario: History endpoint rate limit exceeded
- **WHEN** a client sends more than 20 requests to query-tool history endpoints within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
#### Scenario: Association endpoint rate limit exceeded
- **WHEN** a client sends more than 20 requests to query-tool association endpoints within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
### Requirement: query-tool routes SHALL apply response caching
High-cost query-tool endpoints SHALL cache responses in L2 Redis.
#### Scenario: Resolve result caching
- **WHEN** a resolve request succeeds
- **THEN** the response SHALL be cached in L2 Redis with TTL = 60s
- **THEN** subsequent identical requests within TTL SHALL return cached result without Oracle query
### Requirement: lot_split_merge_history SHALL support fast and full query modes
The `lot_split_merge_history.sql` query SHALL support two modes to balance traceability completeness vs performance.
#### Scenario: Fast mode (default)
- **WHEN** `full_history` query parameter is absent or `false`
- **THEN** the SQL SHALL include `TXNDATE >= ADD_MONTHS(SYSDATE, -6)` time window and `FETCH FIRST 500 ROWS ONLY`
- **THEN** query response time SHALL be ≤5s (P95)
#### Scenario: Full mode
- **WHEN** `full_history=true` query parameter is provided
- **THEN** the SQL SHALL NOT include time window restriction
- **THEN** the query SHALL use `read_sql_df_slow` (120s timeout)
- **THEN** query response time SHALL be ≤60s (P95)
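The fast/full mode dispatch can be sketched as a small plan selector (names and structure are illustrative; the real toggle lives in the route handler and `lot_split_merge_history.sql`):

```python
def history_query_plan(full_history: bool):
    """Pick the SQL restrictions and executor for the requested mode."""
    if full_history:
        # Full mode: no time window, slow executor with 120s timeout
        return {"where": "", "fetch": "", "executor": "read_sql_df_slow"}
    # Fast mode (default): 6-month window and row cap
    return {
        "where": "AND TXNDATE >= ADD_MONTHS(SYSDATE, -6)",
        "fetch": "FETCH FIRST 500 ROWS ONLY",
        "executor": "read_sql_df",
    }
```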