feat(mid-section-defect): harden with distributed lock, rate limit, filter separation, abort, SQL classification and tests

Address 6 code review findings (P0-P3): add Redis distributed lock to prevent
duplicate Oracle pipeline on cold cache, apply rate limiting to 3 high-cost
routes, separate UI filter state from committed query state, add AbortController
for request cancellation, push workcenter group classification into Oracle SQL
CASE WHEN, and add 18 route+service tests. Also add workcenter group selection
to job-query equipment selector and rename button to "查詢".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: egg
Date: 2026-02-10 09:32:14 +08:00
parent 8b1b8da59b
commit af59031f95
16 changed files with 1461 additions and 601 deletions


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-10


@@ -0,0 +1,61 @@
## Context
The `/mid-section-defect` page is live and feature-complete but lacks production safeguards. Code review surfaced 6 issues, ranked P0-P3 by severity. The existing infrastructure (`try_acquire_lock`, `configured_rate_limit`, `createAbortSignal`) can be reused directly; no new framework is needed.
**Existing architecture**
- Backend: `query_analysis()` (lines 82-184) holds a 5 min Redis cache → `query_analysis_detail()` (lines 187-224) calls it for the cached result, then paginates
- Frontend: `Promise.all([summary, detail])` loads in parallel → `useAutoRefresh` refreshes every 5 min
- Upstream history: SQL fetches everything → Python `get_workcenter_group()` classifies row by row and filters to orders 4-11
## Goals / Non-Goals
**Goals:**
- Eliminate the doubled Oracle pipeline execution on a cold first query (P0)
- Protect the high-cost routes from burst traffic (P1a)
- Ensure UI filter edits cannot pollute in-flight API calls (P1b)
- Have a new query automatically cancel older in-flight requests (P2a)
- Let the Oracle server classify workcenters, supporting full-line history tracing (P2b)
- Establish baseline test coverage to prevent regressions (P3)
**Non-Goals:**
- No change to the API response format
- No refactoring of the `query_analysis()` pipeline logic
- No new frontend UI features
- No work on `export_csv()` streaming performance (acceptable for now)
- No DuckDB intermediate layer or background precomputation
## Decisions
### D1: Distributed lock strategy (Redis SET NX with polling wait)
**Choice**: Reuse the existing `try_acquire_lock()` plus polling `cache_get()` as the wait mechanism.
**Alternatives**: (A) Pub/Sub notification: more complex, requires channel management. (B) Frontend serialization: change `Promise.all` to summary-then-detail, but auto-refresh and manual queries could still run in parallel.
**Rationale**: A lock at the service layer uniformly protects every entry point (including future routes), and the fail-open design guarantees a Redis outage never blocks requests. Polling at 0.5 s intervals is negligible against the typical 5-35 s pipeline run time.
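A minimal Python sketch of the lock-or-wait flow described above. The `cache` and `locks` objects stand in for the project's `cache_get`/`cache_set` and `try_acquire_lock`/`release_lock` helpers; their exact signatures are assumptions, not the real API:

```python
import time

LOCK_TTL_S = 120      # lock auto-expires even if release is missed
WAIT_TIMEOUT_S = 90   # waiter gives up and falls back to its own run
POLL_INTERVAL_S = 0.5

def query_with_lock(cache_key, run_pipeline, cache, locks):
    """Lock-or-wait sketch: only one holder runs the Oracle pipeline;
    everyone else polls the cache until the result appears."""
    cached = cache.get(cache_key)
    if cached is not None:
        return cached

    lock_key = f"lock:{cache_key}"
    if locks.try_acquire(lock_key, ttl=LOCK_TTL_S):  # fail-open: True if Redis is down
        try:
            # Double-check: another holder may have filled the cache
            # between our miss and the lock acquisition.
            cached = cache.get(cache_key)
            if cached is not None:
                return cached
            result = run_pipeline()
            cache.set(cache_key, result, ttl=300)  # 5 min, matching the existing cache
            return result
        finally:
            locks.release(lock_key)  # always release, even on pipeline failure

    # Lock held elsewhere: poll the cache until the holder publishes a result.
    deadline = time.monotonic() + WAIT_TIMEOUT_S
    while time.monotonic() < deadline:
        time.sleep(POLL_INTERVAL_S)
        cached = cache.get(cache_key)
        if cached is not None:
            return cached

    # Timed out waiting: fail open and run our own pipeline
    # (worst case degrades to the old double-query behavior).
    return run_pipeline()
```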
### D2: Rate limit defaults (tiered by route cost)
**Choice**: `/analysis` 6/60s, `/detail` 15/60s, `/export` 3/60s.
**Rationale**: A cold `/analysis` query takes up to 35 s, so 6 per minute is ample (auto-refresh included). `/detail` pagination is frequent but cache-backed, so 15 is generous. `/export` triggers a full streaming dump, so 3 guards against accidental repeats. `/loss-reasons` already sits behind a 24 h cache and needs no limit.
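The tiering can be illustrated with a self-contained sliding-log limiter. This is not the project's `configured_rate_limit` implementation, only a sketch of the behavior the tiers imply:

```python
import time
from collections import defaultdict, deque

class SlidingLogLimiter:
    """At most `limit` calls per `window_s` seconds per client key.
    A denied call is what the route turns into HTTP 429 + Retry-After."""
    def __init__(self, limit, window_s):
        self.limit, self.window_s = limit, window_s
        self.hits = defaultdict(deque)  # client key -> timestamps of recent calls

    def allow(self, client_key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[client_key]
        while q and now - q[0] >= self.window_s:
            q.popleft()                 # evict hits outside the window
        if len(q) >= self.limit:
            return False                # over budget -> 429
        q.append(now)
        return True

# The D2 tiers, one limiter per route, sized by route cost:
LIMITS = {
    "/analysis": SlidingLogLimiter(6, 60),          # 35 s cold query
    "/analysis/detail": SlidingLogLimiter(15, 60),  # cache-backed pagination
    "/export": SlidingLogLimiter(3, 60),            # full streaming dump
}
```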
### D3: Filter separation (`committedFilters` ref snapshot)
**Choice**: Add a `committedFilters` ref that snapshots the `filters` reactive when "查詢" is clicked. Every API function reads `committedFilters`.
**Alternatives**: (A) Deep watch + debounce: would fire queries while the user is still typing. (B) URL-param persistence: this page has no bookmark-sharing requirement.
**Rationale**: Minimal change, consistent with the `buildQueryString()` pattern in `resource-history/App.vue`. The `filters` reactive remains the UI's two-way binding; `committedFilters` is "the parameters the last query used".
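The real implementation is a Vue ref, but the state split itself is language-agnostic; a minimal Python sketch, with `commit()` playing the role of the "查詢" handler:

```python
from copy import deepcopy

class FilterState:
    """Sketch of the D3 split: `filters` is what the UI edits freely;
    `committed` is the frozen snapshot every API call reads."""
    def __init__(self, **defaults):
        self.filters = dict(defaults)            # bound to UI inputs
        self.committed = deepcopy(self.filters)  # "params of the last query"

    def commit(self):
        # Query-button handler: snapshot by copy, never alias,
        # so later UI edits cannot leak into in-flight calls.
        self.committed = deepcopy(self.filters)

    def query_params(self):
        # Pagination / auto-refresh / export all read the snapshot.
        return dict(self.committed)
```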
### D4: AbortController (keyed signal design)
**Choice**: The `'msd-analysis'` key covers a query (summary + detail page 1 share it); the `'msd-detail'` key covers standalone pagination.
**Rationale**: A new query cancels every request of the previous query (including an in-flight detail page); turning a page cancels the previous page fetch without touching an in-flight query. Same pattern as `wip-detail/App.vue`.
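A sketch of the keyed-cancellation idea (the real code uses `createAbortSignal` with browser `AbortController`s; the registry below is only a Python analog):

```python
class AbortRegistry:
    """Registering a new token under a key cancels the previous token
    for that key, while tokens under other keys stay untouched."""
    def __init__(self):
        self._active = {}

    def create(self, key):
        prev = self._active.get(key)
        if prev is not None:
            prev["aborted"] = True   # cancel the in-flight request for this key
        token = {"aborted": False}
        self._active[key] = token
        return token
```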
### D5: Upstream history classified in SQL (CASE WHEN, full line retained)
**Choice**: Add a `CASE WHEN` inside the SQL CTE that classifies `WORKCENTERNAME` into `WORKCENTER_GROUP` (12 groups plus a NULL fallback); the Python side reads the classified result directly and filters out no stations.
**Alternatives**: (A) An Oracle user-defined function: requires DBA deployment. (B) Keep Python-side classification but drop the filter: still pays the row-by-row regex cost over 10K+ rows.
**Rationale**: `CASE WHEN` runs natively inside the Oracle query engine with no per-row function-call overhead. The classification logic aligns exactly with the patterns in `workcenter_groups.py`, but CASE ordering matters (exclude-first: `元件切割` must precede `切割`).
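The ordering constraint can be made concrete with a first-match classifier. Only the two groups named above are shown; the real SQL and `workcenter_groups.py` define 12, so this is a shape sketch, not the actual pattern table:

```python
import re

# Ordered (pattern, group) pairs mirroring the SQL CASE WHEN: first match
# wins, so the more specific 元件切割 rule must precede the generic 切割 rule.
CASE_ORDER = [
    (re.compile(r"元件切割|PKG_SAW"), "元件切割"),
    (re.compile(r"切割"), "切割"),
]

def classify(workcenter_name):
    """First-match classification; None mirrors the SQL NULL fallback
    (the row is kept, only the group name is missing)."""
    for pattern, group in CASE_ORDER:
        if pattern.search(workcenter_name):
            return group
    return None
```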
## Risks / Trade-offs
- **[P0 lock wait timeout]** If the pipeline runs longer than 90 s (an extremely large date range), the waiting side may time out and issue its own query → mitigation: API_TIMEOUT is itself 120 s, the 120 s lock TTL auto-releases, and the worst case degrades to today's behavior (a duplicate query)
- **[P2b SQL vs Python classification drift]** If a pattern in `workcenter_groups.py` is added or changed without syncing the SQL → mitigation: the SQL NULL fallback guarantees no rows are lost; only the group name can come back NULL
- **[Rate limit false positives]** Rapid pagination or auto-refresh could trip the limiter → mitigation: `/detail` at 15/60s covers normal paging (one page every 4 s), and the 5 min auto-refresh interval stays far below the `/analysis` 6/60s threshold


@@ -0,0 +1,31 @@
## Why
Code review of the mid-section defect traceability analysis (`/mid-section-defect`) found 6 issues: a first query triggers a doubled Oracle pipeline (P0), the high-cost routes have no throttling (P1a), filter state is coupled to query state (P1b), there is no request-cancellation mechanism (P2a), upstream-history workcenter classification runs row by row in Python instead of on the DB server (P2b), and test coverage is zero (P3). They should be fixed promptly now that the feature is stable, to prevent DB overload and frontend race conditions.
## What Changes
- **P0 distributed lock**: wrap the compute section of `query_analysis()` in `try_acquire_lock` / `release_lock`; a second parallel request waits for the cache instead of re-running the pipeline
- **P1a route rate limits**: apply the `configured_rate_limit` decorator to `/analysis` (6/60s), `/analysis/detail` (15/60s), and `/export` (3/60s)
- **P1b filter separation**: add a `committedFilters` ref; every API call (pagination / auto-refresh / export) reads the committed filter snapshot
- **P2a request cancellation**: add keyed aborts via `createAbortSignal(key)` to `loadAnalysis()` and `loadDetail()`; a new query automatically cancels the old requests
- **P2b SQL-side classification**: add `CASE WHEN` workcenter-group classification to the upstream-history SQL (full line history, no station excluded); remove the Python-side per-row `get_workcenter_group()` calls and the order 4-11 filter
- **P3 test coverage**: add `test_mid_section_defect_routes.py` (9 tests) and `test_mid_section_defect_service.py` (9 tests)
## Capabilities
### New Capabilities
(None; this change hardens existing functionality.)
### Modified Capabilities
- `api-safety-hygiene`: rate limiting and a distributed lock on the 3 mid-section-defect routes
- `vue-vite-page-architecture`: the mid-section-defect frontend gains `committedFilters` filter separation and AbortController request cancellation
## Impact
- **Backend**: `mid_section_defect_service.py` (distributed lock, removal of the Python-side workcenter filter), `mid_section_defect_routes.py` (rate limits), `upstream_history.sql` (CASE WHEN classification)
- **Frontend**: `mid-section-defect/App.vue` (`committedFilters` + abort signals)
- **Tests**: 2 new test files (`test_mid_section_defect_routes.py`, `test_mid_section_defect_service.py`)
- **API behavior changes**: requests beyond the rate limit return 429; upstream history now carries a `WORKCENTER_GROUP` column (the API response format is unchanged; only the internal classification moves)
- **No breaking changes**: API response structure, cache keys, and frontend component interfaces all stay the same


@@ -0,0 +1,73 @@
## ADDED Requirements
### Requirement: Mid-section defect analysis endpoints SHALL apply a distributed lock to prevent duplicate pipeline execution
The `/api/mid-section-defect/analysis` pipeline SHALL use a Redis distributed lock to prevent concurrent identical queries from executing the full Oracle pipeline in parallel.
#### Scenario: Two parallel requests with cold cache
- **WHEN** two requests with identical parameters arrive simultaneously and no cache exists
- **THEN** the first request SHALL acquire the lock and execute the full pipeline
- **THEN** the second request SHALL wait by polling the cache until the first request completes
- **THEN** only ONE full Oracle pipeline execution SHALL occur
#### Scenario: Lock wait timeout
- **WHEN** a waiting request does not see a cache result within 90 seconds
- **THEN** the request SHALL proceed with its own pipeline execution (fail-open)
#### Scenario: Redis unavailable
- **WHEN** Redis is unavailable during lock acquisition
- **THEN** the lock function SHALL return acquired=true (fail-open)
- **THEN** the request SHALL proceed normally without blocking
#### Scenario: Pipeline exception with lock held
- **WHEN** the pipeline throws an exception while the lock is held
- **THEN** the lock SHALL be released in a finally block
- **THEN** subsequent requests SHALL NOT be blocked by a stale lock
### Requirement: Mid-section defect routes SHALL apply rate limiting
The `/analysis`, `/analysis/detail`, and `/export` endpoints SHALL apply per-client rate limiting using the existing `configured_rate_limit` mechanism.
#### Scenario: Analysis endpoint rate limit exceeded
- **WHEN** a client sends more than 6 requests to `/api/mid-section-defect/analysis` within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
- **THEN** the service function SHALL NOT be called
#### Scenario: Detail endpoint rate limit exceeded
- **WHEN** a client sends more than 15 requests to `/api/mid-section-defect/analysis/detail` within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
#### Scenario: Export endpoint rate limit exceeded
- **WHEN** a client sends more than 3 requests to `/api/mid-section-defect/export` within 60 seconds
- **THEN** the endpoint SHALL return HTTP 429 with a `Retry-After` header
#### Scenario: Loss reasons endpoint not rate limited
- **WHEN** a client sends requests to `/api/mid-section-defect/loss-reasons`
- **THEN** no rate limiting SHALL be applied (endpoint is lightweight with 24h cache)
### Requirement: Mid-section defect upstream history SHALL classify workcenters in SQL
The upstream history SQL query SHALL classify `WORKCENTERNAME` into workcenter groups using Oracle `CASE WHEN` expressions, returning the full production line history without excluding any stations.
#### Scenario: Workcenter group classification in SQL
- **WHEN** the upstream history query executes
- **THEN** each row SHALL include a `WORKCENTER_GROUP` column derived from `CASE WHEN` pattern matching
- **THEN** the classification SHALL match the patterns defined in `workcenter_groups.py`
#### Scenario: Unknown workcenter name
- **WHEN** a `WORKCENTERNAME` does not match any known pattern
- **THEN** `WORKCENTER_GROUP` SHALL be NULL
- **THEN** the row SHALL still be included in the result (not filtered out)
#### Scenario: Full production line retention
- **WHEN** the upstream history is fetched for ancestor CIDs
- **THEN** ALL stations SHALL be included (cutting, welding, mid-section, testing)
- **THEN** no order-based filtering SHALL be applied
### Requirement: Mid-section defect routes and service SHALL have test coverage
Route and service test files SHALL exist and cover core behaviors.
#### Scenario: Route tests exist
- **WHEN** pytest discovers tests
- **THEN** `tests/test_mid_section_defect_routes.py` SHALL contain tests for success, parameter validation (400), service failure (500), and rate limiting (429)
#### Scenario: Service tests exist
- **WHEN** pytest discovers tests
- **THEN** `tests/test_mid_section_defect_service.py` SHALL contain tests for date validation, pagination logic, and loss reasons caching
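A hedged sketch of the service-level behaviors the scenarios name. `validate_date_range` and `paginate` are illustrative stand-ins, not the real helpers in `mid_section_defect_service.py`, and the 93-day cap is an assumed example value:

```python
from datetime import date

def validate_date_range(start, end, max_days=93):
    """Reject inverted or oversized ranges before touching Oracle.
    Returns (ok, error_message)."""
    s, e = date.fromisoformat(start), date.fromisoformat(end)
    if e < s:
        return False, "end before start"
    if (e - s).days > max_days:
        return False, f"range exceeds {max_days} days"
    return True, None

def paginate(rows, page, page_size):
    """1-based page numbering; out-of-range pages return an empty list."""
    lo = (page - 1) * page_size
    return rows[lo:lo + page_size]
```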


@@ -0,0 +1,37 @@
## ADDED Requirements
### Requirement: Mid-section defect page SHALL separate filter state from query state
The mid-section defect page SHALL maintain separate reactive state for UI input (`filters`) and committed query parameters (`committedFilters`).
#### Scenario: User changes date without clicking query
- **WHEN** user modifies the date range in the filter bar but does not click "查詢"
- **THEN** auto-refresh, pagination, and CSV export SHALL continue using the previously committed filter values
- **THEN** the new date range SHALL NOT affect any API calls until "查詢" is clicked
#### Scenario: User clicks query button
- **WHEN** user clicks "查詢"
- **THEN** the current `filters` state SHALL be snapshotted into `committedFilters`
- **THEN** all subsequent API calls SHALL use the committed values
#### Scenario: CSV export uses committed filters
- **WHEN** user clicks "匯出 CSV" after modifying filters without re-querying
- **THEN** the export SHALL use the committed filter values from the last query
- **THEN** the export SHALL NOT use the current UI filter values
### Requirement: Mid-section defect page SHALL cancel in-flight requests on new query
The mid-section defect page SHALL use `AbortController` to cancel in-flight API requests when a new query is initiated.
#### Scenario: New query cancels previous query
- **WHEN** user clicks "查詢" while a previous query is still in-flight
- **THEN** the previous query's summary and detail requests SHALL be aborted
- **THEN** the AbortError SHALL be handled silently (no error banner shown)
#### Scenario: Page navigation cancels previous detail request
- **WHEN** user clicks next page while a previous page request is still in-flight
- **THEN** the previous page request SHALL be aborted
- **THEN** the new page request SHALL proceed independently
#### Scenario: Query and pagination use independent abort keys
- **WHEN** a query is in-flight and user triggers pagination
- **THEN** the query SHALL NOT be cancelled by the pagination request
- **THEN** the pagination SHALL use a separate abort key from the query


@@ -0,0 +1,35 @@
## 1. P0: Distributed lock to prevent duplicate pipeline execution
- [x] 1.1 In `mid_section_defect_service.py` `query_analysis()`, wrap the compute section after a cache miss in `try_acquire_lock` / `release_lock`
- [x] 1.2 Implement lock-or-wait: when the lock is not acquired, poll `cache_get()` every 0.5 s for up to 90 s, then fail open on timeout
- [x] 1.3 Guarantee lock release in a `finally` block; double-check the cache after acquiring the lock
## 2. P1a: Rate limiting for the high-cost routes
- [x] 2.1 In `mid_section_defect_routes.py`, import `configured_rate_limit` and create 3 limiters (analysis 6/60s, detail 15/60s, export 3/60s)
- [x] 2.2 Apply the limiter decorators to the `/analysis`, `/analysis/detail`, and `/export` routes
## 3. P1b + P2a: Frontend filter separation and request cancellation
- [x] 3.1 Add a `committedFilters` ref in `App.vue`, snapshotted from `filters` in `handleQuery()`
- [x] 3.2 Change `buildFilterParams()` and `exportCsv()` to read `committedFilters` instead of `filters`
- [x] 3.3 After `initPage()` sets the default dates, sync the snapshot into `committedFilters`
- [x] 3.4 Destructure `createAbortSignal` from `useAutoRefresh`; attach the `'msd-analysis'` signal in `loadAnalysis()`
- [x] 3.5 Let `loadDetail()` accept an external signal parameter; standalone pagination uses the `'msd-detail'` key
- [x] 3.6 Handle `AbortError` silently in the `loadAnalysis()` / `loadDetail()` catch blocks
## 4. P2b: Upstream history classification in SQL
- [x] 4.1 Add a `CASE WHEN` to the `upstream_history.sql` CTE classifying `WORKCENTERNAME` into `WORKCENTER_GROUP` (12 groups plus a NULL fallback)
- [x] 4.2 Verify CASE ordering (`元件切割`/`PKG_SAW` before `切割`)
- [x] 4.3 Change `_fetch_upstream_history()` to read the SQL-returned `WORKCENTER_GROUP` column; remove the per-row `get_workcenter_group()` calls and the order 4-11 filter
## 5. P3: Test coverage
- [x] 5.1 Create `tests/test_mid_section_defect_routes.py` (success, 400 parameter validation, 500 service failure, 429 rate limit; 9 tests)
- [x] 5.2 Create `tests/test_mid_section_defect_service.py` (date validation, pagination logic, loss-reasons caching; 9 tests)
## 6. Verification
- [x] 6.1 `npm run build` frontend build passes
- [x] 6.2 `pytest tests/test_mid_section_defect_routes.py tests/test_mid_section_defect_service.py -v` all green