feat(reject-history): finalize sql runtime and archive completed openspec changes

This commit is contained in:
egg
2026-03-04 16:32:00 +08:00
parent 1d2786e7a8
commit 5517f7e85c
36 changed files with 2095 additions and 179 deletions


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-04


@@ -0,0 +1,83 @@
## Context
The two-phase primary query of `reject-history` (`POST /api/reject-history/query`) currently reuses `reject_history/list.sql`. That SQL contains `COUNT(*) OVER()` and `OFFSET/FETCH`, which were designed for the paginated `/api/reject-history/list` contract. When the primary query covers a large date range with batch chunking enabled, `reject_dataset_cache` re-executes the list query through an offset/limit loop, causing expensive recomputation and long-tail latency (90–150 second slow queries have already been observed).
Meanwhile, `/api/reject-history/list` is still wired into backend routes and tests, so the problem cannot be solved by editing `list.sql` directly without risking the pagination contract and legacy-caller compatibility.
## Goals / Non-Goals
**Goals:**
- Switch `POST /api/reject-history/query` to a dedicated primary SQL source, removing the execution coupling to the paginated `list.sql`.
- Stop the batch chunk path from pulling full datasets through an `offset/limit` loop, reducing duplicated per-chunk computation.
- Keep the existing pagination and response semantics of `/api/reject-history/list` unchanged.
- Keep the existing data semantics and API field contracts of `/query`, `/view`, and `/export-cached` unchanged.
**Non-Goals:**
- Do not remove `/api/reject-history/list` or its legacy routes.
- Do not change business metric definitions (`REJECT_TOTAL_QTY`, `DEFECT_QTY`, policy filters).
- Do not introduce new infrastructure (new databases, new cache types, new third-party dependencies).
## Decisions
### D1: Add a dedicated primary SQL template instead of modifying the `list.sql` contract
- Decision: Add a primary-query-specific SQL file (lot-level, non-paginated semantics) under `src/mes_dashboard/sql/reject_history/` for the dataset cache's primary query.
- Why: `list.sql` serving both `/list` and `/query` is the root cause of the current performance/compatibility conflict. Splitting the sources satisfies both at once.
- Alternatives considered:
  - Modify `list.sql` directly: would improve `/query` but very likely break the `/list` pagination/total_count contract.
  - Tune concurrency parameters only: mitigates the symptom but cannot remove the repeated computation cost built into the SQL itself.
### D2: Both the direct and chunk paths of `reject_dataset_cache` use the primary SQL
- Decision: Switch both the direct path and the batch chunk path of `execute_primary_query()` to the primary SQL.
- Why: If only the direct path changed, long-range queries (the actual pain point) would still go through the old chunk + paginated list pattern, yielding limited benefit.
- Alternatives considered:
  - Change only the direct path: long-range queries would still be hit by slow queries.
  - Change only the chunk path: short-range direct queries would keep the list coupling and inconsistent semantics.
### D3: Chunk execution becomes a single query instead of an offset/limit loop
- Decision: Each chunk fetches its complete dataset with one primary SQL query, removing the `offset`-loop retrieval logic.
- Why: The current approach re-runs a query that includes sorting and total_count multiple times within the same chunk, amplifying DB cost.
- Alternatives considered:
  - Keep the loop but enlarge the page size: only reduces the iteration count; the duplicated computation and semantic burden remain.
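The before/after retrieval patterns of D3 can be sketched as follows. This is a minimal illustration: `run_sql` and the template file names are hypothetical stand-ins for the project's actual SQL loader/executor API.

```python
# Hypothetical sketch of the D3 change; `run_sql`, "list.sql", and
# "primary.sql" stand in for the real loader API and template names.

def fetch_chunk_legacy(run_sql, chunk_filters, page_size=50_000):
    """Old pattern: replay the paginated list SQL until the chunk is exhausted."""
    rows, offset = [], 0
    while True:
        page = run_sql("list.sql",
                       {**chunk_filters, "offset": offset, "limit": page_size})
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size  # each iteration re-runs sorting + COUNT(*) OVER()

def fetch_chunk_primary(run_sql, chunk_filters):
    """New pattern: one non-paginated primary query returns the whole chunk."""
    return run_sql("primary.sql", chunk_filters)
```

With a 120k-row chunk and a 50k page size, the legacy helper issues three sorted queries where the primary helper issues one.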
### D4: Treat `/list` compatibility as a hard regression guard
- Decision: Keep the `/api/reject-history/list` route and `query_list()` logic unchanged, and add compatibility tests.
- Why: The project still has route, documentation, and smoke-test dependencies on it; regressions must be explicitly prevented.
- Alternatives considered:
  - Remove `/list` as well: expands the scope and conflicts with the goal of this performance fix.
## Risks / Trade-offs
- [Risk] Column differences between the primary SQL and the list SQL could break downstream pandas derivation.
  → Mitigation: Explicitly define a minimal column contract for the primary SQL and add unit tests that check column completeness.
- [Risk] A single-query chunk result could be large enough to create memory pressure.
  → Mitigation: Keep the existing batch decomposition, max rows/total rows limits, and parquet spill guardrails.
- [Risk] Optimizing only `/query` leaves other slow-query sources unimproved.
  → Mitigation: This change focuses on the reject-history primary path; other paths are handled separately.
- [Trade-off] An additional SQL file increases maintenance cost.
  → Mitigation: Document the split ("list for paginated API / primary for dataset cache") to prevent re-coupling.
## Migration Plan
1. Add the dedicated primary SQL and wire it into the SQL loader.
2. Update the direct and chunk paths of `reject_dataset_cache` to use the primary SQL.
3. Keep `reject_history_service.query_list()` and `list.sql` unchanged.
4. Extend tests:
   - `/query` no longer sends `offset/limit` to the primary SQL path.
   - The `/list` pagination contract is unchanged.
5. Compare slow-query logs and frontend timeout events in the dev environment before rolling out.
Rollback strategy:
- If the new primary SQL shows column or performance anomalies, revert `reject_dataset_cache` to the original `list.sql` path (keep the new SQL file but leave it disabled).
- The `/list` path is untouched, so its rollback risk is low.
## Open Questions
- Should the primary SQL file be named `primary.sql` or `dataset_primary.sql` (must stay consistent with existing naming conventions)?
- Should the API `meta` expose a diagnostic field (e.g. `primary_sql_source=dedicated`) for online tracing?


@@ -0,0 +1,38 @@
## Why
The primary query path of reject-history, `POST /api/reject-history/query`, currently reuses `list.sql` (which carries `COUNT(*) OVER()` and `OFFSET/FETCH` pagination semantics). Under large date ranges and batch chunk scenarios this causes expensive recomputation; 90–150 second slow queries and frontend timeouts have occurred repeatedly. Because `list.sql` also serves the legacy `/api/reject-history/list`, modifying it directly risks breaking the existing pagination contract, so the primary query source must be decoupled from the list query.
## What Changes
- Add a dedicated reject-history primary SQL (lot-level, non-paginated semantics) for the dataset cache's primary query.
- Update `reject_dataset_cache.execute_primary_query()` (both the direct and engine chunk paths) to use the dedicated primary SQL, no longer relying on the `offset/limit` pagination loop of `list.sql`.
- Keep the current behavior and response contract of `list.sql` and `GET /api/reject-history/list` (sorting, pagination, `TOTAL_COUNT`) unchanged.
- Harden regression protection: add/adjust tests to verify the `/list` contract is unchanged and that the `/query` source has switched to the dedicated primary SQL.
- Keep the data semantics and field contracts of `/query`, `/view`, and `/export-cached` unchanged (non-breaking).
## Capabilities
### New Capabilities
- `reject-history-primary-query-source-isolation`: Establish an independent data source for the primary query, decoupled from the paginated list SQL, reducing latency and timeout risk for large-range queries.
### Modified Capabilities
- `reject-history-api`: Adjust the implementation requirements of the primary query, explicitly mandating that the `/query` and `/list` query paths are decoupled and that the `/list` contract remains compatible.
- `batch-query-resilience`: Adjust reject-history chunk execution requirements, removing the dependency on iterating paginated list SQL to fetch full datasets and reducing per-chunk duplicated computation.
## Impact
- Affected backend code:
  - `src/mes_dashboard/services/reject_dataset_cache.py`
  - `src/mes_dashboard/services/reject_history_service.py` (if the SQL template slot needs extension)
  - `src/mes_dashboard/sql/reject_history/` (new dedicated primary SQL)
  - `src/mes_dashboard/routes/reject_history_routes.py` (only if meta/diagnostic info is added)
- Affected tests:
  - `tests/test_reject_dataset_cache.py`
  - `tests/test_reject_history_service.py`
  - `tests/test_reject_history_routes.py`
- API surface:
  - No endpoints added or removed
  - No breaking changes to existing parameters/responses
- Dependencies/infra:
  - No new external dependencies
  - Reuses the existing slow-query engine, batch engine, and cache/spool mechanisms


@@ -0,0 +1,25 @@
## ADDED Requirements
### Requirement: reject_dataset_cache batch primary execution SHALL avoid paginated replay loops
Batch chunk execution for reject-history primary query SHALL avoid page-by-page replay against paginated list SQL semantics.
#### Scenario: Chunk execution avoids offset iteration
- **WHEN** batch engine executes a reject-history chunk in `execute_primary_query()`
- **THEN** chunk execution SHALL NOT iterate through `offset` pages to assemble full chunk data
- **THEN** chunk execution SHALL retrieve chunk data via the dedicated primary SQL path
#### Scenario: Chunk bind contract excludes pagination parameters
- **WHEN** chunk query parameters are prepared for batch execution
- **THEN** `offset` and `limit` SHALL NOT be required bind variables for normal chunk retrieval
### Requirement: Partial-failure resilience SHALL remain intact after source decoupling
Decoupling from paginated list SQL SHALL NOT regress partial-failure metadata behavior.
#### Scenario: Failed chunks still produce partial-failure metadata
- **WHEN** one or more reject-history chunks fail during batch execution
- **THEN** response `meta` SHALL still report partial-failure indicators according to existing resilience contract
#### Scenario: Successful chunks still merge and continue
- **WHEN** some chunks succeed and others fail
- **THEN** the system SHALL continue to merge successful chunks and return partial results
- **THEN** progress metadata SHALL remain available for diagnostics


@@ -0,0 +1,26 @@
## ADDED Requirements
### Requirement: Reject History API SHALL preserve paginated list contract after primary-query decoupling
The API SHALL keep `GET /api/reject-history/list` behavior and response schema stable after `/query` switches to a dedicated SQL source.
#### Scenario: List endpoint pagination schema remains stable
- **WHEN** `GET /api/reject-history/list` is called with valid date range and paging params
- **THEN** the response SHALL still include `items` and `pagination` with `page`, `perPage`, `total`, and `totalPages`
- **THEN** the endpoint SHALL continue to support page-bound retrieval semantics
#### Scenario: List endpoint sorting semantics remain stable
- **WHEN** two equivalent list requests are executed before and after the primary-query decoupling change
- **THEN** row ordering semantics SHALL remain consistent with existing list contract
### Requirement: Reject History API primary response contract SHALL remain backward compatible
Switching the primary SQL source SHALL NOT alter `/api/reject-history/query` response fields consumed by the current UI flow.
#### Scenario: Primary query response shape is unchanged
- **WHEN** `POST /api/reject-history/query` succeeds
- **THEN** the response SHALL continue to include `query_id`, `summary`, `trend`, `detail`, `available_filters`, and `meta`
- **THEN** existing `/view` and `/export-cached` workflows SHALL remain compatible with the returned `query_id`
#### Scenario: Cache-hit behavior remains unchanged
- **WHEN** the same primary query is executed again within cache lifetime
- **THEN** cache-hit behavior SHALL remain functionally equivalent to pre-decoupling behavior
- **THEN** response field names and types SHALL remain stable


@@ -0,0 +1,25 @@
## ADDED Requirements
### Requirement: Reject-history primary query SHALL use a dedicated non-paginated SQL source
The system SHALL execute `POST /api/reject-history/query` against a dedicated primary SQL template that is isolated from the paginated list SQL contract.
#### Scenario: Direct primary path uses dedicated SQL
- **WHEN** `execute_primary_query()` runs in direct mode (no batch decomposition)
- **THEN** it SHALL compile SQL from the dedicated primary template
- **THEN** it SHALL NOT require `offset` or `limit` bind parameters for result retrieval
#### Scenario: Batch chunk path uses dedicated SQL
- **WHEN** `execute_primary_query()` runs in batch chunk mode
- **THEN** each chunk query SHALL compile SQL from the same dedicated primary template
- **THEN** chunk queries SHALL apply chunk-specific filters without relying on page-by-page replay semantics
### Requirement: Dedicated primary SQL SHALL exclude pagination-only operators
The dedicated primary SQL template SHALL avoid pagination-only constructs used by `/api/reject-history/list`.
#### Scenario: Primary SQL excludes total-count window computation
- **WHEN** the dedicated primary SQL is loaded for `/query`
- **THEN** it SHALL NOT include `COUNT(*) OVER()` as a required output field
#### Scenario: Primary SQL excludes offset-fetch pagination
- **WHEN** the dedicated primary SQL is loaded for `/query`
- **THEN** it SHALL NOT include `OFFSET ... FETCH NEXT ...` pagination clauses
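These two scenarios can be enforced with a lightweight static check on the loaded template. The SQL text and helper below are a hypothetical sketch, not the project's real template or test code:

```python
# Hypothetical primary template; the real column list and bind names may differ.
PRIMARY_SQL = """
SELECT LOT_ID, REJECT_REASON, REJECT_TOTAL_QTY, DEFECT_QTY
FROM REJECT_HISTORY
WHERE WORK_DATE BETWEEN :date_from AND :date_to
ORDER BY WORK_DATE
"""

def assert_no_pagination_operators(sql: str) -> None:
    """Fail if the template carries pagination-only operators from list.sql."""
    upper = sql.upper()
    assert "COUNT(*) OVER" not in upper, "primary SQL must not compute total_count"
    assert "OFFSET" not in upper and "FETCH NEXT" not in upper, \
        "primary SQL must not paginate"

assert_no_pagination_operators(PRIMARY_SQL)
```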


@@ -0,0 +1,25 @@
## 1. Primary SQL Source Isolation
- [x] 1.1 Add a dedicated reject-history primary SQL file under `src/mes_dashboard/sql/reject_history/` without paginated list operators
- [x] 1.2 Ensure the new SQL template preserves the column contract required by dataset-cache derivation (`summary`/`trend`/`detail`/`pareto`)
- [x] 1.3 Keep `src/mes_dashboard/sql/reject_history/list.sql` unchanged for legacy paginated list use
## 2. Service Path Decoupling
- [x] 2.1 Update `reject_dataset_cache.execute_primary_query()` direct path to compile and execute the dedicated primary SQL template
- [x] 2.2 Update reject-history batch chunk execution path to use the dedicated primary SQL template
- [x] 2.3 Remove reject chunk data assembly logic that depends on `offset/limit` pagination replay
- [x] 2.4 Preserve existing cache/spool write path and response shape (`query_id`, `summary`, `trend`, `detail`, `available_filters`, `meta`)
## 3. Compatibility and Resilience Guards
- [x] 3.1 Verify `query_list()` and `GET /api/reject-history/list` pagination behavior remains unchanged
- [x] 3.2 Verify partial-failure metadata behavior remains unchanged for batch mode (`has_partial_failure`, failed chunks/ranges)
- [x] 3.3 Add defensive logging/diagnostics confirming primary query source path selection for troubleshooting
## 4. Tests and Verification
- [x] 4.1 Add or update unit tests in `tests/test_reject_dataset_cache.py` to assert primary/chunk paths no longer require `offset/limit`
- [x] 4.2 Add or update tests in `tests/test_reject_history_service.py` and `tests/test_reject_history_routes.py` to assert `/list` contract compatibility
- [x] 4.3 Run targeted test suite for reject-history cache/service/routes and batch resilience coverage
- [ ] 4.4 Perform manual validation of large-range reject-history query latency and ensure no frontend timeout regression (requires integration env + Oracle data + frontend flow)


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-04


@@ -0,0 +1,124 @@
## Context
The post-cache queries of reject-history currently rely mainly on pandas (`apply_view`, `compute_batch_pareto`, `export_csv_from_cache`). On large datasets (hundreds of MB) this produces high peak RSS, causing the interactive memory guard to reject requests and the worker RSS guard to trigger restarts. The system already has a parquet spool (`query_spool_store`), but downstream computation still typically reloads the data into a DataFrame and runs full-table operations.
The goal of this design is to migrate post-cache computation to a SQL runtime (DuckDB) to reduce Python memory pressure, without changing the API interface or response schema, while keeping the existing guards as a last line of defense.
Constraints:
- Do not break the existing frontend parameters and data structures of `reject-history`.
- Preserve the materialized pareto hit path and its semantics.
- Maintain filter consistency and data completeness for detail/export.
- The rollout must be toggleable and revertible.
## Goals / Non-Goals
**Goals:**
- Introduce a DuckDB SQL execution path over cache/spool data so pandas full-table copy/groupby is no longer the primary path.
- Phase 1: prioritize `batch-pareto`, switching to cache-SQL on materialized miss.
- Phase 2: rework `view` so summary/trend/detail pagination are produced by SQL aggregation and queries.
- Phase 3: rework `export-cached` into streaming output, avoiding a one-shot `to_dict` full load.
- Keep and continue observing the existing memory guards; tune thresholds only after stabilization.
**Non-Goals:**
- Do not change the core strategy of the Oracle primary query and chunk engine.
- Do not add or remove reject-history API endpoints.
- Do not change the frontend query flow, URL parameter formats, or field naming.
- Do not rewrite the cache computation of other pages (hold/resource/material-trace) in this change.
## Decisions
### D1. Use DuckDB as the cache-SQL runtime (not SQLite)
- **Decision**: Add a DuckDB dependency as the query and aggregation engine over parquet/spool data.
- **Rationale**:
  - DuckDB can query parquet directly and supports predicate pushdown, projection pushdown, and aggregation/window functions, which matches this change's needs.
  - SQLite has no native parquet scanning; data would have to be loaded in first, adding an extra round of memory and I/O cost.
  - Compared with pandas, DuckDB makes worker RSS easier to control on large filter/aggregation paths.
- **Alternatives considered**:
  - pandas optimizations (fewer columns, category dtypes): already done, yet high RSS and guard false rejections remain.
  - SQLite temp tables: require an ETL step and cannot use the parquet spool directly.
### D2. Build a dedicated reject-history cache-SQL facade
- **Decision**: Add `reject_cache_sql_runtime` (name adjustable during implementation) that centrally provides:
  - Source resolution (parquet spool first, with fallback when necessary)
  - Parameter binding and safe SQL fragment assembly
  - Shared filter-condition construction (policy/supplementary/trend/pareto selections)
- **Rationale**:
  - Avoids scattering SQL string assembly across routes/services, reducing semantic drift.
  - Centralizes parity rules, making side-by-side testing against the legacy pandas path easier.
- **Alternatives considered**:
  - Inline the SQL inside `reject_dataset_cache.py`: quick, but poorly maintainable with unclear test boundaries.
### D3. Rework the batch-pareto path first, keeping the materialized hit
- **Decision**:
  - Behavior on a `try_materialized_batch_pareto` hit is unchanged.
  - On miss/stale/build-fail, compute through cache-SQL first.
  - Fall back to legacy DataFrame computation only when cache-SQL is unavailable.
- **Rationale**:
  - `batch-pareto` is a high-frequency, high-cost aggregation point, so reworking it yields the largest benefit.
  - Keeping the existing materialized fast path avoids rework.
- **Alternatives considered**:
  - Remove the materialized layer outright: high risk, and forfeits the existing hit-rate benefit.
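The D3 preference chain can be sketched as follows; the callables are hypothetical stand-ins for `try_materialized_batch_pareto`, the new cache-SQL path, and the legacy compute:

```python
# Illustrative sketch only; the real orchestration lives in the service layer.
def batch_pareto(ctx, materialized, cache_sql, legacy, sql_enabled=True):
    """Resolve batch-pareto through: materialized hit -> cache-SQL -> legacy."""
    snapshot = materialized(ctx)
    if snapshot is not None:       # fresh snapshot: unchanged fast path
        return snapshot, "materialized"
    if sql_enabled:
        try:
            return cache_sql(ctx), "cache_sql"
        except RuntimeError:       # SQL runtime unavailable or failed
            pass                   # fall through per configured policy
    return legacy(ctx), "legacy"
```

Returning the source label alongside the payload mirrors the fallback-reason telemetry this design calls for.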
### D4. Switch view to SQL aggregation + SQL pagination
- **Decision**:
  - Compute `summary`/`trend` via SQL aggregation.
  - For `detail`, apply all filters, then sort and paginate, entirely in SQL.
  - Keep the current output structure (`analytics_raw`, `summary`, `detail.pagination`).
- **Rationale**:
  - Fixes the current "guard first, filter later" pattern that causes many false rejections.
  - Shortens the lifetime of pandas' multi-stage intermediate DataFrames.
### D5. Switch export-cached to streaming export
- **Decision**:
  - Use a generator to read and write the CSV response batch by batch.
  - Stop building the full rows list / `to_dict` before responding.
- **Rationale**:
  - Export is the classic large-output scenario; streaming effectively lowers peak RSS.
  - The existing filter conditions and column contract stay unchanged.
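A minimal sketch of the D5 generator, assuming the SQL runtime yields batches of dict rows; in a Flask-style backend this generator would be wrapped in a streaming response:

```python
import csv
import io
from typing import Iterable, Iterator

def stream_csv(batches: Iterable[list], columns: list) -> Iterator[str]:
    """Yield CSV text incrementally; only one batch is in memory at a time."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    yield buf.getvalue()          # header goes out before any rows arrive
    buf.seek(0)
    buf.truncate(0)
    for batch in batches:
        writer.writerows(batch)   # one fetched batch at a time
        yield buf.getvalue()
        buf.seek(0)
        buf.truncate(0)
```

Hooking this up would look roughly like `Response(stream_csv(fetch_batches(query_id), columns), mimetype="text/csv")`, where `fetch_batches` is a hypothetical batched reader over the cache-SQL result.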
### D6. Roll out gradually via feature flags, keeping a dual-path fallback
- **Decision**: Add runtime switches (naming to be finalized during implementation), at minimum:
  - A global switch (enable/disable cache-SQL)
  - Endpoint-level switches (batch/view/export enabled independently)
  - A fallback switch (allow falling back to legacy pandas)
- **Rationale**:
  - Enables online canary rollout and fast rollback.
  - Reduces the risk of a one-shot replacement.
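One possible flag layout, sketched with hypothetical env-var names (D6 explicitly leaves naming open):

```python
import os

def cache_sql_enabled(endpoint: str, env=os.environ) -> bool:
    """Global switch gates everything; an endpoint-level switch can then
    disable a single path. Defaults: global off, endpoints on."""
    if env.get("REJECT_CACHE_SQL_ENABLED", "0") != "1":
        return False
    return env.get(f"REJECT_CACHE_SQL_{endpoint.upper()}", "1") == "1"
```

This shape lets operations turn off only `view` during a canary while `batch-pareto` stays on the SQL path.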
## Risks / Trade-offs
- **[DuckDB dependency and runtime compatibility]** → Pin a known-good version in `requirements`/`environment.yml`; add checks to CI and the VM startup script.
- **[Semantic divergence between SQL and pandas]** → Build parity tests (same query_id, same filters; compare summary/trend/detail/pareto results).
- **[Inconsistent behavior when the spool is missing and the path falls back]** → Define an explicit source priority order and fallback-reason telemetry to guarantee observability.
- **[Query plans degrading under extreme conditions]** → Keep guards and timeouts; add maximum scan/output limits to the SQL runtime if needed.
- **[Cost of maintaining dual paths during the initial rollout]** → Enable in phases; converge away from the legacy path once stable.
## Migration Plan
1. **Phase 1 (batch-pareto)**
   - Introduce the DuckDB runtime and basic source resolution.
   - Wire the `batch-pareto` materialized-miss path to cache-SQL.
   - Add endpoint-level switches and fallback telemetry.
2. **Phase 2 (view SQL-ization)**
   - Move `summary/trend/detail` to the SQL path.
   - Adjust where the memory guard triggers (shrink the data first and then guard, or guard on estimated SQL result size).
3. **Phase 3 (export streaming)**
   - Change `export-cached` to stream-generated CSV.
   - Verify filter consistency with the detail data.
4. **Rollout / Rollback**
   - Canary-enable by default, in order (batch -> view -> export).
   - If error rates or result divergence rise, turn off the corresponding endpoint switch to fall back to legacy.
## Open Questions
- Must `view`'s `analytics_raw` keep exactly the same ordering (in case the frontend implicitly depends on it)?
- Should a dedicated cache-SQL memory-budget metric be introduced in this change, or should the existing worker guard telemetry be reused for now?


@@ -0,0 +1,42 @@
## Why
After caching, reject-history interactive queries still run full-table filter/groupby/copy with pandas in worker memory, keeping RSS high for long periods on large-range queries and triggering the memory guard, batch-pareto rejections, and worker restarts. The parquet spool already exists; the post-cache stage should be moved to SQL to lower peak memory while preserving the existing API contract.
## What Changes
- Add a cache-SQL execution layer (DuckDB) for reject-history that runs SQL queries and aggregations over parquet spool / cache data first, instead of reloading everything into a pandas DataFrame.
- Phase 1: switch `/api/reject-history/batch-pareto` to the DuckDB path first (high benefit, low risk), keeping the existing cross-filter, top80, top20, and response schema.
- Phase 2: SQL-ize `/api/reject-history/view` so summary/trend aggregation and detail pagination all go through SQL, reducing in-memory intermediate data.
- Phase 3: change `/api/reject-history/export-cached` to streaming export, avoiding a full `to_dict` load into memory.
- Keep the existing worker / interactive memory guards as a last line of defense; adjust guard thresholds based on monitoring data once SQL-ization stabilizes.
- Harden observability and regression tests to ensure frontend hints, detail data semantics, and export completeness stay compatible (non-breaking).
## Capabilities
### New Capabilities
- `reject-history-cache-sql-runtime`: Provide SQL execution (DuckDB) and query routing over reject-history cache/spool data, turning interactive queries from pandas full-table computation into SQL pushdown/aggregation.
### Modified Capabilities
- `reject-history-api`: Adjust the backend computation path requirements of `/batch-pareto`, `/view`, and `/export-cached`, explicitly mandating cache-SQL as the primary path with an unchanged response contract.
- `reject-history-pareto-materialized-aggregate`: Adjust the materialized miss/fallback behavior to prefer the cache-SQL computation path over full-table DataFrame regrouping.
- `reject-history-detail-export-parity`: Extend export requirements to streaming output while keeping the data scope and field semantics consistent with current filter conditions.
## Impact
- Affected backend code:
  - `src/mes_dashboard/services/reject_dataset_cache.py`
  - `src/mes_dashboard/services/reject_pareto_materialized.py`
  - `src/mes_dashboard/core/query_spool_store.py` (read interface/metadata support for the SQL runtime)
  - `src/mes_dashboard/routes/reject_history_routes.py`
  - `src/mes_dashboard/sql/reject_history/` (new/adjusted SQL fragments)
- Affected tests:
  - `tests/test_reject_dataset_cache.py`
  - `tests/test_reject_history_routes.py`
  - `tests/test_reject_pareto_materialized.py`
  - New cache-SQL runtime and streaming-export tests
- API surface:
  - No new endpoints
  - No changes to existing parameters or response schema (non-breaking)
- Dependencies/infra:
  - New DuckDB Python dependency
  - Possibly a few SQL-runtime env switches (enablement, fallback, concurrency/memory limits)


@@ -0,0 +1,68 @@
## MODIFIED Requirements
### Requirement: Reject History API SHALL provide batch Pareto endpoint with cross-filter
The API SHALL provide a batch Pareto endpoint that returns all 6 dimension Pareto results in a single response, supporting cross-dimension filtering with exclude-self logic, and SHALL prefer materialized Pareto snapshots, then cache-SQL runtime, before considering legacy full-detail regrouping.
#### Scenario: Batch Pareto response structure
- **WHEN** `GET /api/reject-history/batch-pareto` is called with valid `query_id`
- **THEN** response SHALL be `{ success: true, data: { dimensions: { reason: {...}, package: {...}, type: {...}, workflow: {...}, workcenter: {...}, equipment: {...} } } }`
- **THEN** each dimension object SHALL include `items` array with schema (`reason`, `metric_value`, `pct`, `cumPct`, `MOVEIN_QTY`, `REJECT_TOTAL_QTY`, `DEFECT_QTY`, `count`)
#### Scenario: Cross-filter exclude-self logic
- **WHEN** `sel_reason=A&sel_type=X` is provided
- **THEN** reason Pareto SHALL be computed with type=X filter applied (but NOT reason=A filter)
- **THEN** type Pareto SHALL be computed with reason=A filter applied (but NOT type=X filter)
- **THEN** package/workflow/workcenter/equipment Paretos SHALL be computed with both reason=A AND type=X filters applied
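The exclude-self rule in the scenario above can be captured in one small helper; this is an illustrative sketch, not the service's real code:

```python
def cross_filters(selections: dict, target_dim: str) -> dict:
    """Filters applied when computing the Pareto of target_dim:
    every selected dimension except the target itself (exclude-self)."""
    return {dim: val for dim, val in selections.items() if dim != target_dim}
```

With `sel_reason=A&sel_type=X`, the reason Pareto sees only the type filter, the type Pareto sees only the reason filter, and every other dimension sees both.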
#### Scenario: Empty selections return unfiltered Paretos
- **WHEN** batch-pareto is called with no `sel_*` parameters
- **THEN** all 6 dimensions SHALL return their full Pareto distribution (subject to `pareto_scope`)
#### Scenario: Cache-only computation
- **WHEN** `query_id` does not exist in cache
- **THEN** the endpoint SHALL return HTTP 400 with error message indicating cache miss
- **THEN** the endpoint SHALL NOT fall back to Oracle query
#### Scenario: Materialized snapshot preferred
- **WHEN** a valid and fresh materialized Pareto snapshot exists for the request context
- **THEN** the endpoint SHALL return results from that snapshot
- **THEN** the endpoint SHALL avoid full lot-level regrouping for the same request
#### Scenario: Materialized miss fallback behavior
- **WHEN** materialized snapshot is unavailable, stale, or build fails
- **THEN** the endpoint SHALL fall back to cache-SQL computation before legacy DataFrame computation
- **THEN** the response schema and filter semantics SHALL remain unchanged
#### Scenario: SQL fallback unavailable
- **WHEN** cache-SQL runtime is disabled or unavailable under materialized miss
- **THEN** the endpoint SHALL follow configured fallback policy deterministically
- **THEN** the response metadata SHALL expose the fallback reason code
#### Scenario: Supplementary and policy filters apply
- **WHEN** batch-pareto is called with supplementary filters (packages, workcenter_groups, reason) and policy toggles
- **THEN** all 6 dimension Paretos SHALL be computed after applying policy and supplementary filters first (before cross-filter)
#### Scenario: Display scope (TOP20) support
- **WHEN** `pareto_display_scope=top20` is provided
- **THEN** applicable dimensions (type, workflow, equipment) SHALL truncate results to top 20 items after sorting
- **WHEN** `pareto_display_scope` is omitted or `all`
- **THEN** all items SHALL be returned (subject to `pareto_scope` filter)
## ADDED Requirements
### Requirement: Reject History API SHALL provide SQL-first cache view derivation with schema parity
The API SHALL derive cache-backed `view` responses through SQL-first runtime when enabled, while preserving existing response schema and filter behavior.
#### Scenario: View response contract preserved
- **WHEN** `GET /api/reject-history/view` is called with valid `query_id`
- **THEN** response payload SHALL keep existing top-level structure containing `analytics_raw`, `summary`, and `detail`
- **THEN** pagination field names and types SHALL remain compatible with current frontend usage
#### Scenario: View SQL-first with deterministic fallback
- **WHEN** SQL runtime is enabled for `view`
- **THEN** summary/trend/detail derivation SHALL use SQL runtime as primary path
- **THEN** fallback to legacy path SHALL follow configured policy and preserve response schema
#### Scenario: Cache-expired behavior unchanged
- **WHEN** `query_id` cache has expired
- **THEN** endpoint SHALL return the same cache-expired status behavior as current implementation

View File

@@ -0,0 +1,45 @@
## ADDED Requirements
### Requirement: Reject History cache-SQL runtime SHALL execute against cached datasets without full DataFrame materialization
The system SHALL provide a SQL runtime for reject-history cached queries that reads from cache/spool data sources and avoids requiring full pandas DataFrame materialization as the primary execution path.
#### Scenario: Spool-backed execution
- **WHEN** a valid `query_id` has parquet spool metadata available
- **THEN** the runtime SHALL execute SQL directly against the spool dataset
- **THEN** the request SHALL NOT require loading the entire dataset into a pandas DataFrame before filtering and aggregation
#### Scenario: Source resolution fallback
- **WHEN** spool data is unavailable for a valid `query_id`
- **THEN** the runtime SHALL follow a deterministic fallback order configured by system policy
- **THEN** the fallback decision SHALL be observable via telemetry metadata
### Requirement: Reject History cache-SQL runtime SHALL preserve filter semantics across batch/view/export paths
The runtime SHALL apply policy, supplementary, trend-date, and pareto selection filters with the same business semantics used by existing reject-history APIs.
#### Scenario: Batch pareto filter parity
- **WHEN** `batch-pareto` is requested with policy toggles, supplementary filters, trend dates, and `sel_*` selections
- **THEN** SQL runtime output SHALL preserve exclude-self cross-filter semantics for each dimension
- **THEN** `pareto_scope=top80` and `pareto_display_scope=top20` behavior SHALL remain unchanged
#### Scenario: View filter parity
- **WHEN** `view` is requested with `query_id` and active supplementary/interactive filters
- **THEN** `summary`, `trend`, and paginated `detail` SHALL all reflect the same effective filter set
- **THEN** response schema SHALL remain compatible with existing frontend contracts
#### Scenario: Export filter parity
- **WHEN** `export-cached` is requested with the same filters as `view`
- **THEN** exported rows SHALL represent the same filtered data scope as view/detail
- **THEN** column naming and field semantics SHALL remain unchanged
### Requirement: Reject History cache-SQL runtime SHALL support controlled rollout and safe fallback
The system SHALL expose runtime switches to enable or disable SQL execution per endpoint and SHALL support fallback to legacy computation when SQL runtime is unavailable.
#### Scenario: Endpoint-level enablement
- **WHEN** SQL runtime is enabled only for `batch-pareto`
- **THEN** `batch-pareto` SHALL use SQL runtime
- **THEN** `view` and `export-cached` SHALL continue using legacy path until explicitly enabled
#### Scenario: SQL runtime fallback
- **WHEN** SQL runtime encounters an execution failure for a request
- **THEN** the system SHALL apply configured fallback behavior (legacy path or fail-fast)
- **THEN** the response or metadata SHALL include a deterministic fallback reason code for operations troubleshooting


@@ -0,0 +1,28 @@
## MODIFIED Requirements
### Requirement: Cached reject-history export SHALL support Pareto multi-select filter parity
The cached export endpoint SHALL support Pareto multi-select context so that exported rows match the currently drilled-down detail scope, and SHALL stream response output to avoid requiring full in-memory row materialization before sending data.
#### Scenario: Apply selected Pareto dimension values
- **WHEN** export request provides `pareto_dimension` and one or more `pareto_values`
- **THEN** the backend SHALL apply an OR-match filter against the mapped dimension column
- **THEN** only rows matching selected values SHALL be exported
#### Scenario: No Pareto selection keeps existing behavior
- **WHEN** `pareto_values` is absent or empty
- **THEN** export SHALL apply no extra Pareto-selected-item filter
- **THEN** existing supplementary and interactive filters SHALL still apply
#### Scenario: Invalid Pareto dimension is rejected
- **WHEN** `pareto_dimension` is not one of supported dimensions
- **THEN** API SHALL return HTTP 400 with descriptive validation error
#### Scenario: Export response is streamed
- **WHEN** cached export is requested for a large filtered dataset
- **THEN** endpoint SHALL stream CSV rows incrementally to the client
- **THEN** endpoint SHALL NOT require building a full rows list in memory before response begins
#### Scenario: Export scope matches view detail scope
- **WHEN** `view` and `export-cached` are called with the same `query_id` and filter set
- **THEN** exported rows SHALL represent the same filtered data scope as detail results
- **THEN** display-only pareto truncation rules SHALL NOT remove rows from export output


@@ -0,0 +1,19 @@
## ADDED Requirements
### Requirement: Materialized Pareto orchestration SHALL use cache-SQL fallback before legacy DataFrame regrouping
When materialized snapshots are not available, orchestration SHALL prefer cache-SQL runtime to compute batch Pareto results before attempting legacy DataFrame regrouping.
#### Scenario: Materialized miss uses cache-SQL fallback
- **WHEN** snapshot read misses, expires, or build fails for a batch-pareto request
- **THEN** orchestration SHALL invoke cache-SQL batch pareto computation as the first fallback path
- **THEN** returned payload SHALL preserve the same dimensions and item schema contract
#### Scenario: Cache-SQL unavailable fallback policy
- **WHEN** cache-SQL fallback is disabled or unavailable after materialized miss
- **THEN** orchestration SHALL apply configured fallback policy (legacy compute or fail-fast)
- **THEN** fallback reason SHALL be recorded in metadata for diagnostics
#### Scenario: Fallback path preserves cross-filter semantics
- **WHEN** cache-SQL fallback is used with multi-dimension `sel_*` filters
- **THEN** exclude-self cross-filter semantics SHALL remain equivalent to materialized and legacy behavior
- **THEN** `pareto_scope` and `pareto_display_scope` rules SHALL remain unchanged


@@ -0,0 +1,34 @@
## 1. SQL Runtime Foundation
- [x] 1.1 Add a reject-history cache-SQL runtime module (DuckDB connection management, source resolution, parameter-binding helpers)
- [x] 1.2 Add parquet-spool-first reads and a source fallback strategy (with deterministic fallback reasons)
- [x] 1.3 Add runtime feature flags (global and endpoint-level switches) with defaults
- [x] 1.4 Complete dependency configuration (`requirements.txt` / `pyproject.toml` / `environment.yml`) and startup compatibility checks
## 2. Batch Pareto SQL-first Path
- [x] 2.1 Wire `batch-pareto` into the cache-SQL computation path on materialized miss/stale/build-fail
- [x] 2.2 Keep and verify consistent exclude-self cross-filter, `top80`, and `top20` behavior
- [x] 2.3 Implement the fallback policy for when SQL is unavailable (legacy or fail-fast)
- [x] 2.4 Add batch-pareto parity tests (SQL vs legacy) and fallback metadata tests
## 3. View SQL-ization
- [x] 3.1 Rebuild `summary` and `trend` aggregation in SQL (preserving field and precision contracts)
- [x] 3.2 Implement detail querying, sorting, and pagination in SQL (including policy/supplementary/trend/pareto selections)
- [x] 3.3 Switch `/api/reject-history/view` to the SQL-first path while keeping schema compatibility
- [x] 3.4 Add view parity tests and cache-expired behavior regression tests
## 4. Export Cached Streaming
- [x] 4.1 Change `export-cached` to generator/streaming CSV output
- [x] 4.2 Ensure export and detail share the same filter-composition logic, maintaining scope parity
- [x] 4.3 Remove the full rows list / `to_dict` dependency so exports no longer load everything into memory first
- [x] 4.4 Add large-data export tests (streaming output, column contract, filter consistency)
## 5. Observability, Guard, Rollout
- [x] 5.1 Add SQL runtime telemetry (source, fallback reason, duration, row count)
- [x] 5.2 Keep existing memory guards; adjust guard trigger points and messages to fit the SQL-first flow
- [x] 5.3 Define the rollout strategy (batch -> view -> export) with matching rollback switches
- [x] 5.4 Update operation docs and the verification checklist (frontend hints, export unaffected by display limits, load-test items)