feat(reject-history): finalize sql runtime and archive completed openspec changes
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-04
@@ -0,0 +1,83 @@
## Context

The two-phase primary query of `reject-history` (`POST /api/reject-history/query`) currently reuses `reject_history/list.sql`. That SQL contains `COUNT(*) OVER()` and `OFFSET/FETCH` and was originally designed for the paginated `/api/reject-history/list` contract. When the primary query spans a large date range with batch chunking enabled, `reject_dataset_cache` re-executes the list query in an offset/limit loop, causing costly recomputation and long-tail latency (slow queries of 90-150 seconds have already been observed).

Meanwhile, `/api/reject-history/list` is still wired into the backend routes and tests, so the problem cannot be solved by editing `list.sql` directly without risking the pagination contract and legacy-caller compatibility.
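To make the replay cost concrete, here is a toy sketch (function and variable names are illustrative stand-ins, not the real `reject_dataset_cache` API): the paginated path pays the full query cost once per page, while a dedicated single-pass fetch pays it once per chunk.

```python
# Toy model of the two fetch strategies. Each call into the dataset stands in
# for one execution of the expensive list.sql (sort + COUNT(*) OVER() every
# time); names are illustrative, not the real service API.

def fetch_via_offset_loop(dataset, page_size, stats):
    """Old path: re-run the paginated query until a short page is returned."""
    rows, offset = [], 0
    while True:
        stats["executions"] += 1          # each page pays the full query cost
        page = dataset[offset:offset + page_size]
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size

def fetch_single_pass(dataset, stats):
    """New path: one dedicated non-paginated query per chunk."""
    stats["executions"] += 1
    return list(dataset)

dataset = list(range(10_000))
old_stats, new_stats = {"executions": 0}, {"executions": 0}
assert fetch_via_offset_loop(dataset, 500, old_stats) == dataset
assert fetch_single_pass(dataset, new_stats) == dataset
print(old_stats["executions"], new_stats["executions"])  # prints "21 1"
```

Both paths return the same rows; the difference is how many times the costly query runs.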
## Goals / Non-Goals

**Goals:**
- Switch `POST /api/reject-history/query` to a dedicated primary SQL template, removing its execution coupling to the paginated `list.sql`.
- Stop the batch chunk path from pulling full data through an `offset/limit` loop, reducing per-chunk recomputation.
- Keep the existing pagination and response semantics of `/api/reject-history/list` unchanged.
- Keep the data semantics and API field contracts of `/query`, `/view`, and `/export-cached` unchanged.

**Non-Goals:**
- Do not remove `/api/reject-history/list` or its legacy routes.
- Do not change business metric definitions (`REJECT_TOTAL_QTY`, `DEFECT_QTY`, policy filters).
- Do not introduce new infrastructure (new databases, new cache types, new third-party dependencies).
## Decisions

### D1: Add a dedicated primary SQL template instead of modifying the `list.sql` contract

- Decision: Add a dedicated primary-query SQL file (lot-level, non-paginated semantics) under `src/mes_dashboard/sql/reject_history/` for use by the dataset cache primary query.
- Why: Having `list.sql` serve both `/list` and `/query` is the root cause of the current performance/compatibility conflict. Splitting the sources satisfies both performance and compatibility.
- Alternatives considered:
  - Modify `list.sql` directly: would improve `/query` but very likely break the `/list` pagination/total_count contract.
  - Only tune concurrency parameters: mitigates the symptom but cannot remove the redundant computation cost inherent in the SQL itself.
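A minimal sketch of the intended contract split, with placeholder templates (neither string is the real file content; the real files live under `src/mes_dashboard/sql/reject_history/`): the list template keeps the pagination operators, while the primary template drops them along with the `offset`/`limit` binds.

```python
# Illustrative templates only. The point is the contract split: list.sql keeps
# pagination operators, the primary template drops them entirely.
import re

LIST_SQL = """
SELECT t.*, COUNT(*) OVER () AS TOTAL_COUNT
FROM reject_history t
WHERE t.report_date BETWEEN :date_from AND :date_to
ORDER BY t.report_date
OFFSET :offset ROWS FETCH NEXT :limit ROWS ONLY
"""

PRIMARY_SQL = """
SELECT t.*
FROM reject_history t
WHERE t.report_date BETWEEN :date_from AND :date_to
"""

def required_binds(sql: str) -> set:
    """Collect the named bind variables (:name) a template expects."""
    return set(re.findall(r":(\w+)", sql))

assert {"offset", "limit"} <= required_binds(LIST_SQL)
assert required_binds(PRIMARY_SQL) == {"date_from", "date_to"}
assert "COUNT(*) OVER" not in PRIMARY_SQL
```

The bind-variable check is also the natural seam for the regression tests described later.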
### D2: Both the direct and chunk paths of `reject_dataset_cache` use the primary SQL

- Decision: Switch both the direct path and the batch chunk path of `execute_primary_query()` to the primary SQL.
- Why: Changing only the direct path would leave long-range queries (the actual pain point) on the old chunk + paginated list pattern, with little benefit.
- Alternatives considered:
  - Change only the direct path: large-range queries would still hit slow queries.
  - Change only the chunk path: short-range direct queries would keep the list coupling and semantic inconsistency.
### D3: Chunk execution becomes a single query, with no offset/limit loop

- Decision: Fetch each chunk's complete dataset with a single primary SQL query, removing the `offset`-loop fetching logic.
- Why: The current approach re-runs a query that includes sorting and total_count several times within the same chunk, amplifying DB cost.
- Alternatives considered:
  - Keep the loop but raise the page size: only reduces the iteration count; the redundant computation and semantic burden remain.
### D4: Make `/list` compatibility a hard regression guard

- Decision: Keep the `/api/reject-history/list` route and `query_list()` logic unchanged, and add compatibility tests.
- Why: The project still has route, documentation, and smoke-test dependencies on it; regressions must be explicitly guarded against.
- Alternatives considered:
  - Remove `/list` in the same change: widens the scope and conflicts with the goal of this performance fix.
## Risks / Trade-offs

- [Risk] Column differences between the primary SQL and the list SQL break downstream pandas derivation
  → Mitigation: Define a minimal column contract for the primary SQL and add a unit test that checks column completeness.

- [Risk] A single-query chunk result is large enough to cause memory pressure
  → Mitigation: Keep the existing batch decomposition, max-rows/total-rows limits, and parquet spill guardrails.

- [Risk] Only `/query` is optimized while other slow-query sources remain
  → Mitigation: This change focuses on the reject-history primary path; other paths will be handled separately.

- [Trade-off] An additional SQL file raises maintenance cost
  → Mitigation: Document the split ("list for the paginated API / primary for the dataset cache") to prevent re-coupling.
## Migration Plan

1. Add the dedicated primary SQL and wire it into the SQL loader.
2. Update the direct and chunk paths of `reject_dataset_cache` to use the primary SQL.
3. Keep `reject_history_service.query_list()` and `list.sql` unchanged.
4. Extend the tests:
   - `/query` no longer sends `offset/limit` to the primary SQL path.
   - The `/list` pagination contract is unchanged.
5. Compare slow-query logs and frontend timeout events in the dev environment before promoting to production.

Rollback strategy:
- If the new primary SQL shows column or performance anomalies, revert `reject_dataset_cache` to the original `list.sql` path (keep the new SQL file but leave it disabled).
- The `/list` path is untouched, so its rollback risk is low.
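The first test item in step 4 could be sketched roughly as follows (the recorder and function names are hypothetical stand-ins for the real service and SQL execution layer):

```python
# Hypothetical regression-guard sketch: capture the bind variables handed to
# the SQL layer and assert the primary path no longer requires pagination
# binds. The real test would drive execute_primary_query() with a fake engine.

class BindRecorder:
    """Fake SQL executor that records every bind dict it receives."""
    def __init__(self):
        self.binds = []

    def execute(self, sql, params):
        self.binds.append(dict(params))
        return []  # returned rows are irrelevant to this guard

def run_primary_query(executor, date_from, date_to):
    # stands in for the primary path: compile the dedicated template and
    # pass only range filters, never offset/limit
    return executor.execute(
        "PRIMARY_SQL", {"date_from": date_from, "date_to": date_to}
    )

recorder = BindRecorder()
run_primary_query(recorder, "2026-01-01", "2026-03-01")
for params in recorder.binds:
    assert "offset" not in params and "limit" not in params
```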
## Open Questions

- Should the primary SQL file be named `primary.sql` or `dataset_primary.sql` (must match existing naming conventions)?
- Should the API `meta` expose a diagnostic field (e.g. `primary_sql_source=dedicated`) for production tracing?
@@ -0,0 +1,38 @@
## Why

The primary query path of reject-history's `POST /api/reject-history/query` currently reuses `list.sql` (which carries `COUNT(*) OVER()` and `OFFSET/FETCH` pagination semantics). Under large date ranges and batch chunking this causes costly recomputation; slow queries of 90-150 seconds and frontend timeouts have already occurred repeatedly. Because `list.sql` also serves the legacy `/api/reject-history/list`, changing it directly risks breaking the existing pagination contract, so the primary query source must be decoupled from the list query.
## What Changes

- Add a dedicated reject-history primary SQL file (lot-level, non-paginated semantics) for use by the dataset cache primary query.
- Update `reject_dataset_cache.execute_primary_query()` (both the direct and engine chunk paths) to use the dedicated primary SQL, dropping the dependency on the `offset/limit` pagination loop over `list.sql`.
- Keep the current behavior and response contract of `list.sql` and `GET /api/reject-history/list` (sorting, pagination, `TOTAL_COUNT`) unchanged.
- Add regression guards: new/updated tests verifying that the `/list` contract is unchanged and that the `/query` source has switched to the dedicated primary SQL.
- Keep the data semantics and field contracts of `/query`, `/view`, and `/export-cached` unchanged (non-breaking).
## Capabilities

### New Capabilities
- `reject-history-primary-query-source-isolation`: give the primary query an independent data source, decoupled from the paginated list SQL, reducing large-range query latency and timeout risk.

### Modified Capabilities
- `reject-history-api`: revise the primary-query implementation requirements, explicitly mandating that the `/query` and `/list` query paths be decoupled while the `/list` contract stays compatible.
- `batch-query-resilience`: revise the reject-history chunk execution requirements, removing the reliance on iterating the paginated list SQL to fetch full data and reducing per-chunk recomputation cost.
## Impact

- Affected backend code:
  - `src/mes_dashboard/services/reject_dataset_cache.py`
  - `src/mes_dashboard/services/reject_history_service.py` (if a SQL template slot needs extending)
  - `src/mes_dashboard/sql/reject_history/` (new dedicated primary SQL)
  - `src/mes_dashboard/routes/reject_history_routes.py` (only if extra meta/diagnostic info is added)
- Affected tests:
  - `tests/test_reject_dataset_cache.py`
  - `tests/test_reject_history_service.py`
  - `tests/test_reject_history_routes.py`
- API surface:
  - no endpoints added or removed
  - no breaking changes to existing parameters/responses
- Dependencies/infra:
  - no new external dependencies
  - the existing slow-query engine, batch engine, and cache/spool mechanisms are reused
@@ -0,0 +1,25 @@
## ADDED Requirements

### Requirement: reject_dataset_cache batch primary execution SHALL avoid paginated replay loops
Batch chunk execution for reject-history primary query SHALL avoid page-by-page replay against paginated list SQL semantics.

#### Scenario: Chunk execution avoids offset iteration
- **WHEN** batch engine executes a reject-history chunk in `execute_primary_query()`
- **THEN** chunk execution SHALL NOT iterate through `offset` pages to assemble full chunk data
- **THEN** chunk execution SHALL retrieve chunk data via the dedicated primary SQL path

#### Scenario: Chunk bind contract excludes pagination parameters
- **WHEN** chunk query parameters are prepared for batch execution
- **THEN** `offset` and `limit` SHALL NOT be required bind variables for normal chunk retrieval

### Requirement: Partial-failure resilience SHALL remain intact after source decoupling
Decoupling from paginated list SQL SHALL NOT regress partial-failure metadata behavior.

#### Scenario: Failed chunks still produce partial-failure metadata
- **WHEN** one or more reject-history chunks fail during batch execution
- **THEN** response `meta` SHALL still report partial-failure indicators according to existing resilience contract

#### Scenario: Successful chunks still merge and continue
- **WHEN** some chunks succeed and others fail
- **THEN** the system SHALL continue to merge successful chunks and return partial results
- **THEN** progress metadata SHALL remain available for diagnostics
@@ -0,0 +1,26 @@
## ADDED Requirements

### Requirement: Reject History API SHALL preserve paginated list contract after primary-query decoupling
The API SHALL keep `GET /api/reject-history/list` behavior and response schema stable after `/query` switches to a dedicated SQL source.

#### Scenario: List endpoint pagination schema remains stable
- **WHEN** `GET /api/reject-history/list` is called with valid date range and paging params
- **THEN** the response SHALL still include `items` and `pagination` with `page`, `perPage`, `total`, and `totalPages`
- **THEN** the endpoint SHALL continue to support page-bound retrieval semantics

#### Scenario: List endpoint sorting semantics remain stable
- **WHEN** two equivalent list requests are executed before and after the primary-query decoupling change
- **THEN** row ordering semantics SHALL remain consistent with existing list contract

### Requirement: Reject History API primary response contract SHALL remain backward compatible
Switching the primary SQL source SHALL NOT alter `/api/reject-history/query` response fields consumed by the current UI flow.

#### Scenario: Primary query response shape is unchanged
- **WHEN** `POST /api/reject-history/query` succeeds
- **THEN** the response SHALL continue to include `query_id`, `summary`, `trend`, `detail`, `available_filters`, and `meta`
- **THEN** existing `/view` and `/export-cached` workflows SHALL remain compatible with the returned `query_id`

#### Scenario: Cache-hit behavior remains unchanged
- **WHEN** the same primary query is executed again within cache lifetime
- **THEN** cache-hit behavior SHALL remain functionally equivalent to pre-decoupling behavior
- **THEN** response field names and types SHALL remain stable
@@ -0,0 +1,25 @@
## ADDED Requirements

### Requirement: Reject-history primary query SHALL use a dedicated non-paginated SQL source
The system SHALL execute `POST /api/reject-history/query` against a dedicated primary SQL template that is isolated from the paginated list SQL contract.

#### Scenario: Direct primary path uses dedicated SQL
- **WHEN** `execute_primary_query()` runs in direct mode (no batch decomposition)
- **THEN** it SHALL compile SQL from the dedicated primary template
- **THEN** it SHALL NOT require `offset` or `limit` bind parameters for result retrieval

#### Scenario: Batch chunk path uses dedicated SQL
- **WHEN** `execute_primary_query()` runs in batch chunk mode
- **THEN** each chunk query SHALL compile SQL from the same dedicated primary template
- **THEN** chunk queries SHALL apply chunk-specific filters without relying on page-by-page replay semantics

### Requirement: Dedicated primary SQL SHALL exclude pagination-only operators
The dedicated primary SQL template SHALL avoid pagination-only constructs used by `/api/reject-history/list`.

#### Scenario: Primary SQL excludes total-count window computation
- **WHEN** the dedicated primary SQL is loaded for `/query`
- **THEN** it SHALL NOT include `COUNT(*) OVER()` as a required output field

#### Scenario: Primary SQL excludes offset-fetch pagination
- **WHEN** the dedicated primary SQL is loaded for `/query`
- **THEN** it SHALL NOT include `OFFSET ... FETCH NEXT ...` pagination clauses
@@ -0,0 +1,25 @@
## 1. Primary SQL Source Isolation

- [x] 1.1 Add a dedicated reject-history primary SQL file under `src/mes_dashboard/sql/reject_history/` without paginated list operators
- [x] 1.2 Ensure the new SQL template preserves the column contract required by dataset-cache derivation (`summary`/`trend`/`detail`/`pareto`)
- [x] 1.3 Keep `src/mes_dashboard/sql/reject_history/list.sql` unchanged for legacy paginated list use

## 2. Service Path Decoupling

- [x] 2.1 Update `reject_dataset_cache.execute_primary_query()` direct path to compile and execute the dedicated primary SQL template
- [x] 2.2 Update reject-history batch chunk execution path to use the dedicated primary SQL template
- [x] 2.3 Remove reject chunk data assembly logic that depends on `offset/limit` pagination replay
- [x] 2.4 Preserve existing cache/spool write path and response shape (`query_id`, `summary`, `trend`, `detail`, `available_filters`, `meta`)

## 3. Compatibility and Resilience Guards

- [x] 3.1 Verify `query_list()` and `GET /api/reject-history/list` pagination behavior remains unchanged
- [x] 3.2 Verify partial-failure metadata behavior remains unchanged for batch mode (`has_partial_failure`, failed chunks/ranges)
- [x] 3.3 Add defensive logging/diagnostics confirming primary query source path selection for troubleshooting

## 4. Tests and Verification

- [x] 4.1 Add or update unit tests in `tests/test_reject_dataset_cache.py` to assert primary/chunk paths no longer require `offset/limit`
- [x] 4.2 Add or update tests in `tests/test_reject_history_service.py` and `tests/test_reject_history_routes.py` to assert `/list` contract compatibility
- [x] 4.3 Run targeted test suite for reject-history cache/service/routes and batch resilience coverage
- [ ] 4.4 Perform manual validation of large-range reject-history query latency and ensure no frontend timeout regression (requires integration env + Oracle data + frontend flow)
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-03-04
@@ -0,0 +1,124 @@
## Context

reject-history's post-cache queries currently rely on pandas (`apply_view`, `compute_batch_pareto`, `export_csv_from_cache`). On large datasets (hundreds of MB) this produces high peak RSS, causing the interactive memory guard to reject requests and the worker RSS guard to trigger restarts. The system already has a parquet spool (`query_spool_store`), but downstream computation still usually reloads the data as a DataFrame and runs full-table operations on it.

The design goal is to migrate post-cache computation to a SQL runtime (DuckDB) to reduce Python memory pressure, without changing the API interface or response schema, while keeping the existing guards as a last line of defense.

Constraints:
- Do not break the existing frontend parameters and data structures of `reject-history`.
- Preserve the materialized pareto hit path and its semantics.
- Maintain filter consistency and data completeness for detail/export.
- The rollout must be switchable and revertible.
## Goals / Non-Goals

**Goals:**
- Introduce a DuckDB SQL execution path over cache/spool data so that pandas full-table copy/groupby is no longer the primary path.
- Phase 1: migrate `batch-pareto` first, routing materialized misses to cache-SQL.
- Phase 2: migrate `view`, producing summary/trend/detail pagination via SQL aggregation and queries.
- Phase 3: migrate `export-cached` to streamed output, avoiding a one-shot `to_dict` full load.
- Keep the existing memory guards and continue observing them; adjust thresholds only after the new path stabilizes.

**Non-Goals:**
- Do not change the core strategy of the Oracle primary query and chunk engine.
- Do not add or remove reject-history API endpoints.
- Do not change the frontend query flow, URL parameter format, or field naming.
- Do not rewrite the cache computation of other pages (hold/resource/material-trace) in this change.
## Decisions

### D1. Use DuckDB as the cache-SQL runtime (not SQLite)

- **Decision**: Add a DuckDB dependency as the query and aggregation engine over parquet/spool data.
- **Rationale**:
  - DuckDB can query parquet directly and supports predicate pushdown, projection pushdown, and aggregation/window functions, which matches the requirements here.
  - SQLite cannot scan parquet natively; data would have to be loaded first, adding an extra memory and I/O cost.
  - Compared with pandas, DuckDB makes worker RSS easier to control on large filter/aggregation paths.
- **Alternatives considered**:
  - pandas optimizations (fewer columns, category dtype): already done, but high RSS and guard false rejections remain.
  - SQLite temporary tables: require an ETL step and cannot use the parquet spool directly.
### D2. Build a reject-history-specific cache-SQL facade

- **Decision**: Add `reject_cache_sql_runtime` (name adjustable during implementation) as a single module providing:
  - source resolution (parquet spool first, fallback when necessary)
  - parameter binding and safe SQL fragment assembly
  - shared filter construction (policy/supplementary/trend/pareto selections)
- **Rationale**:
  - Avoids scattering SQL string assembly across routes/services, reducing semantic drift.
  - Centralizes the parity rules, making side-by-side testing against the legacy pandas path easier.
- **Alternatives considered**:
  - Inline the SQL inside `reject_dataset_cache.py`: quick, but poorly maintainable with unclear test seams.
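The facade's fragment assembly could look roughly like this sketch (the column names and allow-list are assumptions): user-supplied values always travel as bound parameters, and only vetted column names are ever interpolated into the SQL text.

```python
# Sketch of safe filter-fragment assembly for the facade (names are
# assumptions, not the real module API). Values become bound parameters;
# column names must pass an allow-list before touching the SQL string.

ALLOWED_COLUMNS = {"reject_reason", "package", "workcenter"}

def build_where(filters: dict):
    """Return (sql_fragment, params) for a dict of column -> list of values."""
    clauses, params = [], []
    for column, values in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown filter column: {column}")
        if not values:
            continue
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    sql = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return sql, params

sql, params = build_where({"reject_reason": ["R1", "R2"], "package": ["P9"]})
assert sql == " WHERE reject_reason IN (?, ?) AND package IN (?)"
assert params == ["R1", "R2", "P9"]
```

Centralizing this in one helper is what keeps `view`, `batch-pareto`, and `export-cached` filter semantics from drifting apart.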
### D3. Migrate the batch-pareto path first, keeping the materialized hit path

- **Decision**:
  - `try_materialized_batch_pareto` behaves unchanged on a hit.
  - On miss/stale/build-fail, cache-SQL batch computation runs first.
  - Only when cache-SQL is unavailable does the path fall back to legacy DataFrame computation.
- **Rationale**:
  - `batch-pareto` is a high-frequency, high-cost aggregation point, so migrating it yields the biggest win.
  - Keeping the existing materialized fast path avoids rework.
- **Alternatives considered**:
  - Remove the materialized layer outright: high risk, and it would forfeit the existing hit-rate benefit.
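The fallback order can be sketched as follows (function names are hypothetical stand-ins): materialized snapshot first, then cache-SQL, then legacy, with the chosen source and any fallback reason recorded for telemetry.

```python
# Fallback-chain sketch for batch-pareto (names are illustrative). The real
# orchestration lives around try_materialized_batch_pareto.

def batch_pareto(snapshot_reader, cache_sql, legacy, telemetry):
    result = snapshot_reader()
    if result is not None:                 # materialized hit: unchanged path
        telemetry["source"] = "materialized"
        return result
    try:
        result = cache_sql()               # first fallback: cache-SQL
        telemetry["source"] = "cache_sql"
        return result
    except RuntimeError as exc:            # cache-SQL disabled/unavailable
        telemetry["fallback_reason"] = str(exc)
    telemetry["source"] = "legacy"         # last resort: pandas path
    return legacy()

def failing_cache_sql():
    raise RuntimeError("runtime disabled")

telemetry = {}
out = batch_pareto(
    snapshot_reader=lambda: None,          # simulate a materialized miss
    cache_sql=failing_cache_sql,
    legacy=lambda: {"dimensions": {}},
    telemetry=telemetry,
)
assert out == {"dimensions": {}}
```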
### D4. `view` becomes SQL aggregation + SQL pagination

- **Decision**:
  - Compute `summary`/`trend` via SQL aggregation.
  - Compute `detail` by applying all filters in SQL, then sorting and paginating in SQL.
  - Keep the existing output structure (`analytics_raw`, `summary`, `detail.pagination`).
- **Rationale**:
  - Fixes the current "guard first, filter later" ordering that causes many false rejections.
  - Shortens the lifetimes of pandas' multiple intermediate DataFrames.
### D5. `export-cached` becomes a streamed export

- **Decision**:
  - Use a generator to read in batches and write out the CSV response.
  - No longer build the complete rows list / `to_dict` before responding.
- **Rationale**:
  - Export is the classic large-output scenario; streaming effectively lowers peak RSS.
  - The existing filter conditions and column contract stay unchanged.
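The streamed export can be sketched as a generator (the batch source below is a stand-in for batched reads from the SQL runtime); peak memory is then bounded by the batch size instead of the full result set.

```python
# Streaming-CSV sketch: yield encoded chunks, one for the header and one per
# batch, instead of building the full rows list in memory first.
import csv
import io

def stream_csv(header, batches):
    """Yield CSV byte chunks from an iterable of row batches."""
    def encode(rows):
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        return buf.getvalue().encode("utf-8")
    yield encode([header])
    for batch in batches:
        yield encode(batch)

chunks = list(stream_csv(
    ["lot_id", "reject_qty"],
    [[("LOT1", 3), ("LOT2", 5)], [("LOT3", 1)]],   # fake batched reads
))
body = b"".join(chunks).decode("utf-8")
assert body.splitlines() == ["lot_id,reject_qty", "LOT1,3", "LOT2,5", "LOT3,1"]
```

In Flask, such a generator can be passed directly to a `Response` with a CSV mimetype so chunks are sent as they are produced.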
### D6. Roll out gradually behind feature flags, keeping a dual-path fallback

- **Decision**: Add runtime switches (names to be finalized during implementation), at minimum:
  - a global switch (enable/disable cache-SQL)
  - endpoint-level switches (enable batch/view/export independently)
  - a fallback switch (allow reverting to legacy pandas)
- **Rationale**:
  - Enables online canarying and fast rollback.
  - Reduces the risk of a one-shot replacement.
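A minimal flag-resolution sketch (the flag names below are made up for illustration; the real names are to be finalized per the decision above): an endpoint takes the SQL path only when both the global switch and its own endpoint switch allow it.

```python
# Hypothetical flag resolution: global switch must be on, and the
# endpoint-level switch (default on) must not be explicitly off.
import os

def sql_runtime_enabled(endpoint: str, env=os.environ) -> bool:
    def flag(name, default="0"):
        return env.get(name, default) == "1"
    return flag("REJECT_CACHE_SQL_ENABLED") and flag(
        f"REJECT_CACHE_SQL_{endpoint.upper()}_ENABLED", default="1"
    )

env = {"REJECT_CACHE_SQL_ENABLED": "1", "REJECT_CACHE_SQL_VIEW_ENABLED": "0"}
assert sql_runtime_enabled("batch_pareto", env) is True   # endpoint default on
assert sql_runtime_enabled("view", env) is False          # explicitly off
assert sql_runtime_enabled("view", {}) is False           # global switch off
```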
## Risks / Trade-offs

- **[DuckDB dependency and runtime-environment compatibility]** → Pin a known-good version in `requirements`/`environment.yml`; add checks to CI and the VM startup scripts.
- **[Semantic drift between SQL and pandas]** → Build parity tests (same query_id, same filters; compare summary/trend/detail/pareto results).
- **[Inconsistent behavior when the spool is missing and the path falls back]** → Define an explicit source priority order and fallback-reason telemetry to guarantee observability.
- **[Query plans degrading under extreme conditions]** → Keep the guards and timeouts; add maximum scan/output limits to the SQL runtime if needed.
- **[Cost of maintaining both paths during the initial rollout]** → Enable in phases and converge away from the legacy path once stable.
## Migration Plan

1. **Phase 1 (batch-pareto)**
   - Introduce the DuckDB runtime and basic source resolution.
   - Route the `batch-pareto` materialized-miss path to cache-SQL.
   - Add endpoint-level switches and fallback telemetry.

2. **Phase 2 (view via SQL)**
   - Move `summary/trend/detail` to the SQL path.
   - Adjust where the memory guard triggers (shrink the data first and then guard, or gate on an estimate of the SQL result size).

3. **Phase 3 (streamed export)**
   - Switch `export-cached` to streamed CSV generation.
   - Verify filter consistency with the detail data.

4. **Rollout / Rollback**
   - Canary by default, enabling in order (batch -> view -> export).
   - If error rates or result drift rise, turn off the corresponding endpoint switch to fall back to legacy.
## Open Questions

- Must `view`'s `analytics_raw` keep exactly the same ordering (in case the frontend has an implicit dependency on it)?
- Should a dedicated "cache-SQL memory budget" metric be introduced in this change, or should the existing worker guard telemetry be reused first?
@@ -0,0 +1,42 @@
## Why

reject-history's interactive post-cache queries still run full-table filter/groupby/copy with pandas in worker memory, so large-range queries keep RSS high for long periods, triggering the memory guard, batch-pareto rejections, and worker restarts. The parquet spool already exists; computation should move to "SQL after cache" to lower peak memory while preserving the existing API contract.
## What Changes

- Add a cache-SQL execution layer (DuckDB) for reject-history that runs SQL queries and aggregations against parquet spool / cache data first, instead of reloading the whole pandas DataFrame for computation.
- Phase 1: switch `/api/reject-history/batch-pareto` to the DuckDB path (high benefit, low risk), keeping the existing cross-filter, top80, top20 behavior and response schema.
- Phase 2: move `/api/reject-history/view` to SQL (summary/trend aggregation and detail pagination all via SQL), reducing in-memory intermediate data.
- Phase 3: switch `/api/reject-history/export-cached` to streamed export, avoiding a full `to_dict` load into memory first.
- Keep the existing worker / interactive memory guards as a last line of defense; adjust guard thresholds based on monitoring data once the SQL path stabilizes.
- Add observability and regression tests to ensure frontend hints, detail data semantics, and export completeness remain compatible (non-breaking).
## Capabilities

### New Capabilities
- `reject-history-cache-sql-runtime`: provide SQL execution (DuckDB) and query routing over reject-history cache/spool data, turning interactive queries from pandas full-table computation into SQL pushdown / aggregation.

### Modified Capabilities
- `reject-history-api`: revise the backend computation-path requirements of `/batch-pareto`, `/view`, and `/export-cached` to be cache-SQL-first with an unchanged response contract.
- `reject-history-pareto-materialized-aggregate`: revise the materialized miss/fallback behavior to prefer the cache-SQL computation path over full-table DataFrame regrouping.
- `reject-history-detail-export-parity`: extend the export requirement to streamed output while keeping the data scope and field semantics consistent with the current filters.
## Impact

- Affected backend code:
  - `src/mes_dashboard/services/reject_dataset_cache.py`
  - `src/mes_dashboard/services/reject_pareto_materialized.py`
  - `src/mes_dashboard/core/query_spool_store.py` (read interface/metadata support for the SQL runtime)
  - `src/mes_dashboard/routes/reject_history_routes.py`
  - `src/mes_dashboard/sql/reject_history/` (new/adjusted SQL fragments)
- Affected tests:
  - `tests/test_reject_dataset_cache.py`
  - `tests/test_reject_history_routes.py`
  - `tests/test_reject_pareto_materialized.py`
  - new cache-SQL runtime and streamed-export tests
- API surface:
  - no new endpoints
  - no changes to existing parameters or response schema (non-breaking)
- Dependencies/infra:
  - new DuckDB Python dependency
  - possibly a few SQL-runtime env switches (enable, fallback, concurrency/memory caps)
@@ -0,0 +1,68 @@
## MODIFIED Requirements

### Requirement: Reject History API SHALL provide batch Pareto endpoint with cross-filter
The API SHALL provide a batch Pareto endpoint that returns all 6 dimension Pareto results in a single response, supporting cross-dimension filtering with exclude-self logic, and SHALL prefer materialized Pareto snapshots, then cache-SQL runtime, before considering legacy full-detail regrouping.

#### Scenario: Batch Pareto response structure
- **WHEN** `GET /api/reject-history/batch-pareto` is called with valid `query_id`
- **THEN** response SHALL be `{ success: true, data: { dimensions: { reason: {...}, package: {...}, type: {...}, workflow: {...}, workcenter: {...}, equipment: {...} } } }`
- **THEN** each dimension object SHALL include `items` array with schema (`reason`, `metric_value`, `pct`, `cumPct`, `MOVEIN_QTY`, `REJECT_TOTAL_QTY`, `DEFECT_QTY`, `count`)

#### Scenario: Cross-filter exclude-self logic
- **WHEN** `sel_reason=A&sel_type=X` is provided
- **THEN** reason Pareto SHALL be computed with type=X filter applied (but NOT reason=A filter)
- **THEN** type Pareto SHALL be computed with reason=A filter applied (but NOT type=X filter)
- **THEN** package/workflow/workcenter/equipment Paretos SHALL be computed with both reason=A AND type=X filters applied
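The exclude-self rule above can be sketched in a few lines (row fields and helper names are illustrative): each dimension's Pareto is computed with every selection applied except the one on its own dimension.

```python
# Exclude-self cross-filter sketch matching the scenario above. Rows and
# selections use illustrative field names.

def cross_filter(rows, selections, target_dim):
    """Apply every selection except the target dimension's own."""
    active = {dim: set(vals) for dim, vals in selections.items()
              if dim != target_dim}
    return [r for r in rows
            if all(r[dim] in vals for dim, vals in active.items())]

rows = [
    {"reason": "A", "type": "X", "package": "P1"},
    {"reason": "A", "type": "Y", "package": "P1"},
    {"reason": "B", "type": "X", "package": "P2"},
]
sel = {"reason": ["A"], "type": ["X"]}

# reason Pareto: only the type=X selection applies
assert cross_filter(rows, sel, "reason") == [rows[0], rows[2]]
# type Pareto: only the reason=A selection applies
assert cross_filter(rows, sel, "type") == [rows[0], rows[1]]
# other dimensions: both selections apply
assert cross_filter(rows, sel, "package") == [rows[0]]
```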
#### Scenario: Empty selections return unfiltered Paretos
- **WHEN** batch-pareto is called with no `sel_*` parameters
- **THEN** all 6 dimensions SHALL return their full Pareto distribution (subject to `pareto_scope`)

#### Scenario: Cache-only computation
- **WHEN** `query_id` does not exist in cache
- **THEN** the endpoint SHALL return HTTP 400 with error message indicating cache miss
- **THEN** the endpoint SHALL NOT fall back to Oracle query

#### Scenario: Materialized snapshot preferred
- **WHEN** a valid and fresh materialized Pareto snapshot exists for the request context
- **THEN** the endpoint SHALL return results from that snapshot
- **THEN** the endpoint SHALL avoid full lot-level regrouping for the same request

#### Scenario: Materialized miss fallback behavior
- **WHEN** materialized snapshot is unavailable, stale, or build fails
- **THEN** the endpoint SHALL fall back to cache-SQL computation before legacy DataFrame computation
- **THEN** the response schema and filter semantics SHALL remain unchanged

#### Scenario: SQL fallback unavailable
- **WHEN** cache-SQL runtime is disabled or unavailable under materialized miss
- **THEN** the endpoint SHALL follow configured fallback policy deterministically
- **THEN** the response metadata SHALL expose the fallback reason code

#### Scenario: Supplementary and policy filters apply
- **WHEN** batch-pareto is called with supplementary filters (packages, workcenter_groups, reason) and policy toggles
- **THEN** all 6 dimension Paretos SHALL be computed after applying policy and supplementary filters first (before cross-filter)

#### Scenario: Display scope (TOP20) support
- **WHEN** `pareto_display_scope=top20` is provided
- **THEN** applicable dimensions (type, workflow, equipment) SHALL truncate results to top 20 items after sorting
- **WHEN** `pareto_display_scope` is omitted or `all`
- **THEN** all items SHALL be returned (subject to `pareto_scope` filter)
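A rough sketch consistent with the scope scenarios above (the cumulative-percentage cut illustrates `pareto_scope=top80`; exact rounding and tie rules follow the real implementation, and the item values below are made up):

```python
# Scope sketch: top80 keeps items until cumulative share reaches 80%,
# and the display scope truncates what is shown after sorting.

def apply_scopes(items, scope=None, display_scope=None):
    items = sorted(items, key=lambda it: it["metric_value"], reverse=True)
    total = sum(it["metric_value"] for it in items) or 1
    cum = 0.0
    scoped = []
    for it in items:
        cum += it["metric_value"] / total * 100
        scoped.append({**it, "cumPct": round(cum, 2)})
        if scope == "top80" and cum >= 80:
            break
    if display_scope == "top20":
        scoped = scoped[:20]        # display-only truncation
    return scoped

items = [{"reason": r, "metric_value": v}
         for r, v in [("A", 50), ("B", 30), ("C", 15), ("D", 5)]]
top80 = apply_scopes(items, scope="top80")
assert [it["reason"] for it in top80] == ["A", "B"]  # 50% then 80% cumulative
```

Note the display scope only limits what is shown; per the export-parity spec below it must never remove rows from export output.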
## ADDED Requirements

### Requirement: Reject History API SHALL provide SQL-first cache view derivation with schema parity
The API SHALL derive cache-backed `view` responses through SQL-first runtime when enabled, while preserving existing response schema and filter behavior.

#### Scenario: View response contract preserved
- **WHEN** `GET /api/reject-history/view` is called with valid `query_id`
- **THEN** response payload SHALL keep existing top-level structure containing `analytics_raw`, `summary`, and `detail`
- **THEN** pagination field names and types SHALL remain compatible with current frontend usage

#### Scenario: View SQL-first with deterministic fallback
- **WHEN** SQL runtime is enabled for `view`
- **THEN** summary/trend/detail derivation SHALL use SQL runtime as primary path
- **THEN** fallback to legacy path SHALL follow configured policy and preserve response schema

#### Scenario: Cache-expired behavior unchanged
- **WHEN** `query_id` cache has expired
- **THEN** endpoint SHALL return the same cache-expired status behavior as current implementation
@@ -0,0 +1,45 @@
## ADDED Requirements

### Requirement: Reject History cache-SQL runtime SHALL execute against cached datasets without full DataFrame materialization
The system SHALL provide a SQL runtime for reject-history cached queries that reads from cache/spool data sources and avoids requiring full pandas DataFrame materialization as the primary execution path.

#### Scenario: Spool-backed execution
- **WHEN** a valid `query_id` has parquet spool metadata available
- **THEN** the runtime SHALL execute SQL directly against the spool dataset
- **THEN** the request SHALL NOT require loading the entire dataset into a pandas DataFrame before filtering and aggregation

#### Scenario: Source resolution fallback
- **WHEN** spool data is unavailable for a valid `query_id`
- **THEN** the runtime SHALL follow a deterministic fallback order configured by system policy
- **THEN** the fallback decision SHALL be observable via telemetry metadata

### Requirement: Reject History cache-SQL runtime SHALL preserve filter semantics across batch/view/export paths
The runtime SHALL apply policy, supplementary, trend-date, and pareto selection filters with the same business semantics used by existing reject-history APIs.

#### Scenario: Batch pareto filter parity
- **WHEN** `batch-pareto` is requested with policy toggles, supplementary filters, trend dates, and `sel_*` selections
- **THEN** SQL runtime output SHALL preserve exclude-self cross-filter semantics for each dimension
- **THEN** `pareto_scope=top80` and `pareto_display_scope=top20` behavior SHALL remain unchanged

#### Scenario: View filter parity
- **WHEN** `view` is requested with `query_id` and active supplementary/interactive filters
- **THEN** `summary`, `trend`, and paginated `detail` SHALL all reflect the same effective filter set
- **THEN** response schema SHALL remain compatible with existing frontend contracts

#### Scenario: Export filter parity
- **WHEN** `export-cached` is requested with the same filters as `view`
- **THEN** exported rows SHALL represent the same filtered data scope as view/detail
- **THEN** column naming and field semantics SHALL remain unchanged

### Requirement: Reject History cache-SQL runtime SHALL support controlled rollout and safe fallback
The system SHALL expose runtime switches to enable or disable SQL execution per endpoint and SHALL support fallback to legacy computation when SQL runtime is unavailable.

#### Scenario: Endpoint-level enablement
- **WHEN** SQL runtime is enabled only for `batch-pareto`
- **THEN** `batch-pareto` SHALL use SQL runtime
- **THEN** `view` and `export-cached` SHALL continue using legacy path until explicitly enabled

#### Scenario: SQL runtime fallback
- **WHEN** SQL runtime encounters an execution failure for a request
- **THEN** the system SHALL apply configured fallback behavior (legacy path or fail-fast)
- **THEN** the response or metadata SHALL include a deterministic fallback reason code for operations troubleshooting
@@ -0,0 +1,28 @@
## MODIFIED Requirements

### Requirement: Cached reject-history export SHALL support Pareto multi-select filter parity
The cached export endpoint SHALL support Pareto multi-select context so that exported rows match the currently drilled-down detail scope, and SHALL stream response output to avoid requiring full in-memory row materialization before sending data.

#### Scenario: Apply selected Pareto dimension values
- **WHEN** export request provides `pareto_dimension` and one or more `pareto_values`
- **THEN** the backend SHALL apply an OR-match filter against the mapped dimension column
- **THEN** only rows matching selected values SHALL be exported

#### Scenario: No Pareto selection keeps existing behavior
- **WHEN** `pareto_values` is absent or empty
- **THEN** export SHALL apply no extra Pareto-selected-item filter
- **THEN** existing supplementary and interactive filters SHALL still apply

#### Scenario: Invalid Pareto dimension is rejected
- **WHEN** `pareto_dimension` is not one of supported dimensions
- **THEN** API SHALL return HTTP 400 with descriptive validation error

#### Scenario: Export response is streamed
- **WHEN** cached export is requested for a large filtered dataset
- **THEN** endpoint SHALL stream CSV rows incrementally to the client
- **THEN** endpoint SHALL NOT require building a full rows list in memory before response begins

#### Scenario: Export scope matches view detail scope
- **WHEN** `view` and `export-cached` are called with the same `query_id` and filter set
- **THEN** exported rows SHALL represent the same filtered data scope as detail results
- **THEN** display-only pareto truncation rules SHALL NOT remove rows from export output
@@ -0,0 +1,19 @@
## ADDED Requirements

### Requirement: Materialized Pareto orchestration SHALL use cache-SQL fallback before legacy DataFrame regrouping
When materialized snapshots are not available, orchestration SHALL prefer cache-SQL runtime to compute batch Pareto results before attempting legacy DataFrame regrouping.

#### Scenario: Materialized miss uses cache-SQL fallback
- **WHEN** snapshot read misses, expires, or build fails for a batch-pareto request
- **THEN** orchestration SHALL invoke cache-SQL batch pareto computation as the first fallback path
- **THEN** returned payload SHALL preserve the same dimensions and item schema contract

#### Scenario: Cache-SQL unavailable fallback policy
- **WHEN** cache-SQL fallback is disabled or unavailable after materialized miss
- **THEN** orchestration SHALL apply configured fallback policy (legacy compute or fail-fast)
- **THEN** fallback reason SHALL be recorded in metadata for diagnostics

#### Scenario: Fallback path preserves cross-filter semantics
- **WHEN** cache-SQL fallback is used with multi-dimension `sel_*` filters
- **THEN** exclude-self cross-filter semantics SHALL remain equivalent to materialized and legacy behavior
- **THEN** `pareto_scope` and `pareto_display_scope` rules SHALL remain unchanged
@@ -0,0 +1,34 @@
## 1. SQL Runtime Foundation

- [x] 1.1 Add the reject-history cache-SQL runtime module (DuckDB connection management, source resolution, parameter-binding helpers)
- [x] 1.2 Add parquet-spool-first reads and a source fallback strategy (with deterministic fallback reasons)
- [x] 1.3 Add runtime feature flags (global and endpoint-level switches) with defaults
- [x] 1.4 Update dependency configuration (`requirements.txt` / `pyproject.toml` / `environment.yml`) and startup compatibility checks

## 2. Batch Pareto SQL-first Path

- [x] 2.1 Route `batch-pareto` to the cache-SQL computation path on materialized miss/stale/build-fail
- [x] 2.2 Preserve and verify exclude-self cross-filter, `top80`, and `top20` behavior
- [x] 2.3 Implement the fallback policy when SQL is unavailable (legacy or fail-fast)
- [x] 2.4 Add batch-pareto parity tests (SQL vs legacy) and fallback metadata tests

## 3. View via SQL

- [x] 3.1 Rebuild `summary` and `trend` aggregation in SQL (keeping the column and precision contract)
- [x] 3.2 Implement detail querying, sorting, and pagination in SQL (including policy/supplementary/trend/pareto selections)
- [x] 3.3 Switch `/api/reject-history/view` to the SQL-first path while keeping schema compatibility
- [x] 3.4 Add view parity tests and a cache-expired behavior regression test

## 4. Streamed Export Cached

- [x] 4.1 Switch `export-cached` to generator/streaming CSV output
- [x] 4.2 Ensure export and detail share the same filter-composition logic to keep scope parity
- [x] 4.3 Remove the full rows list / `to_dict` dependency so exports no longer load everything into memory first
- [x] 4.4 Add large-data export tests (streamed output, column contract, filter consistency)

## 5. Observability, Guard, Rollout

- [x] 5.1 Add SQL runtime telemetry (source, fallback reason, duration, row counts)
- [x] 5.2 Keep the existing memory guards; adjust trigger points and messages to fit the SQL-first flow
- [x] 5.3 Define the rollout strategy (batch -> view -> export) and matching rollback switches
- [x] 5.4 Update the operations docs and verification checklist (frontend hints, exports unaffected by display limits, load-test items)