feat: dataset cache for hold/resource history + slow connection migration

Two changes combined:

1. historical-query-slow-connection: Migrate all historical query pages
   to read_sql_df_slow with semaphore concurrency control (max 3),
   raise DB slow timeout to 300s, gunicorn timeout to 360s, and
   unify frontend timeouts to 360s for all historical pages.

2. hold-resource-history-dataset-cache: Convert hold-history and
   resource-history from multi-query to single-query + dataset cache
   pattern (L1 ProcessLevelCache + L2 Redis parquet/base64, TTL=900s).
   Replace old GET endpoints with POST /query + GET /view two-phase
   API. Frontend auto-retries on 410 cache_expired.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
egg
2026-02-25 13:15:02 +08:00
parent cd061e0cfd
commit 71c8102de6
64 changed files with 3806 additions and 1442 deletions

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-25

View File

@@ -0,0 +1,84 @@
## Context
Hold-history fires 4 independent Oracle queries per user interaction (trend, reason-pareto, duration, list), all against `DW_MES_HOLDRELEASEHISTORY` with the same date range + hold_type filter. Resource-history fires 4 Oracle queries (3 parallel in summary + 1 detail), all against `DW_MES_RESOURCESTATUS_SHIFT` with the same filter set. Both pages re-query Oracle on every filter change or pagination.
Reject-history already solved this problem with a two-phase dataset cache pattern (`reject_dataset_cache.py`): one Oracle query caches the full fact set, subsequent views are derived from cache via pandas. This change applies the same pattern to hold-history and resource-history.
## Goals / Non-Goals
**Goals:**
- Reduce Oracle queries from 4 per interaction to 1 per user session (per filter combination)
- Same cache infrastructure as reject-history: L1 (ProcessLevelCache) + L2 (Redis parquet/base64), 15-minute TTL
- Same API pattern: POST /query (primary) + GET /view (supplementary, from cache)
- Maintain all existing UI functionality — same charts, tables, filters, pagination
- Frontend adopts queryId-based two-phase flow
**Non-Goals:**
- Changing the SQL queries themselves (same table, same WHERE logic)
- Adding new visualizations or metrics
- Modifying other pages (reject-history, query-tool, etc.)
- Changing the department endpoint on hold-history (it has unique person-level expansion logic that benefits from its own query — we keep it as a separate call)
## Decisions
### D1: Follow reject_dataset_cache.py architecture exactly
**Decision**: Create `hold_dataset_cache.py` and `resource_dataset_cache.py` following the same module structure:
- `_make_query_id()` — SHA256 hash of primary params
- `_redis_store_df()` / `_redis_load_df()` — parquet/base64 encoding
- `_get_cached_df()` / `_store_df()` — L1 → L2 read-through
- `execute_primary_query()` — Oracle query + cache + derive initial view
- `apply_view()` — read cache + filter + re-derive
**Rationale**: Proven pattern, consistent codebase, shared infrastructure. Alternatives (custom cache format, separate cache layers) add complexity for no benefit.
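The D1 module skeleton can be sketched as follows. `make_query_id` and `L1Cache` are illustrative stand-ins (the real module uses `_make_query_id` and the shared `ProcessLevelCache`), and the Redis parquet/base64 L2 layer is omitted here:

```python
import hashlib
import json
import time
from collections import OrderedDict


def make_query_id(params: dict) -> str:
    """Deterministic cache key: SHA256 over the canonicalized primary params."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class L1Cache:
    """Minimal stand-in for ProcessLevelCache: TTL + LRU, bounded size."""

    def __init__(self, ttl: int = 900, max_size: int = 8):
        self.ttl, self.max_size = ttl, max_size
        self._store = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        self._store.move_to_end(key)  # refresh LRU position
        return value

    def put(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least-recently-used
```

Because the query ID is derived from sorted params, the same filter combination always maps to the same cache entry regardless of argument order.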
### D2: Hold-history primary query scope
**Decision**: The primary query fetches ALL hold/release records for the date range (all hold_types). Trend, reason-pareto, duration, and list are all derived from this single cached DataFrame. Department remains a separate API call.
**Rationale**: Trend data already contains all 3 hold_type variants in one query. By caching the raw facts (not pre-aggregated), we can switch hold_type views instantly from cache. Department has unique person-level JOINs and GROUP BY logic that doesn't fit the "filter from flat DataFrame" pattern cleanly.
**Alternatives considered**:
- Cache per hold_type: wastes 3x cache memory, still requires Oracle for type switching
- Include department in cache: complex person-level aggregation doesn't map well to flat DataFrame filtering
### D3: Resource-history primary query scope
**Decision**: The primary query fetches ALL shift-status records for the date range and resource filter combination. KPI, trend, heatmap, workcenter comparison, and detail are all derived from this single cached DataFrame.
**Rationale**: All 4 current queries (kpi, trend, heatmap, detail) use the same base WHERE clause against the same table. The aggregations (GROUP BY date for trend, GROUP BY workcenter×date for heatmap, etc.) are simple pandas operations on the cached raw data.
### D4: Cache TTL = 15 minutes, same as reject-history
**Decision**: Use `_CACHE_TTL = 900` (15 min) for both modules, with L1 `max_size = 8`.
**Rationale**: Matches reject-history. 15 minutes covers typical analysis sessions, and users who need fresh data can re-query (which replaces the cache). Hold-history's existing 12-hour Redis cache for trend data caches longer but serves staler data; 15 minutes is a better balance.
### D5: API contract — POST /query + GET /view
**Decision**: Both pages switch to:
- `POST /api/hold-history/query` → primary query, returns `query_id` + initial view (trend, reason, duration, list page 1)
- `GET /api/hold-history/view` → supplementary filter/pagination from cache
- `POST /api/resource/history/query` → primary query, returns `query_id` + initial view (summary + detail page 1)
- `GET /api/resource/history/view` → supplementary filter/pagination from cache
Old GET endpoints (trend, reason-pareto, duration, list, summary, detail) are removed.
**Rationale**: Same pattern as reject-history. POST for primary (sends filter params in body), GET for view (sends query_id + supplementary params).
### D6: Frontend queryId-based flow
**Decision**: Both `App.vue` files adopt the two-phase pattern:
1. User clicks "查詢" → `POST /query` → store `queryId`
2. Filter change / pagination → `GET /view?query_id=...&filters...` (no Oracle)
3. Cache expired (HTTP 410) → auto re-execute primary query
**Rationale**: Proven pattern from reject-history. Keeps UI responsive after initial query.
## Risks / Trade-offs
- **[Memory]** Caching full DataFrames in L1 (per-worker) uses more RAM than current approach → Mitigated by `max_size=8` LRU eviction (same as reject-history, works well in production)
- **[Staleness]** 15-min TTL means data could be up to 15 minutes old during an analysis session → Acceptable for historical analysis; user can re-query for fresh data
- **[Department endpoint]** Hold-history department still makes a separate Oracle call → Acceptable trade-off; person-level aggregation doesn't fit flat DataFrame model. Could be addressed later.
- **[Breaking API]** Old GET endpoints removed → No external consumers; frontend is the only client
- **[Redis dependency]** If Redis is down, only L1 cache works (per-worker, not cross-worker) → Same behavior as reject-history; L1 still provides 15-min cache per worker

View File

@@ -0,0 +1,31 @@
## Why
Hold-history and resource-history pages currently fire 4 separate Oracle queries per user interaction (filter change, pagination, refresh), all hitting the same base table with identical filter parameters. This wastes Oracle connections and creates unnecessary latency — especially now that these pages use `read_sql_df_slow` (dedicated connections with 300s timeout). The reject-history page already solves this with a "single query + cache derivation" pattern that reduces Oracle load by ~75%. Hold-history and resource-history should adopt the same architecture.
## What Changes
- **New `hold_dataset_cache.py`**: Two-phase cache module for hold-history. Single Oracle query caches the full hold/release fact set; subsequent views (trend, reason pareto, duration distribution, paginated list) are derived from cache using pandas.
- **New `resource_dataset_cache.py`**: Two-phase cache module for resource-history. Single Oracle query caches the full shift-status fact set; subsequent views (KPI, trend, heatmap, workcenter comparison, paginated detail) are derived from cache using pandas.
- **Hold-history route rewrite**: Replace 4 independent GET endpoints with POST /query (primary) + GET /view (supplementary) pattern.
- **Resource-history route rewrite**: Replace GET /summary (3 parallel queries) + GET /detail (1 query) with POST /query + GET /view pattern.
- **Frontend two-phase flow**: Both pages adopt queryId-based flow — primary query returns queryId + initial view; filter/pagination changes call GET /view with queryId (no Oracle).
- **Cache infrastructure**: L1 (ProcessLevelCache, in-process) + L2 (Redis, parquet/base64), 15-minute TTL, deterministic query ID from SHA256 of primary params. Same architecture as `reject_dataset_cache.py`.
## Capabilities
### New Capabilities
- `hold-dataset-cache`: Two-phase dataset cache for hold-history (single Oracle query + in-memory derivation for trend, reason pareto, duration, paginated list)
- `resource-dataset-cache`: Two-phase dataset cache for resource-history (single Oracle query + in-memory derivation for KPI, trend, heatmap, comparison, paginated detail)
### Modified Capabilities
- `hold-history-api`: Route endpoints change from 4 independent GETs to POST /query + GET /view
- `hold-history-page`: Frontend adopts two-phase queryId flow
- `resource-history-page`: Frontend adopts two-phase queryId flow; route endpoints consolidated
## Impact
- **Backend**: New files `hold_dataset_cache.py`, `resource_dataset_cache.py`; modified routes for both pages; service functions remain but are called only once per primary query
- **Frontend**: `hold-history/App.vue` and `resource-history/App.vue` rewritten for two-phase flow
- **Oracle load**: ~75% reduction per page (4 queries → 1 per user session, subsequent interactions from cache)
- **Redis**: Additional cache entries (~2 namespaces, same TTL/encoding as reject_dataset)
- **API contract**: Endpoint signatures change (breaking for these 2 pages, but no external consumers)

View File

@@ -0,0 +1,64 @@
## ADDED Requirements
### Requirement: Hold dataset cache SHALL execute a single Oracle query and cache the result
The hold_dataset_cache module SHALL query Oracle once for the full hold/release fact set and cache it for subsequent derivations.
#### Scenario: Primary query execution and caching
- **WHEN** `execute_primary_query()` is called with date range and hold_type parameters
- **THEN** a deterministic `query_id` SHALL be computed from the primary params (start_date, end_date) using SHA256
- **THEN** if a cached DataFrame exists for this query_id (L1 or L2), it SHALL be used without querying Oracle
- **THEN** if no cache exists, a single Oracle query SHALL fetch all hold/release records from `DW_MES_HOLDRELEASEHISTORY` for the date range (all hold_types)
- **THEN** the result DataFrame SHALL be stored in both L1 (ProcessLevelCache) and L2 (Redis as parquet/base64)
- **THEN** the response SHALL include `query_id`, trend, reason_pareto, duration, and list page 1
#### Scenario: Cache TTL and eviction
- **WHEN** a DataFrame is cached
- **THEN** the cache TTL SHALL be 900 seconds (15 minutes)
- **THEN** L1 cache max_size SHALL be 8 entries with LRU eviction
- **THEN** the Redis namespace SHALL be `hold_dataset`
### Requirement: Hold dataset cache SHALL derive trend data from cached DataFrame
The module SHALL compute daily trend aggregations from the cached fact set.
#### Scenario: Trend derivation from cache
- **WHEN** `apply_view()` is called with a valid query_id
- **THEN** trend data SHALL be derived by grouping the cached DataFrame by date
- **THEN** the 07:30 shift boundary rule SHALL be applied
- **THEN** all three hold_type variants (quality, non_quality, all) SHALL be computed from the same DataFrame
- **THEN** hold_type filtering SHALL be applied in-memory without re-querying Oracle
### Requirement: Hold dataset cache SHALL derive reason Pareto from cached DataFrame
The module SHALL compute reason distribution from the cached fact set.
#### Scenario: Reason Pareto derivation
- **WHEN** `apply_view()` is called with hold_type filter
- **THEN** reason Pareto SHALL be derived by grouping the filtered DataFrame by HOLDREASONNAME
- **THEN** items SHALL include count, qty, pct, and cumPct
- **THEN** items SHALL be sorted by count descending
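The Pareto derivation above is a plain pandas groupby. A minimal sketch (the `QTY` column name is an assumption; the real fact-set schema lives in `hold_dataset_cache.py`):

```python
import pandas as pd


def derive_reason_pareto(df: pd.DataFrame) -> list[dict]:
    """Group by reason; compute count, qty, pct, cumPct; sort by count desc."""
    g = (
        df.groupby("HOLDREASONNAME")
        .agg(count=("HOLDREASONNAME", "size"), qty=("QTY", "sum"))
        .sort_values("count", ascending=False)
        .reset_index()
    )
    total = g["count"].sum()
    g["pct"] = (g["count"] / total * 100).round(1)
    g["cumPct"] = g["pct"].cumsum().round(1)  # running share for the Pareto line
    return g.rename(columns={"HOLDREASONNAME": "reason"}).to_dict("records")
```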
### Requirement: Hold dataset cache SHALL derive duration distribution from cached DataFrame
The module SHALL compute hold duration buckets from the cached fact set.
#### Scenario: Duration derivation
- **WHEN** `apply_view()` is called with hold_type filter
- **THEN** duration distribution SHALL be derived from records where RELEASETXNDATE IS NOT NULL
- **THEN** 4 buckets SHALL be computed: <4h, 4-24h, 1-3d, >3d
- **THEN** each bucket SHALL include count and pct
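A sketch of the bucketing with `pd.cut`. The exact edge handling (whether 4h falls in the first or second bucket) is an assumption here; the spec only names the labels:

```python
import pandas as pd

BUCKET_EDGES = [0, 4, 24, 72, float("inf")]  # hours
BUCKET_LABELS = ["<4h", "4-24h", "1-3d", ">3d"]


def derive_duration(df: pd.DataFrame) -> list[dict]:
    """Bucket hold durations (in hours) for released holds only."""
    released = df[df["RELEASETXNDATE"].notna()]
    hours = (
        released["RELEASETXNDATE"] - released["HOLDTXNDATE"]
    ).dt.total_seconds() / 3600
    # right=False makes buckets left-closed: [0,4), [4,24), [24,72), [72,inf)
    binned = pd.cut(hours, bins=BUCKET_EDGES, labels=BUCKET_LABELS, right=False)
    counts = binned.value_counts().reindex(BUCKET_LABELS, fill_value=0)
    total = int(counts.sum()) or 1  # avoid division by zero on empty data
    return [
        {"bucket": label, "count": int(n), "pct": round(n / total * 100, 1)}
        for label, n in counts.items()
    ]
```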
### Requirement: Hold dataset cache SHALL derive paginated list from cached DataFrame
The module SHALL provide paginated detail records from the cached fact set.
#### Scenario: List pagination from cache
- **WHEN** `apply_view()` is called with page and per_page parameters
- **THEN** the cached DataFrame SHALL be filtered by hold_type and optional reason filter
- **THEN** records SHALL be sorted by HOLDTXNDATE descending
- **THEN** pagination SHALL be applied in-memory (offset + limit on the sorted DataFrame)
- **THEN** response SHALL include items and pagination metadata (page, perPage, total, totalPages)
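The in-memory pagination step is a sort plus a slice; a minimal sketch of the contract this scenario describes:

```python
import math

import pandas as pd


def paginate(df: pd.DataFrame, page: int, per_page: int) -> dict:
    """Sort by HOLDTXNDATE descending, then apply offset + limit in-memory."""
    ordered = df.sort_values("HOLDTXNDATE", ascending=False)
    total = len(ordered)
    start = (page - 1) * per_page
    items = ordered.iloc[start:start + per_page].to_dict("records")
    return {
        "items": items,
        "pagination": {
            "page": page,
            "perPage": per_page,
            "total": total,
            "totalPages": math.ceil(total / per_page) if total else 0,
        },
    }
```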
### Requirement: Hold dataset cache SHALL handle cache expiry gracefully
The module SHALL return appropriate signals when cache has expired.
#### Scenario: Cache expired during view request
- **WHEN** `apply_view()` is called with a query_id whose cache has expired
- **THEN** the response SHALL return `{ success: false, error: "cache_expired" }`
- **THEN** the HTTP status SHALL be 410 (Gone)
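The expiry contract can be sketched framework-neutrally as a `(payload, http_status)` pair; `derive` stands in for the real view-derivation pipeline and `cache` for the L1/L2 read-through:

```python
from typing import Any, Callable


def apply_view(query_id: str, cache: dict, derive: Callable[[Any], dict]):
    """Read the cached DataFrame and derive views; signal expiry as HTTP 410."""
    df = cache.get(query_id)
    if df is None:
        # Expired or evicted: the frontend reacts by re-running POST /query.
        return {"success": False, "error": "cache_expired"}, 410
    return {"success": True, "data": derive(df)}, 200
```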

View File

@@ -0,0 +1,62 @@
## MODIFIED Requirements
### Requirement: Hold History API SHALL provide daily trend data with Redis caching
The Hold History API SHALL return trend, reason-pareto, duration, and list data from a single cached dataset via a two-phase query pattern (POST /query + GET /view). The old independent GET endpoints for trend, reason-pareto, duration, and list SHALL be replaced.
#### Scenario: Primary query endpoint
- **WHEN** `POST /api/hold-history/query` is called with `{ start_date, end_date, hold_type }`
- **THEN** the service SHALL execute a single Oracle query (or read from cache) via `hold_dataset_cache.execute_primary_query()`
- **THEN** the response SHALL return `{ success: true, data: { query_id, trend, reason_pareto, duration, list, summary } }`
- **THEN** list SHALL contain page 1 with default per_page of 50
#### Scenario: Supplementary view endpoint
- **WHEN** `GET /api/hold-history/view?query_id=...&hold_type=...&reason=...&page=...&per_page=...` is called
- **THEN** the service SHALL read the cached DataFrame and derive filtered views via `hold_dataset_cache.apply_view()`
- **THEN** no Oracle query SHALL be executed
- **THEN** the response SHALL return `{ success: true, data: { trend, reason_pareto, duration, list } }`
#### Scenario: Cache expired on view request
- **WHEN** GET /view is called with an expired query_id
- **THEN** the response SHALL return `{ success: false, error: "cache_expired" }` with HTTP 410
#### Scenario: Trend uses shift boundary at 07:30
- **WHEN** daily aggregation is calculated
- **THEN** transactions with time >= 07:30 SHALL be attributed to the next calendar day
- **THEN** transactions with time < 07:30 SHALL be attributed to the current calendar day
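Implemented literally as this scenario states (transactions at or after 07:30 roll forward to the next calendar day), the boundary rule is a one-liner on a datetime Series:

```python
import pandas as pd

SHIFT_BOUNDARY = pd.Timedelta(hours=7, minutes=30)


def attribute_day(ts: pd.Series) -> pd.Series:
    """07:30 shift boundary: >= 07:30 -> next calendar day, < 07:30 -> same day."""
    base = ts.dt.normalize()  # midnight of the transaction's calendar day
    return base.where(ts - base < SHIFT_BOUNDARY, base + pd.Timedelta(days=1))
```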
#### Scenario: Trend hold type classification
- **WHEN** trend data is aggregated by hold type
- **THEN** quality classification SHALL use the same NON_QUALITY_HOLD_REASONS set as existing hold endpoints
### Requirement: Hold History API SHALL provide reason Pareto data
The reason Pareto data SHALL be derived from the cached dataset, not from a separate Oracle query.
#### Scenario: Reason Pareto from cache
- **WHEN** reason Pareto is requested via GET /view with hold_type filter
- **THEN** the cached DataFrame SHALL be filtered by hold_type and grouped by HOLDREASONNAME
- **THEN** each item SHALL contain `{ reason, count, qty, pct, cumPct }`
- **THEN** items SHALL be sorted by count descending
### Requirement: Hold History API SHALL provide hold duration distribution
The duration distribution SHALL be derived from the cached dataset.
#### Scenario: Duration from cache
- **WHEN** duration is requested via GET /view
- **THEN** the cached DataFrame SHALL be filtered to released holds only
- **THEN** 4 buckets SHALL be computed: <4h, 4-24h, 1-3d, >3d
### Requirement: Hold History API SHALL provide paginated detail list
The detail list SHALL be paginated from the cached dataset.
#### Scenario: List pagination from cache
- **WHEN** list is requested via GET /view with page and per_page params
- **THEN** the cached DataFrame SHALL be filtered and paginated in-memory
- **THEN** response SHALL include items and pagination metadata
### Requirement: Hold History API SHALL keep department endpoint as separate query
The department endpoint SHALL remain as a separate Oracle query due to its unique person-level aggregation.
#### Scenario: Department endpoint unchanged
- **WHEN** `GET /api/hold-history/department` is called
- **THEN** it SHALL continue to execute its own Oracle query
- **THEN** it SHALL NOT use the dataset cache

View File

@@ -0,0 +1,34 @@
## MODIFIED Requirements
### Requirement: Hold History page SHALL display a filter bar with date range and hold type
The page SHALL provide a filter bar for selecting date range and hold type classification. On query, the page SHALL use a two-phase flow: POST /query returns queryId, subsequent filter changes use GET /view.
#### Scenario: Primary query via POST /query
- **WHEN** user clicks the query button (or page loads with default filters)
- **THEN** the page SHALL call `POST /api/hold-history/query` with `{ start_date, end_date, hold_type }`
- **THEN** the response queryId SHALL be stored for subsequent view requests
- **THEN** trend, reason-pareto, duration, and list SHALL all be populated from the single response
#### Scenario: Hold type or reason filter change uses GET /view
- **WHEN** user changes hold_type radio or clicks a reason in the Pareto chart (while queryId exists)
- **THEN** the page SHALL call `GET /api/hold-history/view?query_id=...&hold_type=...&reason=...`
- **THEN** no new Oracle query SHALL be triggered
- **THEN** trend, reason-pareto, duration, and list SHALL update from the view response
#### Scenario: Pagination uses GET /view
- **WHEN** user navigates to a different page in the detail list
- **THEN** the page SHALL call `GET /api/hold-history/view?query_id=...&page=...&per_page=...`
#### Scenario: Date range change triggers new primary query
- **WHEN** user changes the date range and clicks query
- **THEN** the page SHALL call `POST /api/hold-history/query` with new dates
- **THEN** a new queryId SHALL replace the old one
#### Scenario: Cache expired auto-retry
- **WHEN** GET /view returns `{ success: false, error: "cache_expired" }`
- **THEN** the page SHALL automatically re-execute `POST /api/hold-history/query` with the last committed filters
- **THEN** the view SHALL refresh with the new data
#### Scenario: Department still uses separate API
- **WHEN** department data needs to load or reload
- **THEN** the page SHALL call `GET /api/hold-history/department` separately

View File

@@ -0,0 +1,71 @@
## ADDED Requirements
### Requirement: Resource dataset cache SHALL execute a single Oracle query and cache the result
The resource_dataset_cache module SHALL query Oracle once for the full shift-status fact set and cache it for subsequent derivations.
#### Scenario: Primary query execution and caching
- **WHEN** `execute_primary_query()` is called with date range, granularity, and resource filter parameters
- **THEN** a deterministic `query_id` SHALL be computed from all primary params using SHA256
- **THEN** if a cached DataFrame exists for this query_id (L1 or L2), it SHALL be used without querying Oracle
- **THEN** if no cache exists, a single Oracle query SHALL fetch all shift-status records from `DW_MES_RESOURCESTATUS_SHIFT` for the filtered resources and date range
- **THEN** the result DataFrame SHALL be stored in both L1 (ProcessLevelCache) and L2 (Redis as parquet/base64)
- **THEN** the response SHALL include `query_id`, summary (KPI, trend, heatmap, comparison), and detail page 1
#### Scenario: Cache TTL and eviction
- **WHEN** a DataFrame is cached
- **THEN** the cache TTL SHALL be 900 seconds (15 minutes)
- **THEN** L1 cache max_size SHALL be 8 entries with LRU eviction
- **THEN** the Redis namespace SHALL be `resource_dataset`
### Requirement: Resource dataset cache SHALL derive KPI summary from cached DataFrame
The module SHALL compute aggregated KPI metrics from the cached fact set.
#### Scenario: KPI derivation from cache
- **WHEN** summary view is derived from cached DataFrame
- **THEN** total hours for PRD, SBY, UDT, SDT, EGT, NST SHALL be summed
- **THEN** OU% and AVAIL% SHALL be computed from the hour totals
- **THEN** machine count SHALL be the distinct count of HISTORYID in the cached data
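A sketch of the KPI roll-up. The hour summation and distinct-count parts follow the scenario; the OU%/AVAIL% formulas below are illustrative placeholders only, since the real definitions live in `resource_history_service.py`:

```python
import pandas as pd

HOUR_COLS = ["PRD", "SBY", "UDT", "SDT", "EGT", "NST"]


def derive_kpi(df: pd.DataFrame) -> dict:
    """Sum category hours and count distinct machines from the cached facts."""
    totals = {c: float(df[c].sum()) for c in HOUR_COLS}
    all_hours = sum(totals.values()) or 1.0  # guard against empty data
    return {
        **totals,
        "machine_count": int(df["HISTORYID"].nunique()),
        # Assumed formulas: OU% = productive share, AVAIL% = non-downtime share.
        "ou_pct": round(totals["PRD"] / all_hours * 100, 1),
        "avail_pct": round(
            (all_hours - totals["UDT"] - totals["SDT"]) / all_hours * 100, 1
        ),
    }
```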
### Requirement: Resource dataset cache SHALL derive trend data from cached DataFrame
The module SHALL compute time-series aggregations from the cached fact set.
#### Scenario: Trend derivation
- **WHEN** summary view is derived with a given granularity (day/week/month/year)
- **THEN** the cached DataFrame SHALL be grouped by the granularity period
- **THEN** each period SHALL include PRD, SBY, UDT, SDT, EGT, NST hours and computed OU%, AVAIL%
### Requirement: Resource dataset cache SHALL derive heatmap from cached DataFrame
The module SHALL compute workcenter × date OU% matrix from the cached fact set.
#### Scenario: Heatmap derivation
- **WHEN** summary view is derived
- **THEN** the cached DataFrame SHALL be grouped by (workcenter, date)
- **THEN** each cell SHALL contain the OU% for that workcenter on that date
- **THEN** workcenters SHALL be sorted by workcenter_seq
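The heatmap matrix is a groupby plus an unstack. In this sketch the `SHIFTDATE` and `TOTAL` column names, and the OU% ratio, are assumptions for illustration; the real computation sits in the service layer:

```python
import pandas as pd


def derive_heatmap(df: pd.DataFrame) -> pd.DataFrame:
    """Workcenter x date OU% matrix from the cached shift-status rows."""
    g = df.groupby(["WORKCENTERNAME", "SHIFTDATE"]).agg(
        prd=("PRD", "sum"), total=("TOTAL", "sum")
    )
    g["ou_pct"] = (g["prd"] / g["total"] * 100).round(1)
    # Rows become workcenters, columns become dates: one cell per (wc, date).
    return g["ou_pct"].unstack("SHIFTDATE")
```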
### Requirement: Resource dataset cache SHALL derive workcenter comparison from cached DataFrame
The module SHALL compute per-workcenter aggregated metrics from the cached fact set.
#### Scenario: Comparison derivation
- **WHEN** summary view is derived
- **THEN** the cached DataFrame SHALL be grouped by workcenter
- **THEN** each workcenter SHALL include total hours and computed OU%
- **THEN** results SHALL be sorted by OU% descending, limited to top 15
### Requirement: Resource dataset cache SHALL derive paginated detail from cached DataFrame
The module SHALL provide hierarchical detail records from the cached fact set.
#### Scenario: Detail derivation and pagination
- **WHEN** detail view is requested with page and per_page parameters
- **THEN** the cached DataFrame SHALL be used to compute per-resource metrics
- **THEN** resource dimension data (WORKCENTERNAME, RESOURCEFAMILYNAME) SHALL be merged from resource_cache
- **THEN** results SHALL be structured as a hierarchical tree (workcenter → family → resource)
- **THEN** pagination SHALL apply to the flattened list
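The hierarchical nesting can be sketched as two nested groupbys; the `RESOURCENAME` column and the output shape are illustrative (the real tree also carries the per-node metrics):

```python
import pandas as pd


def build_tree(df: pd.DataFrame) -> list[dict]:
    """Nest per-resource rows into workcenter -> family -> resource."""
    tree = []
    for wc, wc_df in df.groupby("WORKCENTERNAME", sort=True):
        families = []
        for fam, fam_df in wc_df.groupby("RESOURCEFAMILYNAME", sort=True):
            families.append({
                "family": fam,
                "resources": fam_df["RESOURCENAME"].tolist(),
            })
        tree.append({"workcenter": wc, "families": families})
    return tree
```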
### Requirement: Resource dataset cache SHALL handle cache expiry gracefully
The module SHALL return appropriate signals when cache has expired.
#### Scenario: Cache expired during view request
- **WHEN** a view is requested with a query_id whose cache has expired
- **THEN** the response SHALL return `{ success: false, error: "cache_expired" }`
- **THEN** the HTTP status SHALL be 410 (Gone)

View File

@@ -0,0 +1,46 @@
## MODIFIED Requirements
### Requirement: Resource History page SHALL support date range and granularity selection
The page SHALL allow users to specify time range and aggregation granularity. On query, the page SHALL use a two-phase flow: POST /query returns queryId, subsequent filter changes use GET /view.
#### Scenario: Primary query via POST /query
- **WHEN** user clicks the query button
- **THEN** the page SHALL call `POST /api/resource/history/query` with date range, granularity, and resource filters
- **THEN** the response queryId SHALL be stored for subsequent view requests
- **THEN** summary (KPI, trend, heatmap, comparison) and detail page 1 SHALL all be populated from the single response
#### Scenario: Filter change uses GET /view
- **WHEN** user changes supplementary filters (workcenter groups, families, machines, equipment type) while queryId exists
- **THEN** the page SHALL call `GET /api/resource/history/view?query_id=...&filters...`
- **THEN** no new Oracle query SHALL be triggered
- **THEN** all charts, KPI cards, and detail table SHALL update from the view response
#### Scenario: Pagination uses GET /view
- **WHEN** user navigates to a different page in the detail table
- **THEN** the page SHALL call `GET /api/resource/history/view?query_id=...&page=...`
#### Scenario: Date range or granularity change triggers new primary query
- **WHEN** user changes date range or granularity and clicks query
- **THEN** the page SHALL call `POST /api/resource/history/query` with new params
- **THEN** a new queryId SHALL replace the old one
#### Scenario: Cache expired auto-retry
- **WHEN** GET /view returns `{ success: false, error: "cache_expired" }`
- **THEN** the page SHALL automatically re-execute `POST /api/resource/history/query` with the last committed filters
- **THEN** the view SHALL refresh with the new data
### Requirement: Resource History page SHALL display KPI summary cards
The page SHALL show 9 KPI cards with aggregated performance metrics derived from the cached dataset.
#### Scenario: KPI cards from cached data
- **WHEN** summary data is derived from the cached DataFrame
- **THEN** 9 cards SHALL display: OU%, AVAIL%, PRD, SBY, UDT, SDT, EGT, NST, Machine Count
- **THEN** values SHALL be computed from the cached shift-status records, not from a separate Oracle query
### Requirement: Resource History page SHALL display hierarchical detail table
The page SHALL show a three-level expandable table derived from the cached dataset.
#### Scenario: Detail table from cached data
- **WHEN** detail data is derived from the cached DataFrame
- **THEN** a tree table SHALL display with the same columns and hierarchy as before
- **THEN** data SHALL be derived in-memory from the cached DataFrame, not from a separate Oracle query

View File

@@ -0,0 +1,42 @@
## 1. Hold-History Dataset Cache (Backend)
- [x] 1.1 Create `src/mes_dashboard/services/hold_dataset_cache.py` — module scaffolding: imports, logger, ProcessLevelCache (TTL=900, max_size=8), Redis namespace `hold_dataset`, `_make_query_id()`, `_redis_store_df()` / `_redis_load_df()`, `_get_cached_df()` / `_store_df()`
- [x] 1.2 Implement `execute_primary_query(start_date, end_date)` — single Oracle query fetching ALL hold/release facts for date range (all hold_types), cache result, derive initial view (trend, reason_pareto, duration, list page 1) using existing service functions
- [x] 1.3 Implement `apply_view(query_id, hold_type, reason, page, per_page)` — read cached DF, apply hold_type filter, derive trend + reason_pareto + duration + paginated list from filtered DF; return 410 on cache miss
- [x] 1.4 Implement in-memory derivation helpers: `_derive_trend(df, hold_type)`, `_derive_reason_pareto(df, hold_type)`, `_derive_duration(df, hold_type)`, `_derive_list(df, hold_type, reason, page, per_page)` — reuse shift boundary and hold_type classification logic from `hold_history_service.py`
## 2. Hold-History Routes (Backend)
- [x] 2.1 Add `POST /api/hold-history/query` route — parse body `{ start_date, end_date, hold_type }`, call `hold_dataset_cache.execute_primary_query()`, return `{ query_id, trend, reason_pareto, duration, list, summary }`
- [x] 2.2 Add `GET /api/hold-history/view` route — parse query params `query_id, hold_type, reason, page, per_page`, call `hold_dataset_cache.apply_view()`, return derived views or 410 on cache miss
- [x] 2.3 Remove old GET endpoints: `/api/hold-history/trend`, `/api/hold-history/reason-pareto`, `/api/hold-history/duration`, `/api/hold-history/list` — keep page route and department route (if exists)
## 3. Hold-History Frontend
- [x] 3.1 Rewrite `frontend/src/hold-history/App.vue` — two-phase flow: initial load calls `POST /query` → store queryId; hold_type change, reason filter, pagination call `GET /view?query_id=...`; cache expired (410) → auto re-execute primary query
- [x] 3.2 Derive summary KPI cards from trend data returned by query/view response (no separate API call)
- [x] 3.3 Update all chart components to consume data from the unified query/view response instead of individual API results
## 4. Resource-History Dataset Cache (Backend)
- [x] 4.1 Create `src/mes_dashboard/services/resource_dataset_cache.py` — module scaffolding: same cache infrastructure (TTL=900, max_size=8), Redis namespace `resource_dataset`, query ID helpers, L1+L2 cache read/write
- [x] 4.2 Implement `execute_primary_query(params)` — single Oracle query fetching ALL shift-status records for date range + resource filters, cache result, derive initial view (summary: kpi + trend + heatmap + comparison, detail page 1) using existing service functions
- [x] 4.3 Implement `apply_view(query_id, granularity, page, per_page)` — read cached DF, derive summary + paginated detail; return 410 on cache miss
- [x] 4.4 Implement in-memory derivation helpers: `_derive_kpi(df)`, `_derive_trend(df, granularity)`, `_derive_heatmap(df)`, `_derive_comparison(df)`, `_derive_detail(df, page, per_page)` — reuse aggregation logic from `resource_history_service.py`
## 5. Resource-History Routes (Backend)
- [x] 5.1 Add `POST /api/resource/history/query` route — parse body with date range, granularity, resource filters, call `resource_dataset_cache.execute_primary_query()`, return `{ query_id, summary, detail }`
- [x] 5.2 Add `GET /api/resource/history/view` route — parse query params `query_id, granularity, page, per_page`, call `resource_dataset_cache.apply_view()`, return derived views or 410
- [x] 5.3 Remove old GET endpoints: `/api/resource/history/summary`, `/api/resource/history/detail` — keep `/options` and `/export` endpoints
## 6. Resource-History Frontend
- [x] 6.1 Rewrite `frontend/src/resource-history/App.vue` — two-phase flow: query button calls `POST /query` → store queryId; filter changes call `GET /view?query_id=...`; cache expired → auto re-execute
- [x] 6.2 Update `executeCommittedQuery()` to use POST /query instead of parallel GET summary + GET detail
- [x] 6.3 Update all chart/table components to consume data from unified query/view response
## 7. Verification
- [x] 7.1 Run `python -m pytest tests/ -v` — no new test failures
- [x] 7.2 Run `cd frontend && npm run build` — frontend builds successfully

View File

@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-25

View File

@@ -0,0 +1,78 @@
## Context
The historical-query pages (reject-history, hold-history, resource-history, job-query, excel-query)
currently use `read_sql_df` (connection pool, 55s `call_timeout`), so wide date-range queries time out easily.
Frontend AbortController timeouts are 60~120s; the Gunicorn worker timeout is 130s.
The existing `read_sql_df_slow` function (`database.py:573`) already provides a dedicated-connection path,
but it is used only by the `full_history` mode of `query_tool_service.py`, its timeout is hard-coded to 120s,
and it has no concurrency control.
## Goals / Non-Goals
**Goals:**
- Migrate all historical-query services to `read_sql_df_slow` (dedicated connections that do not occupy the pool)
- Change the `read_sql_df_slow` timeout from a hard-coded 120s to config-driven (default 300s)
- Add a global semaphore capping the number of concurrent slow queries, so Oracle is not overwhelmed by a flood of connections
- Adjust Gunicorn and frontend timeouts in step, so queries are not cut off prematurely at any layer
- Leave the real-time monitoring pages completely unaffected
**Non-Goals:**
- No changes to `read_sql_df` itself (the pool path stays at 55s)
- No changes to EventFetcher or LineageEngine (small queries can stay on the pool)
- No async task queue (Celery etc.)
- No changes to the real-time monitoring pages
- No changes to the Oracle SQL itself (query optimization is out of scope)
## Decisions
### D1: Import-alias migration pattern
**Decision**: Each service replaces its original `read_sql_df` import with `from ... import read_sql_df_slow as read_sql_df`.
**Rationale**: The two functions have compatible `(sql, params)` signatures, so after aliasing every call site is untouched.
This is cleaner than adding a flag to `read_sql_df`: it keeps the pool path's logic unpolluted.
**Alternative considered**: a `slow=True` parameter on `read_sql_df` → adds complexity to the pool path; rejected.
### D2: Semaphore-based concurrency limit
**Decision**: Add a module-level `threading.Semaphore` in `database.py`; `read_sql_df_slow` acquires it before executing and releases it in `finally`. Default limit is 3 concurrent queries (tunable via the `DB_SLOW_MAX_CONCURRENT` environment variable).
**Rationale**: Gunicorn in gthread mode runs 2 workers × 4 threads = 8 request threads.
Capping slow connections at 3 guarantees at least 5 threads remain available for the real-time pages,
and Oracle never sees more than 3 long-running query connections at once.
**Semaphore acquire timeout**: 60 seconds. On timeout, return an explicit "query busy, please try again later" error.
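A minimal sketch of the acquire/release discipline described above — `run_slow_query` and the constant names are illustrative, not the actual `database.py` API:

```python
import threading

# Illustrative defaults mirroring D2; the real values come from config.
_SLOW_QUERY_SEMAPHORE = threading.Semaphore(3)   # DB_SLOW_MAX_CONCURRENT
_ACQUIRE_TIMEOUT_S = 60                          # wait limit before failing fast

def run_slow_query(execute_fn):
    """Run execute_fn under the global slow-query concurrency limit."""
    if not _SLOW_QUERY_SEMAPHORE.acquire(timeout=_ACQUIRE_TIMEOUT_S):
        # All slots stayed busy for the full wait window.
        raise RuntimeError("query busy, please try again later")
    try:
        return execute_fn()
    finally:
        # Released on both success and exception, avoiding slot leaks.
        _SLOW_QUERY_SEMAPHORE.release()
```

The `finally` release is what the deadlock row in the risk table below relies on.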
### D3: Timeout values
| Layer | Before | After | Rationale |
|---|---|---|---|
| `read_sql_df_slow` | 120s hardcoded | 300s config-driven | 5 minutes covers most historical queries |
| Gunicorn worker | 130s | 360s | 300s + 60s overhead |
| Frontend (historical pages) | 60–120s | 360s | aligned with Gunicorn |
### D4: Special handling for excel_query_service
**Decision**: excel_query_service does not use `read_sql_df`; it works directly with `get_db_connection()` + cursor.
Instead, set `connection.call_timeout = slow_call_timeout_ms` after acquiring the connection.
**Rationale**: Keeps the existing cursor batch logic untouched; only the timeout is extended.
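A sketch of that change, assuming the standard python-oracledb behavior where `call_timeout` is set in milliseconds on the connection; `open_slow_cursor` is an illustrative helper, not the service's real function:

```python
def open_slow_cursor(get_db_connection, slow_call_timeout_ms=300_000):
    """Return a cursor on a direct connection with an extended call timeout."""
    conn = get_db_connection()
    # Extend the per-call timeout on this connection only;
    # the pooled 55s path is unaffected.
    conn.call_timeout = slow_call_timeout_ms
    return conn.cursor()
```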
### D5: Parallel queries in resource_history_service
**Decision**: `query_summary` runs 3 queries in parallel via `ThreadPoolExecutor(max_workers=3)`.
After migration each query holds one semaphore slot (of 3 total), so a single request may briefly occupy every slot.
**Accepted risk**: The 3 parallel queries finish quickly, so the slots are released fast; other slow requests wait at most 60s.
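The fan-out can be sketched as follows; `query_summary_parallel` is a hypothetical wrapper, and the real service passes its own SQL and bind params to `read_sql_df_slow`:

```python
from concurrent.futures import ThreadPoolExecutor

def query_summary_parallel(read_sql_df_slow, sql_list):
    """Run the three summary queries in parallel.

    Each call acquires one slow-query semaphore slot inside
    read_sql_df_slow, so a single request may briefly hold all
    three slots (the accepted trade-off in D5).
    """
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(read_sql_df_slow, sql) for sql in sql_list]
        # Results come back in submission order; exceptions re-raise here.
        return [f.result() for f in futures]
```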
## Risks / Trade-offs
| Risk | Mitigation |
|------|------------|
| Semaphore deadlock (slot not released on exception) | `finally` block guarantees release |
| 3 concurrent slots too few (many users querying history at once) | `DB_SLOW_MAX_CONCURRENT` is tunable without code changes |
| Long queries tying up Gunicorn threads and hurting real-time pages | The semaphore caps occupied threads at 3, leaving the other 5 available |
| Circuit breaker no longer protects historical queries | Historical queries are user-triggered, not automated; acceptable |
| `resource_history_service` consuming all 3 slots at once | Queries complete quickly and release slots; `max_workers` can be lowered if needed |


@@ -0,0 +1,42 @@
## Why
The historical query pages (reject-history, query-tool, hold-history, resource-history, job-query, excel-query, mid-section-defect)
currently share the connection pool's 55s `call_timeout`, and the frontend sets 60–120s AbortController timeouts.
Wide-ranging queries (long date ranges or large LOT sets) frequently approach or exceed these limits, and users see a "signal is aborted without reason" error.
Historical queries are user-triggered, non-real-time, and tolerant of waiting, so they should use dedicated connections (`read_sql_df_slow`)
under semaphore concurrency control, trading time for results while protecting the connection pool from exhaustion.
## What Changes
- Migrate historical-query services from `read_sql_df` (pool, 55s) to `read_sql_df_slow` (dedicated connection, 300s)
- Add a global semaphore to `read_sql_df_slow` limiting concurrency (default 3) to avoid exhausting Oracle connections
- Change the `read_sql_df_slow` default timeout from hardcoded 120s to a config-driven value (default 300s)
- Raise the Gunicorn worker timeout from 130s to 360s to accommodate long queries
- Unify frontend timeouts for historical pages at 360s (6 minutes)
- Add `DB_SLOW_CALL_TIMEOUT_MS` and `DB_SLOW_MAX_CONCURRENT` environment variables
- Real-time monitoring pages (wip, hold-overview, resource-status, etc.) are entirely unaffected
## Capabilities
### New Capabilities
- `slow-query-concurrency-control`: semaphore concurrency control and config-driven timeout for `read_sql_df_slow`
### Modified Capabilities
- `reject-history-api`: underlying DB queries move from pooled to dedicated slow connections
- `hold-history-api`: underlying DB queries move from pooled to dedicated slow connections
- `query-tool-lot-trace`: remove the hardcoded 120s timeout on `read_sql_df_slow`; use the config default
- `reject-history-page`: frontend API_TIMEOUT raised from 60s to 360s
- `hold-history-page`: frontend API_TIMEOUT raised from 60s to 360s
- `resource-history-page`: frontend API_TIMEOUT raised from 60s to 360s; backend migrated to slow connections
- `query-tool-equipment`: frontend timeout raised from 120s to 360s
- `progressive-trace-ux`: DEFAULT_STAGE_TIMEOUT_MS raised from 60s to 360s
## Impact
- **Backend services**: reject_history_service, reject_dataset_cache, hold_history_service, resource_history_service, job_query_service, excel_query_service, query_tool_service
- **Core modules**: database.py (semaphore + config), settings.py (new settings), gunicorn.conf.py (timeout)
- **Frontend pages**: reject-history, mid-section-defect, hold-history, resource-history, query-tool (5 composables + 1 component), job-query, excel-query
- **Unaffected**: real-time monitoring pages (wip-overview, wip-detail, hold-overview, hold-detail, resource-status, admin-performance)


@@ -0,0 +1,9 @@
## MODIFIED Requirements
### Requirement: Database query execution path
The hold-history service (`hold_history_service.py`) SHALL use `read_sql_df_slow` (dedicated connection) instead of `read_sql_df` (pooled connection) for all Oracle queries.
#### Scenario: Hold history queries use dedicated connection
- **WHEN** any hold-history query is executed (trend, pareto, duration, list)
- **THEN** it uses `read_sql_df_slow` which creates a dedicated Oracle connection outside the pool
- **AND** the connection has a 300-second call_timeout (configurable)


@@ -0,0 +1,8 @@
## MODIFIED Requirements
### Requirement: Frontend API timeout
The hold-history page SHALL use a 360-second API timeout (up from 60 seconds) for all Oracle-backed API calls.
#### Scenario: Large date range query completes
- **WHEN** a user queries hold history for a long date range
- **THEN** the frontend does not abort the request for at least 360 seconds


@@ -0,0 +1,8 @@
## MODIFIED Requirements
### Requirement: Trace stage timeout
The `useTraceProgress` composable's `DEFAULT_STAGE_TIMEOUT_MS` SHALL be 360000 (360 seconds) to accommodate large-scale trace operations.
#### Scenario: Large trace operation completes
- **WHEN** a trace stage (seed-resolve, lineage, or events) takes up to 300 seconds
- **THEN** the frontend does not abort the stage request


@@ -0,0 +1,8 @@
## MODIFIED Requirements
### Requirement: Frontend API timeout
The query-tool equipment query, lot detail, lot jobs table, lot resolve, lot lineage, and reverse lineage composables SHALL use a 360-second API timeout for all Oracle-backed API calls.
#### Scenario: Equipment period query completes
- **WHEN** a user queries equipment history for a long period
- **THEN** the frontend does not abort the request for at least 360 seconds


@@ -0,0 +1,9 @@
## MODIFIED Requirements
### Requirement: Slow query timeout configuration
The query-tool service `read_sql_df_slow` call for full split/merge history SHALL use the config-driven default timeout instead of a hardcoded 120-second timeout.
#### Scenario: Full history query uses config timeout
- **WHEN** `full_history=True` split/merge query is executed
- **THEN** it uses `read_sql_df_slow` with the default timeout from `DB_SLOW_CALL_TIMEOUT_MS` (300s)
- **AND** the hardcoded `timeout_seconds=120` parameter is removed


@@ -0,0 +1,10 @@
## MODIFIED Requirements
### Requirement: Database query execution path
The reject-history service (`reject_history_service.py` and `reject_dataset_cache.py`) SHALL use `read_sql_df_slow` (dedicated connection) instead of `read_sql_df` (pooled connection) for all Oracle queries.
#### Scenario: Primary query uses dedicated connection
- **WHEN** the reject-history primary query is executed
- **THEN** it uses `read_sql_df_slow` which creates a dedicated Oracle connection outside the pool
- **AND** the connection has a 300-second call_timeout (configurable)
- **AND** the connection is subject to the global slow query semaphore


@@ -0,0 +1,8 @@
## MODIFIED Requirements
### Requirement: Frontend API timeout
The reject-history page SHALL use a 360-second API timeout (up from 60 seconds) for all Oracle-backed API calls.
#### Scenario: Large date range query completes
- **WHEN** a user queries reject history for a long date range
- **THEN** the frontend does not abort the request for at least 360 seconds


@@ -0,0 +1,16 @@
## MODIFIED Requirements
### Requirement: Database query execution path
The resource-history service (`resource_history_service.py`) SHALL use `read_sql_df_slow` (dedicated connection) instead of `read_sql_df` (pooled connection) for all Oracle queries.
#### Scenario: Summary parallel queries use dedicated connections
- **WHEN** the resource-history summary query executes 3 parallel queries via ThreadPoolExecutor
- **THEN** each query uses `read_sql_df_slow` and acquires a semaphore slot
- **AND** all 3 queries complete and release their slots
### Requirement: Frontend timeout
The resource-history page frontend SHALL use a 360-second API timeout for all Oracle-backed API calls.
#### Scenario: Large date range query completes
- **WHEN** a user queries resource history for a 2-year date range
- **THEN** the frontend does not abort the request for at least 360 seconds


@@ -0,0 +1,53 @@
## ADDED Requirements
### Requirement: Configurable slow query timeout
The system SHALL read `DB_SLOW_CALL_TIMEOUT_MS` from environment/config to determine the default `call_timeout` for `read_sql_df_slow`. The default value SHALL be 300000 (300 seconds).
#### Scenario: Default timeout when no env var set
- **WHEN** `DB_SLOW_CALL_TIMEOUT_MS` is not set in environment
- **THEN** `read_sql_df_slow` uses 300 seconds as call_timeout
#### Scenario: Custom timeout from env var
- **WHEN** `DB_SLOW_CALL_TIMEOUT_MS` is set to 180000
- **THEN** `read_sql_df_slow` uses 180 seconds as call_timeout
#### Scenario: Caller overrides timeout
- **WHEN** caller passes `timeout_seconds=120` to `read_sql_df_slow`
- **THEN** the function uses 120 seconds regardless of config value
### Requirement: Semaphore-based concurrency control
The system SHALL use a global `threading.Semaphore` to limit the number of concurrent `read_sql_df_slow` executions. The limit SHALL be configurable via `DB_SLOW_MAX_CONCURRENT` with a default of 3.
#### Scenario: Concurrent queries within limit
- **WHEN** 2 slow queries are running and a 3rd is submitted (limit=3)
- **THEN** the 3rd query proceeds immediately
#### Scenario: Concurrent queries exceed limit
- **WHEN** 3 slow queries are running and a 4th is submitted (limit=3)
- **THEN** the 4th query waits up to 60 seconds for a slot
- **AND** if no slot becomes available, raises RuntimeError with message indicating all slots are busy
#### Scenario: Semaphore release on query failure
- **WHEN** a slow query raises an exception during execution
- **THEN** the semaphore slot is released in the finally block
### Requirement: Slow query active count diagnostic
The system SHALL expose the current number of active slow queries via `get_slow_query_active_count()` and include it in `get_pool_status()` as `slow_query_active`.
#### Scenario: Active count in pool status
- **WHEN** 2 slow queries are running
- **THEN** `get_pool_status()` returns `slow_query_active: 2`
### Requirement: Gunicorn timeout accommodates slow queries
The Gunicorn worker timeout SHALL be at least 360 seconds to accommodate the maximum slow query duration (300s) plus overhead.
#### Scenario: Long query does not kill worker
- **WHEN** a slow query takes 280 seconds to complete
- **THEN** the Gunicorn worker does not timeout and the response is delivered
### Requirement: Config settings in all environments
All environment configs (Config, DevelopmentConfig, ProductionConfig, TestingConfig) SHALL define `DB_SLOW_CALL_TIMEOUT_MS` and `DB_SLOW_MAX_CONCURRENT`.
#### Scenario: Testing config uses short timeout
- **WHEN** running in testing environment
- **THEN** `DB_SLOW_CALL_TIMEOUT_MS` defaults to 10000 and `DB_SLOW_MAX_CONCURRENT` defaults to 1


@@ -0,0 +1,39 @@
## 1. Core Infrastructure
- [x] 1.1 Add `DB_SLOW_CALL_TIMEOUT_MS` (default 300000) and `DB_SLOW_MAX_CONCURRENT` (default 3) to all config classes in `settings.py` (Config, DevelopmentConfig=2, ProductionConfig=3, TestingConfig=1/10000)
- [x] 1.2 Update `get_db_runtime_config()` in `database.py` to include `slow_call_timeout_ms` and `slow_max_concurrent`
- [x] 1.3 Add module-level `threading.Semaphore`, active count tracking, and `get_slow_query_active_count()` in `database.py`
- [x] 1.4 Refactor `read_sql_df_slow()`: default `timeout_seconds=None` (reads from config), acquire/release semaphore, log active count
- [x] 1.5 Update `dispose_engine()` to reset semaphore; add `slow_query_active` to `get_pool_status()`
- [x] 1.6 Increase Gunicorn timeout to 360s and graceful_timeout to 120s in `gunicorn.conf.py`
## 2. Backend Service Migration
- [x] 2.1 `reject_history_service.py`: change import to `read_sql_df_slow as read_sql_df`
- [x] 2.2 `reject_dataset_cache.py`: change import to `read_sql_df_slow as read_sql_df`
- [x] 2.3 `hold_history_service.py`: change import to `read_sql_df_slow as read_sql_df` (keep DatabaseCircuitOpenError/DatabasePoolExhaustedError imports)
- [x] 2.4 `resource_history_service.py`: change import to `read_sql_df_slow as read_sql_df`
- [x] 2.5 `job_query_service.py`: change import to `read_sql_df_slow as read_sql_df` (keep `get_db_connection` import)
- [x] 2.6 `excel_query_service.py`: set `connection.call_timeout = runtime["slow_call_timeout_ms"]` on direct connections in `execute_batch_query` and `execute_advanced_batch_query`
- [x] 2.7 `query_tool_service.py`: remove hardcoded `timeout_seconds=120` from `read_sql_df_slow` call (line 1131)
## 3. Frontend Timeout Updates
- [x] 3.1 `reject-history/App.vue`: `API_TIMEOUT` 60000 → 360000
- [x] 3.2 `mid-section-defect/App.vue`: `API_TIMEOUT` 120000 → 360000
- [x] 3.3 `hold-history/App.vue`: `API_TIMEOUT` 60000 → 360000
- [x] 3.4 `resource-history/App.vue`: `API_TIMEOUT` 60000 → 360000
- [x] 3.5 `shared-composables/useTraceProgress.js`: `DEFAULT_STAGE_TIMEOUT_MS` 60000 → 360000
- [x] 3.6 `job-query/composables/useJobQueryData.js`: all `timeout: 60000` → 360000 (3 sites)
- [x] 3.7 `excel-query/composables/useExcelQueryData.js`: `timeout: 120000` → 360000 (2 sites, lines 135, 255)
- [x] 3.8 `query-tool/composables/useLotDetail.js`: `timeout: 120000` → 360000 (3 sites) and `timeout: 60000` → 360000 (1 site)
- [x] 3.9 `query-tool/composables/useEquipmentQuery.js`: `timeout: 120000` → 360000 and `timeout: 60000` → 360000
- [x] 3.10 `query-tool/composables/useLotResolve.js`: `timeout: 60000` → 360000
- [x] 3.11 `query-tool/composables/useLotLineage.js`: `timeout: 60000` → 360000
- [x] 3.12 `query-tool/composables/useReverseLineage.js`: `timeout: 60000` → 360000
- [x] 3.13 `query-tool/components/LotJobsTable.vue`: `timeout: 60000` → 360000
## 4. Verification
- [x] 4.1 Run `python -m pytest tests/ -v` — all existing tests pass (28 pre-existing failures, 1076 passed, 0 new failures)
- [x] 4.2 Run `cd frontend && npm run build` — frontend builds successfully


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-24


@@ -0,0 +1,107 @@
## Context
The mid-section defect trace analysis page (「製程不良追溯分析」) is the root-cause attribution tool for TMTT test-station defects. The current architecture is a three-stage staged pipeline (seed-resolve → lineage → events); backward tracing fetches only the `upstream_history` domain for machine attribution.
Existing infrastructure:
- **EventFetcher** already supports a `materials` domain (`LOTMATERIALSHISTORY`) and can be reused directly
- **LineageEngine** can already trace the split chain to the root ancestor; the `child_to_parent` map yields the root directly
- The frontend renders Pareto charts with **ECharts** (vue-echarts) and is well componentized
- The attribution logic `_attribute_defects()` is a generic pattern: `factor_value → detection_lots mapping → rate calculation`
## Goals / Non-Goals
**Goals:**
- Add two attribution dimensions to backward tracing, raw materials and source wafers, alongside the existing machine attribution
- Improve Pareto analysis capability (sort toggle, 80% reference line, richer tooltips)
- Help users understand how attribution is calculated (analysis summary panel)
- Show suspect-factor hits in the detail table, linked to the Pareto Top N
- Maintenance-context panel for suspect machines
- Move product-distribution analysis (PACKAGE / TYPE / WORKFLOW) to the reject-history page
**Non-Goals:**
- Forward-trace rework (handled separately later)
- Maintenance events as an independent attribution dimension (time-intersection modeling is complex; this change only surfaces them as context for suspect machines)
- Cross-filter linkage (clicking a chart bar to filter all other charts is out of scope; only suspect-hit display is included)
- Station-level attribution Pareto (every product passes every station, so it has no discriminating power)
## Decisions
### D1: Material attribution reuses the `_attribute_defects` pattern
**Choice**: Add an `_attribute_materials()` function fully symmetric to `_attribute_defects()`; only the key changes from `(workcenter_group, equipment_name, equipment_id)` to `(material_part_name, material_lot_name)`.
**Alternative**: Generalize into a single `_attribute_by_factor(records, key_fn)` function.
**Rationale**: Record structures differ per dimension (upstream_history has station/equipment, materials has part/lot), so forcing generalization would require an adapter layer. Start with symmetric duplication; abstract later if more dimensions are added.
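The symmetric pattern might look like the following sketch; `attribute_by_material`, the record shape, and `lot_stats` are illustrative assumptions, not the service's actual signatures:

```python
from collections import defaultdict

def attribute_by_material(material_records, lot_stats):
    """Attribute defects by (material_part_name, material_lot_name).

    material_records: [{"MATERIALPARTNAME": ..., "MATERIALLOTNAME": ..., "lot": detection_lot}]
    lot_stats: {detection_lot: {"reject": int, "input": int}}
    """
    # factor_value -> detection_lots mapping (same pattern as machine attribution)
    key_to_lots = defaultdict(set)
    for rec in material_records:
        key = (rec["MATERIALPARTNAME"], rec.get("MATERIALLOTNAME") or None)
        key_to_lots[key].add(rec["lot"])

    rows = []
    for key, lots in key_to_lots.items():
        reject = sum(lot_stats[l]["reject"] for l in lots)
        inp = sum(lot_stats[l]["input"] for l in lots)
        rows.append({
            "key": key,
            "lot_count": len(lots),
            "defect_qty": reject,
            # attributed rate = Σ reject of associated lots / Σ input × 100
            "defect_rate": round(reject / inp * 100, 2) if inp else 0.0,
        })
    return sorted(rows, key=lambda r: r["defect_qty"], reverse=True)
```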
### D2: Material data via the EventFetcher materials domain
**Choice**: Request the `materials` domain during the staged-trace events stage. In backward mode, the frontend's `useTraceProgress.js` switches to `domains: ['upstream_history', 'materials']`.
**Alternative**: A separate standalone API for querying materials.
**Rationale**: EventFetcher already has a mature materials domain (batching, caching, rate limiting), and the staged trace pipeline already handles multiple domains in parallel; there is no need to rebuild this.
### D3: Source-wafer attribution by extracting the root from lineage ancestors
**Choice**: Add a `roots` field (`{seed_cid: root_container_name}`) to the lineage-stage response; no extra SQL query is needed. LineageEngine already has the `child_to_parent` map, and traversing until a node has no parent yields the root. The backend's `_attribute_wafer_roots()` attributes by `root_container_name`.
**Alternative**: Query the root directly via SQL CONNECT BY.
**Rationale**: The lineage stage already traces the full split chain; root information is a byproduct and needs no extra DB round trip.
### D4: Pareto layout replacing PACKAGE/TYPE/WORKFLOW
**Choice**: The 6 Pareto charts become:
| Position | Before | After |
|------|------|------|
| Top-left | by upstream machine | by upstream machine (kept) |
| Top-right | by loss reason | by raw material (new) |
| Mid-left | by detection machine | by source wafer (new) |
| Mid-right | by WORKFLOW | by loss reason (kept) |
| Bottom-left | by PACKAGE | by detection machine (kept) |
| Bottom-right | by TYPE | removed |
Net result: 5 charts in a 2-2-1 layout, with only the detection-machine chart on the last row (or 3-2 / 2-2-2 depending on available space).
**Rationale**: PACKAGE / TYPE / WORKFLOW answer "which products do the defects land on", which belongs to reject-history. The core question of mid-section defect tracing is "which upstream factor did the defects come from".
### D5: Structured upstream data in the detail table
**Choice**: The backend `_build_detail_table` changes the `UPSTREAM_MACHINES` field to return a list of `{station, machine}` objects, and adds `UPSTREAM_MATERIALS` (list of `{part, lot}`) and `UPSTREAM_WAFER_ROOT` (string) fields. CSV export flattens these back to comma-separated strings.
The frontend detail table no longer shows all upstream machines; it shows only "suspect factor hits": the LOT's upstream factors cross-checked against the current Pareto's Top N suspect list (including inline filters).
### D6: Suspect-machine context panel — popover vs. side drawer
**Choice**: Use a popover (opened by clicking a Pareto bar) containing:
- Attribution figures: defect rate, LOT count, input/reject quantities
- Equipment info: station, resource family (RESOURCEFAMILYNAME)
- Recent maintenance: call `GET /api/query-tool/lot-associations?type=jobs&container_id=<equipment_id>` for recent JOB records (an endpoint queryable by equipment_id may need to be added, or equipment-period jobs can be reused)
**Alternatives**: Side drawer or modal.
**Rationale**: A popover is lightweight and keeps the user in the current analysis context. Only 3-5 recent maintenance records are needed, not the full JOB list.
### D7: New Pareto dimensions on the reject-history page
**Choice**: Add a dimension dropdown to the existing ParetoSection.vue. The reason-Pareto logic in the backend `reject_history_service.py` accepts a `dimension` parameter (`reason` / `package` / `type` / `workflow` / `workcenter` / `equipment`).
## Risks / Trade-offs
### R1: Material data volume
Material records can far outnumber upstream_history records (one LOT may consume many materials). A materials query over 2000 LOTs plus their lineage may return a large dataset.
**Mitigation**: EventFetcher already batches and caches. If the volume is excessive, limit material attribution to the most common material_part_name values (e.g. top 20) and group the rest as "Other".
### R2: The LineageEngine root is not necessarily a wafer
The split-chain root may not always represent a wafer lot; it depends on the product structure. For some product lines the root may be a LOT from an intermediate process.
**Mitigation**: Display `root_container_name` without hard-labeling it as a wafer; use the UI label「源頭批次」(source lot) rather than「晶圓」(wafer).
### R3: Extra events-stage latency from the added domain
Adding the materials domain increases the events-stage runtime.
**Mitigation**: EventFetcher already queries domains concurrently (ThreadPoolExecutor), so materials and upstream_history run in parallel. The 300s cache TTL also reduces repeated queries.
### R4: Suspect-machine maintenance data may need a new API
The current query-tool jobs APIs query by container_id or by equipment + time_range; there is no "latest N maintenance records for one machine" endpoint.
**Mitigation**: Add a lightweight equipment-recent-jobs endpoint, or simply have the frontend call the equipment-period jobs API for the last 30 days.
### R5: Pareto dimension switching on reject-history needs backend support
The Pareto currently aggregates only by reason; other dimensions would require SQL rewrites.
**Mitigation**: Reject-history uses the two-phase caching pattern (the full DataFrame is already cached), so the view-refresh phase can group by any dimension without re-querying the DB.
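That view-phase groupby might look like the following sketch; `pareto_from_cache` and the exact column names are illustrative, though `REJECTQTY` matches the fact columns used elsewhere in these specs:

```python
import pandas as pd

def pareto_from_cache(df, dimension, top_n=10):
    """View-phase Pareto from the cached fact DataFrame.

    No DB round-trip: just a groupby on the requested dimension column,
    with the tail grouped into an "Other" bucket.
    """
    grouped = (df.groupby(dimension)["REJECTQTY"].sum()
                 .sort_values(ascending=False))
    top = grouped.head(top_n)
    other = grouped.iloc[top_n:].sum()
    if other > 0:
        top = pd.concat([top, pd.Series({"Other": other})])
    return top
```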


@@ -0,0 +1,76 @@
## Why
The mid-section defect trace analysis page (mid-section-defect) currently attributes backward traces to a single dimension: **upstream machines**. In practice, quality engineers need to compare multiple factors at once (machine, material lot, source wafer) to pinpoint a root cause. The existing EventFetcher and LineageEngine can already query materials and lineage, but the analysis page does not yet integrate these data sources. The page also lacks transparency about the attribution logic, so users do not understand how the numbers are computed.
## What Changes
### New multi-factor attribution dimensions
- New "by raw material" Pareto: keyed by `material_part_name + material_lot`, computing the attributed defect rate with the same logic as machine attribution
- New "by source wafer" Pareto: keyed by the root ancestor (`CONTAINERNAME`) of the LineageEngine split chain, computing the attributed defect rate
- Remove the "by PACKAGE" and "by TYPE" Paretos in favor of the two above
- Remove the "by WORKFLOW" Pareto (product-distribution analysis moves to the reject-history page)
### Pareto improvements
- Sort toggle switching between "by defect count" and "by defect rate"
- 80% cumulative reference line (ECharts markLine)
- Tooltip additionally shows the associated LOT count
### Analysis summary panel
- New collapsible "analysis summary" block above the KPI cards
- Shows query conditions, data-scope statistics (total LOTs / input / reject LOT count / total rejects / lineage coverage count), and a textual explanation of the attribution logic
### Suspect-factor hits in the detail table
- Backend `_build_detail_table` returns structured upstream machine data (list of `{station, machine}` objects) instead of a flattened comma-separated string
- The frontend shows each LOT's hits against the current Pareto Top N suspect factors (e.g. `WIRE-03, DIE-01 (2/3)`)
- The suspect list tracks the Pareto inline filters
### Suspect-machine context panel
- Clicking a machine bar in the Pareto opens a context panel for that machine
- Panel contents: attribution summary, station/resource family, recent maintenance records (via the query-tool `get_lot_jobs` API)
### Backend: multi-factor attribution engine
- Add `_attribute_materials()` to `mid_section_defect_service.py` (same pattern as `_attribute_defects`)
- Add `_attribute_wafer_roots()` to `mid_section_defect_service.py` (keyed by root ancestor)
- Staged trace API events stage supports requesting the `materials` domain (already supported in EventFetcher; just add it to the mid_section_defect profile's domain list)
- `_build_all_charts` switches to a new DIMENSION_MAP (drop by_package / by_pj_type / by_workflow; add by_material / by_wafer_root)
### Reject-history page enhancements
- Migrate the "by PACKAGE / TYPE / WORKFLOW" product-distribution analysis to the reject-history page
- Add a Pareto dimension selector supporting reason, PACKAGE, TYPE, WORKFLOW, station, and machine
## Capabilities
### New Capabilities
- `msd-multifactor-attribution`: multi-factor attribution engine (raw materials, source wafers) for mid-section defect tracing, plus the corresponding Pareto charts
- `msd-analysis-transparency`: analysis summary panel showing query conditions, data scope, and attribution methodology
- `msd-suspect-context`: suspect-machine context panel and suspect-factor hit display in the detail table
### Modified Capabilities
- `reject-history-page`: new product-distribution Pareto dimensions (PACKAGE / TYPE / WORKFLOW), taking over this analysis from the mid-section defect page
- `trace-staged-api`: the mid_section_defect profile's events stage adds a `materials` domain request; aggregation adds material and wafer attribution
## Impact
### Backend
- `src/mes_dashboard/services/mid_section_defect_service.py` — core changes: new attribution functions, chart builder changes, detail-table structure changes
- `src/mes_dashboard/routes/trace_routes.py` — extend the events-stage domain list for the mid_section_defect profile
- `src/mes_dashboard/routes/mid_section_defect_routes.py` — export may need the new fields
- `src/mes_dashboard/services/reject_history_service.py` — add Pareto dimension support
- `src/mes_dashboard/routes/reject_history_routes.py` — add the dimension parameter
### Frontend
- `frontend/src/mid-section-defect/App.vue` — main changes: analysis summary, Pareto re-layout, suspect-hit logic
- `frontend/src/mid-section-defect/components/ParetoChart.vue` — sort toggle, 80% markLine, tooltip lot_count
- `frontend/src/mid-section-defect/components/DetailTable.vue` — reworked suspect-hit column
- `frontend/src/mid-section-defect/components/KpiCards.vue` — minor adjustments possible
- New `frontend/src/mid-section-defect/components/AnalysisSummary.vue`
- New `frontend/src/mid-section-defect/components/SuspectContextPanel.vue`
- `frontend/src/reject-history/components/ParetoSection.vue` — add dimension selector
- `frontend/src/reject-history/App.vue` — multi-dimension Pareto support
### SQL
- Possibly a new `src/mes_dashboard/sql/mid_section_defect/upstream_materials.sql` (or reuse the EventFetcher materials domain's `query_tool/lot_materials.sql`)
### Tests
- `tests/test_mid_section_defect.py` — unit tests for material/wafer attribution
- `tests/test_reject_history_routes.py` — dimension-Pareto tests


@@ -0,0 +1,48 @@
## ADDED Requirements
### Requirement: Analysis page SHALL display a collapsible analysis summary panel
The page SHALL show a summary panel above KPI cards explaining the query context, data scope, and attribution methodology.
#### Scenario: Summary panel rendering
- **WHEN** backward analysis data is loaded
- **THEN** a collapsible panel SHALL appear above the KPI cards
- **THEN** the panel SHALL be expanded by default on first render
- **THEN** the panel SHALL include a toggle control to collapse/expand
#### Scenario: Query context section
- **WHEN** the summary panel is rendered
- **THEN** it SHALL display the committed query parameters: detection station name, date range (or container mode info), and selected loss reasons (or「全部」if none selected)
#### Scenario: Data scope section
- **WHEN** the summary panel is rendered
- **THEN** it SHALL display:
  - Total detection LOT count (偵測站 LOT 總數)
  - Total input quantity in pcs (總投入)
  - Reject LOT count, i.e. lots with defects matching the selected loss reasons (報廢 LOT 數)
  - Total reject quantity in pcs (報廢總數)
  - Upstream LOT count covered by lineage tracing, i.e. total unique ancestor count (血緣追溯涵蓋上游 LOT 數)
#### Scenario: Ancestor count from lineage response
- **WHEN** lineage stage returns response
- **THEN** the response SHALL include `total_ancestor_count` (number of unique ancestor CIDs across all seeds, excluding seeds themselves)
- **THEN** the summary panel SHALL use this value for「血緣追溯涵蓋上游 LOT」
#### Scenario: Attribution methodology section
- **WHEN** the summary panel is rendered
- **THEN** it SHALL display a static text block explaining the attribution logic:
- All LOTs passing through the detection station (including those with no defects) are included in analysis
- Each LOT's upstream lineage (split/merge chain) is traced to identify associated upstream factors
- Attribution rate = sum of associated LOTs' reject qty / sum of associated LOTs' input qty × 100%
- The same defect can be attributed to multiple upstream factors (non-exclusive)
- Pareto bar height = attributed defect count (with overlap), orange line = attributed defect rate
#### Scenario: Summary panel in container mode
- **WHEN** query mode is container mode
- **THEN** the query context section SHALL show the input type, resolved count, and not-found count instead of date range
- **THEN** the data scope section SHALL still show LOT count and input/reject totals
#### Scenario: Summary panel collapsed state persistence
- **WHEN** user collapses the summary panel
- **THEN** the collapsed state SHALL persist within the current session (sessionStorage)
- **WHEN** user triggers a new query
- **THEN** the panel SHALL remain in its current collapsed/expanded state


@@ -0,0 +1,93 @@
## ADDED Requirements
### Requirement: Backward tracing SHALL attribute defects to upstream materials
The system SHALL compute material-level attribution using the same pattern as machine attribution: for each material `(part_name, lot_name)` consumed by detection lots or their ancestors, calculate the defect rate among associated detection lots.
#### Scenario: Materials attribution data flow
- **WHEN** backward tracing events stage completes with `upstream_history` and `materials` domains
- **THEN** the aggregation engine SHALL build a `material_key → detection_lots` mapping where `material_key = (MATERIALPARTNAME, MATERIALLOTNAME)`
- **THEN** for each material key, `attributed_defect_rate = Σ(REJECTQTY of associated detection lots) / Σ(TRACKINQTY of associated detection lots) × 100`
#### Scenario: Materials domain requested in backward trace
- **WHEN** the frontend executes backward tracing with `mid_section_defect` profile
- **THEN** the events stage SHALL request domains `['upstream_history', 'materials']`
- **THEN** the `materials` domain SHALL use the existing EventFetcher materials domain (querying `LOTMATERIALSHISTORY`)
#### Scenario: Materials Pareto chart rendering
- **WHEN** materials attribution data is available
- **THEN** the frontend SHALL render a Pareto chart titled「依原物料歸因」
- **THEN** each bar SHALL represent a `material_part_name (material_lot_name)` combination
- **THEN** the chart SHALL show Top 10 items sorted by defect_qty, with remaining items grouped as「其他」
- **THEN** tooltip SHALL display: material name, material lot, defect count, input count, defect rate, cumulative %, and associated LOT count
#### Scenario: Material with no lot name
- **WHEN** a material record has `MATERIALLOTNAME` as NULL or empty
- **THEN** the material key SHALL use `material_part_name` only (without lot suffix)
- **THEN** display label SHALL show the part name without parenthetical lot
### Requirement: Backward tracing SHALL attribute defects to wafer root ancestors
The system SHALL compute root-ancestor-level attribution by identifying the split chain root for each detection lot and calculating defect rates per root.
#### Scenario: Root ancestor identification
- **WHEN** lineage stage returns `ancestors` data (child_to_parent map)
- **THEN** the backend SHALL identify root ancestors by traversing the parent chain for each seed until reaching a container with no further parent
- **THEN** roots SHALL be returned as `{seed_container_id: root_container_name}` in the lineage response
#### Scenario: Root attribution calculation
- **WHEN** root mapping is available
- **THEN** the aggregation engine SHALL build a `root_container_name → detection_lots` mapping
- **THEN** for each root, `attributed_defect_rate = Σ(REJECTQTY) / Σ(TRACKINQTY) × 100`
#### Scenario: Wafer root Pareto chart rendering
- **WHEN** root attribution data is available
- **THEN** the frontend SHALL render a Pareto chart titled「依源頭批次歸因」
- **THEN** each bar SHALL represent a root ancestor `CONTAINERNAME`
- **THEN** the chart SHALL show Top 10 items with cumulative percentage line
#### Scenario: Detection lot with no ancestors
- **WHEN** a detection lot has no split chain ancestors (it is its own root)
- **THEN** the root mapping SHALL map the lot to its own `CONTAINERNAME`
### Requirement: Backward Pareto layout SHALL show 5 charts in machine/material/wafer/reason/detection arrangement
The backward tracing chart section SHALL display exactly 5 Pareto charts replacing the previous 6-chart layout.
#### Scenario: Chart grid layout
- **WHEN** backward analysis data is rendered
- **THEN** charts SHALL be arranged as:
- Row 1: 依上游機台歸因 | 依原物料歸因
- Row 2: 依源頭批次歸因 | 依不良原因
- Row 3: 依偵測機台 (full width or single)
- **THEN** the previous「依 WORKFLOW」「依 PACKAGE」「依 TYPE」charts SHALL NOT be rendered
### Requirement: Pareto charts SHALL support sort toggle between defect count and defect rate
Each Pareto chart SHALL allow the user to switch between sorting by defect quantity and defect rate.
#### Scenario: Default sort order
- **WHEN** a Pareto chart is first rendered
- **THEN** bars SHALL be sorted by `defect_qty` descending (current behavior)
#### Scenario: Sort by rate toggle
- **WHEN** user clicks the sort toggle to「依不良率」
- **THEN** bars SHALL re-sort by `defect_rate` descending
- **THEN** cumulative percentage line SHALL recalculate based on the new sort order
- **THEN** the toggle SHALL visually indicate the active sort mode
#### Scenario: Sort toggle persistence within session
- **WHEN** user changes sort mode on one chart
- **THEN** the change SHALL only affect that specific chart (not all charts)
### Requirement: Pareto charts SHALL display an 80% cumulative reference line
Each Pareto chart SHALL include a horizontal dashed line at the 80% cumulative mark.
#### Scenario: 80% markLine rendering
- **WHEN** Pareto chart data is rendered with cumulative percentages
- **THEN** the chart SHALL display a horizontal dashed line at y=80 on the percentage axis
- **THEN** the line SHALL use a muted color (e.g., `#94a3b8`) with dotted style
- **THEN** the line label SHALL display「80%」
### Requirement: Pareto chart tooltip SHALL include LOT count
Each Pareto chart tooltip SHALL show the number of associated detection LOTs.
#### Scenario: Tooltip with LOT count
- **WHEN** user hovers over a Pareto bar
- **THEN** the tooltip SHALL display: factor name, 關聯 LOT count (with percentage of total), defect count, input count, defect rate, cumulative percentage


@@ -0,0 +1,77 @@
## ADDED Requirements
### Requirement: Detail table SHALL display suspect factor hit counts instead of raw upstream machine list
The backward detail table SHALL replace the flat `UPSTREAM_MACHINES` string column with a structured suspect factor hit display that links to the current Pareto Top N.
#### Scenario: Suspect hit column rendering
- **WHEN** backward detail table is rendered
- **THEN** the「上游機台」column SHALL be replaced by a「嫌疑命中」column
- **THEN** each cell SHALL show the names of upstream machines that appear in the current Pareto Top N suspect list, with a hit ratio (e.g., `WIRE-03, DIE-01 (2/5)`)
#### Scenario: Suspect list derived from Pareto Top N
- **WHEN** the machine Pareto chart displays Top N machines (after any inline station/spec filters)
- **THEN** the suspect list SHALL be the set of machine names from those Top N entries
- **THEN** changing the Pareto inline filters SHALL update the suspect list and re-render the hit column
#### Scenario: Full match indicator
- **WHEN** a LOT's upstream machines include all machines in the suspect list
- **THEN** the cell SHALL display a visual indicator (e.g., star or highlight) marking full match
#### Scenario: No hits
- **WHEN** a LOT's upstream machines include none of the suspect machines
- **THEN** the cell SHALL display「-」
#### Scenario: Upstream machine count column
- **WHEN** backward detail table is rendered
- **THEN** the「上游LOT數」column SHALL remain as-is (showing ancestor count)
- **THEN** a new「上游台數」column SHALL show the total number of unique upstream machines for that LOT
### Requirement: Backend detail table SHALL return structured upstream data
The `_build_detail_table` function SHALL return upstream machines as a structured list instead of a flat comma-separated string.
#### Scenario: Structured upstream machines response
- **WHEN** backward detail API returns LOT records
- **THEN** each record's `UPSTREAM_MACHINES` field SHALL be a list of `{"station": "<workcenter_group>", "machine": "<equipment_name>"}` objects
- **THEN** the flat comma-separated string SHALL no longer be returned in this field
#### Scenario: CSV export backward compatibility
- **WHEN** CSV export is triggered for backward detail
- **THEN** the `UPSTREAM_MACHINES` column in CSV SHALL flatten the structured list back to comma-separated `station/machine` format
- **THEN** CSV format SHALL remain unchanged from current behavior
#### Scenario: Structured upstream materials response
- **WHEN** materials attribution is available
- **THEN** each detail record SHALL include an `UPSTREAM_MATERIALS` field: list of `{"part": "<material_part_name>", "lot": "<material_lot_name>"}` objects
#### Scenario: Structured wafer root response
- **WHEN** root ancestor attribution is available
- **THEN** each detail record SHALL include a `WAFER_ROOT` field: string with root ancestor `CONTAINERNAME`
### Requirement: Suspect machine context panel SHALL show machine details and recent maintenance
Clicking a machine bar in the Pareto chart SHALL open a context popover showing machine attribution details and recent maintenance history.
#### Scenario: Context panel trigger
- **WHEN** user clicks a bar in the「依上游機台歸因」Pareto chart
- **THEN** a popover panel SHALL appear near the clicked bar
- **WHEN** user clicks outside the popover or clicks the same bar again
- **THEN** the popover SHALL close
#### Scenario: Context panel content - attribution summary
- **WHEN** the context panel is displayed
- **THEN** it SHALL show: equipment name, workcenter group, resource family (RESOURCEFAMILYNAME), attributed defect rate, attributed defect count, attributed input count, associated LOT count
#### Scenario: Context panel content - recent maintenance
- **WHEN** the context panel is displayed
- **THEN** it SHALL fetch recent JOB records for the machine's equipment_id (last 30 days)
- **THEN** it SHALL display up to 5 most recent JOB records showing: JOBID, JOBSTATUS, JOBMODELNAME, CREATEDATE, COMPLETEDATE
- **WHEN** the machine has no recent JOB records
- **THEN** the maintenance section SHALL display「近 30 天無維修紀錄」
#### Scenario: Context panel loading state
- **WHEN** maintenance data is being fetched
- **THEN** the maintenance section SHALL show a loading indicator
- **THEN** the attribution summary section SHALL render immediately (data already available from attribution)
#### Scenario: Context panel for non-machine charts
- **WHEN** user clicks bars in other Pareto charts (materials, wafer root, loss reason, detection machine)
- **THEN** no context panel SHALL appear (machine context only)

## ADDED Requirements
### Requirement: Hold dataset cache SHALL execute a single Oracle query and cache the result
The hold_dataset_cache module SHALL query Oracle once for the full hold/release fact set and cache it for subsequent derivations.
#### Scenario: Primary query execution and caching
- **WHEN** `execute_primary_query()` is called with date range and hold_type parameters
- **THEN** a deterministic `query_id` SHALL be computed from the primary params (start_date, end_date) using SHA256
- **THEN** if a cached DataFrame exists for this query_id (L1 or L2), it SHALL be used without querying Oracle
- **THEN** if no cache exists, a single Oracle query SHALL fetch all hold/release records from `DW_MES_HOLDRELEASEHISTORY` for the date range (all hold_types)
- **THEN** the result DataFrame SHALL be stored in both L1 (ProcessLevelCache) and L2 (Redis as parquet/base64)
- **THEN** the response SHALL include `query_id`, trend, reason_pareto, duration, and list page 1
#### Scenario: Cache TTL and eviction
- **WHEN** a DataFrame is cached
- **THEN** the cache TTL SHALL be 900 seconds (15 minutes)
- **THEN** L1 cache max_size SHALL be 8 entries with LRU eviction
- **THEN** the Redis namespace SHALL be `hold_dataset`
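A minimal sketch of the deterministic `query_id` and the L1 TTL/LRU behavior described above. The real `hold_dataset_cache` module and `ProcessLevelCache` class differ, and the Redis L2 layer (parquet/base64) is omitted here:

```python
import hashlib
import json
import time
from collections import OrderedDict

TTL_SECONDS = 900   # 15-minute TTL from the spec
L1_MAX_SIZE = 8     # LRU-evicted entry limit from the spec

def make_query_id(start_date, end_date):
    # SHA256 over canonically serialized primary params -> stable cache key
    payload = json.dumps({"start_date": start_date, "end_date": end_date},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class ProcessLevelCache:
    """Toy L1: dict with TTL and LRU eviction (sketch, not the real class)."""

    def __init__(self, max_size=L1_MAX_SIZE, ttl=TTL_SECONDS):
        self._data = OrderedDict()
        self.max_size, self.ttl = max_size, ttl

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() > expires_at:   # expired entry counts as a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)         # LRU touch
        return value

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least-recently-used
```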
### Requirement: Hold dataset cache SHALL derive trend data from cached DataFrame
The module SHALL compute daily trend aggregations from the cached fact set.
#### Scenario: Trend derivation from cache
- **WHEN** `apply_view()` is called with a valid query_id
- **THEN** trend data SHALL be derived by grouping the cached DataFrame by date
- **THEN** the 07:30 shift boundary rule SHALL be applied
- **THEN** all three hold_type variants (quality, non_quality, all) SHALL be computed from the same DataFrame
- **THEN** hold_type filtering SHALL be applied in-memory without re-querying Oracle
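The 07:30 shift-boundary attribution and in-memory trend grouping might look like this; the `HOLD_TYPE` column name is an assumption for illustration:

```python
import pandas as pd

def shift_day(txn: pd.Series) -> pd.Series:
    # Spec rule: time >= 07:30 is attributed to the NEXT calendar day,
    # time < 07:30 to the current calendar day.
    to_next = (txn.dt.hour * 60 + txn.dt.minute) >= 7 * 60 + 30
    return (txn.dt.normalize()
            + pd.to_timedelta(to_next.astype(int), unit="D")).dt.date

def derive_trend_counts(df: pd.DataFrame, hold_type: str = "all") -> pd.Series:
    # hold_type filtering happens in-memory -- no Oracle round trip.
    if hold_type != "all":
        df = df[df["HOLD_TYPE"] == hold_type]
    return df.groupby(shift_day(df["HOLDTXNDATE"])).size()
```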
### Requirement: Hold dataset cache SHALL derive reason Pareto from cached DataFrame
The module SHALL compute reason distribution from the cached fact set.
#### Scenario: Reason Pareto derivation
- **WHEN** `apply_view()` is called with hold_type filter
- **THEN** reason Pareto SHALL be derived by grouping the filtered DataFrame by HOLDREASONNAME
- **THEN** items SHALL include count, qty, pct, and cumPct
- **THEN** items SHALL be sorted by count descending
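A pandas sketch of the Pareto derivation above; using `HOLDQTY` as the qty source is an assumption:

```python
import pandas as pd

def derive_reason_pareto(df: pd.DataFrame) -> pd.DataFrame:
    out = (df.groupby("HOLDREASONNAME")
             .agg(count=("HOLDQTY", "size"), qty=("HOLDQTY", "sum"))
             .sort_values("count", ascending=False)   # count descending
             .reset_index()
             .rename(columns={"HOLDREASONNAME": "reason"}))
    out["pct"] = out["count"] / out["count"].sum() * 100
    out["cumPct"] = out["pct"].cumsum()               # running cumulative pct
    return out
```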
### Requirement: Hold dataset cache SHALL derive duration distribution from cached DataFrame
The module SHALL compute hold duration buckets from the cached fact set.
#### Scenario: Duration derivation
- **WHEN** `apply_view()` is called with hold_type filter
- **THEN** duration distribution SHALL be derived from records where RELEASETXNDATE IS NOT NULL
- **THEN** 4 buckets SHALL be computed: <4h, 4-24h, 1-3d, >3d
- **THEN** each bucket SHALL include count and pct
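The bucketing above can be sketched with `pd.cut`; the edge inclusivity (`[0,4)`, `[4,24)`, `[24,72)`, `[72,inf)`) is an assumption the spec does not pin down:

```python
import pandas as pd

def derive_duration(df: pd.DataFrame) -> pd.DataFrame:
    released = df[df["RELEASETXNDATE"].notna()]        # released holds only
    hours = (released["RELEASETXNDATE"]
             - released["HOLDTXNDATE"]).dt.total_seconds() / 3600
    labels = ["<4h", "4-24h", "1-3d", ">3d"]
    cut = pd.cut(hours, bins=[0, 4, 24, 72, float("inf")],
                 labels=labels, right=False)
    count = cut.value_counts().reindex(labels).fillna(0).astype(int)
    total = int(count.sum()) or 1                      # avoid division by zero
    return pd.DataFrame({"range": labels,
                         "count": count.values,
                         "pct": (count.values / total * 100).round(1)})
```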
### Requirement: Hold dataset cache SHALL derive paginated list from cached DataFrame
The module SHALL provide paginated detail records from the cached fact set.
#### Scenario: List pagination from cache
- **WHEN** `apply_view()` is called with page and per_page parameters
- **THEN** the cached DataFrame SHALL be filtered by hold_type and optional reason filter
- **THEN** records SHALL be sorted by HOLDTXNDATE descending
- **THEN** pagination SHALL be applied in-memory (offset + limit on the sorted DataFrame)
- **THEN** response SHALL include items and pagination metadata (page, perPage, total, totalPages)
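A sketch of the in-memory pagination above, including the page/per_page bounds from the API spec:

```python
import math
import pandas as pd

def derive_list(df: pd.DataFrame, page=1, per_page=50, reason=None) -> dict:
    page, per_page = max(page, 1), min(per_page, 200)  # bounds from the spec
    if reason:
        df = df[df["HOLDREASONNAME"] == reason]        # optional reason filter
    df = df.sort_values("HOLDTXNDATE", ascending=False)
    total = len(df)
    start = (page - 1) * per_page                      # offset + limit in-memory
    return {
        "items": df.iloc[start:start + per_page].to_dict("records"),
        "pagination": {"page": page, "perPage": per_page, "total": total,
                       "totalPages": math.ceil(total / per_page)},
    }
```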
### Requirement: Hold dataset cache SHALL handle cache expiry gracefully
The module SHALL return appropriate signals when cache has expired.
#### Scenario: Cache expired during view request
- **WHEN** `apply_view()` is called with a query_id whose cache has expired
- **THEN** the response SHALL return `{ success: false, error: "cache_expired" }`
- **THEN** the HTTP status SHALL be 410 (Gone)

## MODIFIED Requirements
### Requirement: Hold History API SHALL provide daily trend data with Redis caching
The Hold History API SHALL return trend, reason-pareto, duration, and list data from a single cached dataset via a two-phase query pattern (POST /query + GET /view). The old independent GET endpoints for trend, reason-pareto, duration, and list SHALL be replaced.
#### Scenario: Trend data shape
- **WHEN** trend data is returned (from POST /query or GET /view)
- **THEN** the trend payload SHALL contain `{ days: [...] }`
- **THEN** each day item SHALL contain `{ date, quality: { holdQty, newHoldQty, releaseQty, futureHoldQty }, non_quality: { ... }, all: { ... } }`
- **THEN** all three hold_type variants SHALL be included in a single response
#### Scenario: Primary query endpoint
- **WHEN** `POST /api/hold-history/query` is called with `{ start_date, end_date, hold_type }`
- **THEN** the service SHALL execute a single Oracle query (or read from cache) via `hold_dataset_cache.execute_primary_query()`
- **THEN** the response SHALL return `{ success: true, data: { query_id, trend, reason_pareto, duration, list, summary } }`
- **THEN** list SHALL contain page 1 with default per_page of 50
#### Scenario: Supplementary view endpoint
- **WHEN** `GET /api/hold-history/view?query_id=...&hold_type=...&reason=...&page=...&per_page=...` is called
- **THEN** the service SHALL read the cached DataFrame and derive filtered views via `hold_dataset_cache.apply_view()`
- **THEN** no Oracle query SHALL be executed
- **THEN** the response SHALL return `{ success: true, data: { trend, reason_pareto, duration, list } }`
#### Scenario: Cache expired on view request
- **WHEN** GET /view is called with an expired query_id
- **THEN** the response SHALL return `{ success: false, error: "cache_expired" }` with HTTP 410
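The client-side contract above (view, then transparent re-query on 410) can be sketched as follows; `http_get`/`http_post` are hypothetical transport callables, not real helpers from the codebase:

```python
def fetch_view_with_retry(http_get, http_post, state, params):
    # state: {"query_id": ..., "last_query": last committed primary params}
    resp = http_get("/api/hold-history/view",
                    {"query_id": state["query_id"], **params})
    if not resp.get("success") and resp.get("error") == "cache_expired":
        # One transparent retry: re-run the primary query, then the view again.
        fresh = http_post("/api/hold-history/query", state["last_query"])
        state["query_id"] = fresh["data"]["query_id"]
        resp = http_get("/api/hold-history/view",
                        {"query_id": state["query_id"], **params})
    return resp
```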
#### Scenario: Trend uses shift boundary at 07:30
- **WHEN** daily aggregation is calculated
- **THEN** transactions with time >= 07:30 SHALL be attributed to the next calendar day
- **THEN** transactions with time < 07:30 SHALL be attributed to the current calendar day
#### Scenario: Trend deduplicates same-day multiple holds
- **WHEN** a lot is held multiple times on the same day
- **THEN** only one hold event SHALL be counted for that day (using ROW_NUMBER per CONTAINERID per day)
#### Scenario: Trend deduplicates future holds
- **WHEN** the same lot has multiple future holds for the same reason
- **THEN** only the first occurrence SHALL be counted (using ROW_NUMBER per CONTAINERID per HOLDREASONID)
#### Scenario: Trend hold type classification
- **WHEN** trend data is aggregated by hold type
- **THEN** quality classification SHALL use the same NON_QUALITY_HOLD_REASONS set as existing hold endpoints
- **THEN** holds with HOLDREASONNAME NOT in NON_QUALITY_HOLD_REASONS SHALL be classified as quality
- **THEN** the "all" variant SHALL include both quality and non-quality holds
#### Scenario: Trend error
- **WHEN** the database query fails
- **THEN** the response SHALL return `{ success: false, error: '查詢失敗' }` with HTTP 500
### Requirement: Hold History API SHALL provide reason Pareto data
The reason Pareto data SHALL be derived from the cached dataset, not from a separate Oracle query.
#### Scenario: Reason Pareto from cache
- **WHEN** reason Pareto is requested via GET /view with hold_type filter
- **THEN** the cached DataFrame SHALL be filtered by hold_type and grouped by HOLDREASONNAME
- **THEN** each item SHALL contain `{ reason, count, qty, pct, cumPct }`
- **THEN** items SHALL be sorted by count descending
- **THEN** pct SHALL be percentage of total hold events
- **THEN** cumPct SHALL be running cumulative percentage
#### Scenario: Reason Pareto uses shift boundary
- **WHEN** hold events are counted for Pareto
- **THEN** the 07:30 shift boundary rule SHALL be applied to HOLDTXNDATE
#### Scenario: Reason Pareto hold type filter
- **WHEN** hold_type is "quality"
- **THEN** only quality hold reasons SHALL be included
- **WHEN** hold_type is "non-quality"
- **THEN** only non-quality hold reasons SHALL be included
- **WHEN** hold_type is "all"
- **THEN** all hold reasons SHALL be included
### Requirement: Hold History API SHALL provide hold duration distribution
The duration distribution SHALL be derived from the cached dataset.
#### Scenario: Duration from cache
- **WHEN** duration is requested via GET /view
- **THEN** the cached DataFrame SHALL be filtered to released holds only
- **THEN** 4 buckets SHALL be computed: `{ range: "<4h", count, pct }`, `{ range: "4-24h", count, pct }`, `{ range: "1-3d", count, pct }`, `{ range: ">3d", count, pct }`
#### Scenario: Duration only includes released holds
- **WHEN** duration is calculated
- **THEN** only hold records with RELEASETXNDATE IS NOT NULL SHALL be included
- **THEN** duration SHALL be calculated as RELEASETXNDATE - HOLDTXNDATE
#### Scenario: Duration date range filter
- **WHEN** start_date and end_date are provided
- **THEN** only holds with HOLDTXNDATE within the date range (applying 07:30 shift boundary) SHALL be included
### Requirement: Hold History API SHALL provide department statistics
The API SHALL return hold/release statistics aggregated by department with optional person detail.
#### Scenario: Department endpoint
- **WHEN** `GET /api/hold-history/department?start_date=2025-01-01&end_date=2025-01-31&hold_type=quality` is called
- **THEN** the response SHALL return `{ success: true, data: { items: [...] } }`
- **THEN** each item SHALL contain `{ dept, holdCount, releaseCount, avgHoldHours, persons: [{ name, holdCount, releaseCount, avgHoldHours }] }`
- **THEN** items SHALL be sorted by holdCount descending
#### Scenario: Department with reason filter
- **WHEN** `GET /api/hold-history/department?start_date=2025-01-01&end_date=2025-01-31&hold_type=quality&reason=品質確認` is called
- **THEN** only hold records matching the specified reason SHALL be included in department and person statistics
#### Scenario: Department hold count vs release count
- **WHEN** department statistics are calculated
- **THEN** holdCount SHALL count records where HOLDEMPDEPTNAME equals the department AND HOLDTXNDATE is within the date range
- **THEN** releaseCount SHALL count records where RELEASEEMPDEPTNAME equals the department AND RELEASETXNDATE is within the date range
- **THEN** avgHoldHours SHALL be the average of (RELEASETXNDATE - HOLDTXNDATE) in hours for released holds initiated by that department
### Requirement: Hold History API SHALL provide paginated detail list
The detail list SHALL be paginated from the cached dataset.
#### Scenario: List pagination from cache
- **WHEN** list is requested via GET /view with page and per_page params
- **THEN** the cached DataFrame SHALL be filtered and paginated in-memory
- **THEN** each item SHALL contain: lotId, workorder, workcenter, holdReason, holdDate, holdEmp, holdComment, releaseDate, releaseEmp, releaseComment, holdHours, ncr
- **THEN** items SHALL be sorted by HOLDTXNDATE descending
- **THEN** response SHALL include items and pagination metadata (page, perPage, total, totalPages)
#### Scenario: List with reason filter
- **WHEN** GET /view is called with a `reason` parameter
- **THEN** only records matching the specified HOLDREASONNAME SHALL be returned in the list
#### Scenario: List unreleased hold records
- **WHEN** a hold record has RELEASETXNDATE IS NULL
- **THEN** releaseDate SHALL be null
- **THEN** holdHours SHALL be calculated as (SYSDATE - HOLDTXNDATE) * 24
#### Scenario: List pagination bounds
- **WHEN** page is less than 1
- **THEN** page SHALL be treated as 1
- **WHEN** per_page exceeds 200
- **THEN** per_page SHALL be capped at 200
#### Scenario: List date range uses shift boundary
- **WHEN** records are filtered by date range
- **THEN** the 07:30 shift boundary rule SHALL be applied to HOLDTXNDATE
### Requirement: Hold History API SHALL keep department endpoint as separate query
The department endpoint SHALL remain as a separate Oracle query due to its unique person-level aggregation.
#### Scenario: Department endpoint unchanged
- **WHEN** `GET /api/hold-history/department` is called
- **THEN** it SHALL continue to execute its own Oracle query
- **THEN** it SHALL NOT use the dataset cache
### Requirement: Hold History API SHALL use centralized SQL files
The API SHALL load SQL queries from files in the `src/mes_dashboard/sql/hold_history/` directory.
#### Scenario: SQL file organization
- **WHEN** the hold history service executes a query
- **THEN** the SQL SHALL be loaded from `sql/hold_history/<query_name>.sql`
- **THEN** the following SQL files SHALL exist: `trend.sql`, `reason_pareto.sql`, `duration.sql`, `department.sql`, `list.sql`
#### Scenario: SQL parameterization
- **WHEN** SQL queries are executed
- **THEN** all user-provided parameters (dates, hold_type, reason) SHALL be passed as bind parameters
- **THEN** no string interpolation SHALL be used for user input
### Requirement: Hold History API SHALL apply rate limiting
The API SHALL apply rate limiting to expensive endpoints.
#### Scenario: Rate limit on list endpoint
- **WHEN** the list endpoint receives excessive requests
- **THEN** rate limiting SHALL be applied using `configured_rate_limit` with a default of 90 requests per 60 seconds
#### Scenario: Rate limit on trend endpoint
- **WHEN** the trend endpoint receives excessive requests
- **THEN** rate limiting SHALL be applied using `configured_rate_limit` with a default of 60 requests per 60 seconds
### Requirement: Hold History page route SHALL serve static Vite HTML
The Flask route SHALL serve the pre-built Vite HTML file.
#### Scenario: Page route
- **WHEN** user navigates to `/hold-history`
- **THEN** Flask SHALL serve the pre-built HTML file from `static/dist/hold-history.html` via `send_from_directory`
- **THEN** the HTML SHALL NOT pass through Jinja2 template rendering
#### Scenario: Fallback HTML
- **WHEN** the pre-built HTML file does not exist
- **THEN** Flask SHALL return a minimal HTML page with the correct script tag and module import

## MODIFIED Requirements
### Requirement: Hold History page SHALL display a filter bar with date range and hold type
The page SHALL provide a filter bar for selecting date range and hold type classification. On query, the page SHALL use a two-phase flow: POST /query returns queryId, and subsequent filter changes use GET /view.
#### Scenario: Default date range
- **WHEN** the page loads
- **THEN** the date range SHALL default to the first and last day of the current month
#### Scenario: Primary query via POST /query
- **WHEN** user clicks the query button (or page loads with default filters)
- **THEN** the page SHALL call `POST /api/hold-history/query` with `{ start_date, end_date, hold_type }`
- **THEN** the response queryId SHALL be stored for subsequent view requests
- **THEN** trend, reason-pareto, duration, and list SHALL all be populated from the single response
#### Scenario: Hold Type radio default
- **WHEN** the page loads
- **THEN** the Hold Type filter SHALL default to "品質異常"
- **THEN** three radio options SHALL display: 品質異常, 非品質異常, 全部
#### Scenario: Hold type or reason filter change uses GET /view
- **WHEN** user changes hold_type radio or clicks a reason in the Pareto chart (while queryId exists)
- **THEN** the page SHALL call `GET /api/hold-history/view?query_id=...&hold_type=...&reason=...`
- **THEN** no new Oracle query SHALL be triggered
- **THEN** trend, reason-pareto, duration, and list SHALL update from the view response
#### Scenario: Filter bar change resets downstream state
- **WHEN** user changes the date range or Hold Type selection
- **THEN** any active Reason Pareto filter SHALL be cleared
- **THEN** pagination SHALL reset to page 1
#### Scenario: Pagination uses GET /view
- **WHEN** user navigates to a different page in the detail list
- **THEN** the page SHALL call `GET /api/hold-history/view?query_id=...&page=...&per_page=...`
#### Scenario: Date range change triggers new primary query
- **WHEN** user changes the date range and clicks query
- **THEN** the page SHALL call `POST /api/hold-history/query` with new dates
- **THEN** a new queryId SHALL replace the old one
#### Scenario: Cache expired auto-retry
- **WHEN** GET /view returns `{ success: false, error: "cache_expired" }`
- **THEN** the page SHALL automatically re-execute `POST /api/hold-history/query` with the last committed filters
- **THEN** the view SHALL refresh with the new data
### Requirement: Hold History page SHALL display summary KPI cards
The page SHALL show 6 summary KPI cards derived from the trend data for the selected period.
#### Scenario: Summary cards rendering
- **WHEN** trend data is loaded
- **THEN** six cards SHALL display: Release 數量, New Hold 數量, Future Hold 數量, 淨變動, 期末 On Hold, 平均 Hold 時長
- **THEN** Release SHALL be displayed as a positive indicator (green)
- **THEN** New Hold and Future Hold SHALL be displayed as negative indicators (red/orange)
- **THEN** 淨變動 SHALL equal Release - New Hold - Future Hold
- **THEN** 期末 On Hold SHALL be the HOLDQTY of the last day in the selected range
- **THEN** number values SHALL use zh-TW number formatting
#### Scenario: Summary reflects filter bar only
- **WHEN** user clicks a Reason Pareto block
- **THEN** summary cards SHALL NOT change (they only respond to filter bar changes)
### Requirement: Hold History page SHALL display a Daily Trend chart
The page SHALL display a mixed line+bar chart showing daily hold stock and flow.
#### Scenario: Daily Trend chart rendering
- **WHEN** trend data is loaded
- **THEN** an ECharts mixed chart SHALL display with dual Y-axes
- **THEN** the left Y-axis SHALL show flow quantities (Release, New Hold, Future Hold)
- **THEN** the right Y-axis SHALL show HOLDQTY stock level
- **THEN** the X-axis SHALL show dates within the selected range
#### Scenario: Bar direction encoding
- **WHEN** daily trend bars are rendered
- **THEN** Release bars SHALL extend upward (positive direction, green color)
- **THEN** New Hold bars SHALL extend downward (negative direction, red color)
- **THEN** Future Hold bars SHALL extend downward (negative direction, orange color, stacked with New Hold)
- **THEN** HOLDQTY SHALL display as a line on the right Y-axis
#### Scenario: Hold Type switching without new Oracle query
- **WHEN** user changes the Hold Type radio on the filter bar
- **THEN** if the date range has not changed, the trend chart SHALL update via `GET /view` from the cached dataset
- **THEN** no new primary Oracle query SHALL be triggered
#### Scenario: Daily Trend reflects filter bar only
- **WHEN** user clicks a Reason Pareto block
- **THEN** the Daily Trend chart SHALL NOT change (it only responds to filter bar changes)
### Requirement: Hold History page SHALL display a Reason Pareto chart
The page SHALL display a Pareto chart showing hold reason distribution.
#### Scenario: Reason Pareto rendering
- **WHEN** reason-pareto data is loaded
- **THEN** a Pareto chart SHALL display with bars (count per reason) and a cumulative percentage line
- **THEN** reasons SHALL be sorted by count descending
- **THEN** the cumulative line SHALL reach 100% at the rightmost bar
#### Scenario: Reason Pareto click filters downstream
- **WHEN** user clicks a reason bar in the Pareto chart
- **THEN** `reasonFilter` SHALL be set to the clicked reason name
- **THEN** Department table SHALL reload filtered by that reason
- **THEN** Detail table SHALL reload filtered by that reason
- **THEN** the clicked bar SHALL show a visual highlight
#### Scenario: Reason Pareto click toggle
- **WHEN** user clicks the same reason bar that is already active
- **THEN** `reasonFilter` SHALL be cleared
- **THEN** Department table and Detail table SHALL reload without reason filter
#### Scenario: Reason Pareto reflects filter bar only
- **WHEN** user clicks a reason bar
- **THEN** Summary KPIs, Daily Trend, and Duration chart SHALL NOT change
### Requirement: Hold History page SHALL display Hold Duration distribution
The page SHALL display a horizontal bar chart showing hold duration distribution.
#### Scenario: Duration chart rendering
- **WHEN** duration data is loaded
- **THEN** a horizontal bar chart SHALL display with 4 buckets: <4h, 4-24h, 1-3天, >3天
- **THEN** each bar SHALL show count and percentage
- **THEN** only released holds (RELEASETXNDATE IS NOT NULL) SHALL be included
#### Scenario: Duration reflects filter bar only
- **WHEN** user clicks a Reason Pareto block
- **THEN** the Duration chart SHALL NOT change (it only responds to filter bar changes)
### Requirement: Hold History page SHALL display Department statistics with expandable rows
The page SHALL display a table showing hold/release statistics per department, expandable to show individual persons.
#### Scenario: Department table rendering
- **WHEN** department data is loaded
- **THEN** a table SHALL display with columns: 部門, Hold 次數, Release 次數, 平均 Hold 時長(hr)
- **THEN** departments SHALL be sorted by Hold 次數 descending
- **THEN** each department row SHALL have an expand toggle
#### Scenario: Department row expansion
- **WHEN** user clicks the expand toggle on a department row
- **THEN** individual person rows SHALL display below the department row
- **THEN** person rows SHALL show: 人員名稱, Hold 次數, Release 次數, 平均 Hold 時長(hr)
#### Scenario: Department table responds to reason filter
- **WHEN** a Reason Pareto filter is active
- **THEN** department data SHALL reload filtered by the selected reason
- **THEN** only holds matching the reason SHALL be included in statistics
### Requirement: Hold History page SHALL display paginated Hold/Release detail list
The page SHALL display a detailed list of individual hold/release records with server-side pagination.
#### Scenario: Detail table columns
- **WHEN** detail data is loaded
- **THEN** a table SHALL display with columns: Lot ID, WorkOrder, 站別, Hold Reason, Hold 時間, Hold 人員, Hold Comment, Release 時間, Release 人員, Release Comment, 時長(hr), NCR
#### Scenario: Unreleased hold display
- **WHEN** a hold record has RELEASETXNDATE IS NULL
- **THEN** the Release 時間 column SHALL display "仍在 Hold"
- **THEN** the 時長 column SHALL display the duration from HOLDTXNDATE to current time
#### Scenario: Detail table pagination
- **WHEN** total records exceed per_page (50)
- **THEN** Prev/Next buttons and page info SHALL display
- **THEN** page info SHALL show "顯示 {start} - {end} / {total}"
#### Scenario: Detail table responds to reason filter
- **WHEN** a Reason Pareto filter is active
- **THEN** detail data SHALL reload filtered by the selected reason
- **THEN** pagination SHALL reset to page 1
#### Scenario: Filter changes reset pagination
- **WHEN** any filter changes (filter bar or Reason Pareto click)
- **THEN** pagination SHALL reset to page 1
### Requirement: Hold History page SHALL display active filter indicator
The page SHALL show a clear indicator when a Reason Pareto filter is active.
#### Scenario: Reason filter indicator
- **WHEN** a reason filter is active
- **THEN** a filter indicator SHALL display above the Department table section
- **THEN** the indicator SHALL show the active reason name
- **THEN** a clear button (✕) SHALL remove the reason filter
### Requirement: Hold History page SHALL handle loading and error states
The page SHALL display appropriate feedback during API calls and on errors.
#### Scenario: Initial loading overlay
- **WHEN** the page first loads
- **THEN** a full-page loading overlay SHALL display until all data is loaded
#### Scenario: API error handling
- **WHEN** an API call fails
- **THEN** an error banner SHALL display with the error message
- **THEN** the page SHALL NOT crash or become unresponsive
### Requirement: Hold History page SHALL have navigation links
The page SHALL provide navigation to related pages.
#### Scenario: Back to Hold Overview
- **WHEN** user clicks the "← Hold Overview" button in the header
- **THEN** the page SHALL navigate to `/hold-overview`
#### Scenario: Department still uses separate API
- **WHEN** department data needs to load or reload
- **THEN** the page SHALL call `GET /api/hold-history/department` separately

## ADDED Requirements
### Requirement: Resource dataset cache SHALL execute a single Oracle query and cache the result
The resource_dataset_cache module SHALL query Oracle once for the full shift-status fact set and cache it for subsequent derivations.
#### Scenario: Primary query execution and caching
- **WHEN** `execute_primary_query()` is called with date range, granularity, and resource filter parameters
- **THEN** a deterministic `query_id` SHALL be computed from all primary params using SHA256
- **THEN** if a cached DataFrame exists for this query_id (L1 or L2), it SHALL be used without querying Oracle
- **THEN** if no cache exists, a single Oracle query SHALL fetch all shift-status records from `DW_MES_RESOURCESTATUS_SHIFT` for the filtered resources and date range
- **THEN** the result DataFrame SHALL be stored in both L1 (ProcessLevelCache) and L2 (Redis as parquet/base64)
- **THEN** the response SHALL include `query_id`, summary (KPI, trend, heatmap, comparison), and detail page 1
#### Scenario: Cache TTL and eviction
- **WHEN** a DataFrame is cached
- **THEN** the cache TTL SHALL be 900 seconds (15 minutes)
- **THEN** L1 cache max_size SHALL be 8 entries with LRU eviction
- **THEN** the Redis namespace SHALL be `resource_dataset`
### Requirement: Resource dataset cache SHALL derive KPI summary from cached DataFrame
The module SHALL compute aggregated KPI metrics from the cached fact set.
#### Scenario: KPI derivation from cache
- **WHEN** summary view is derived from cached DataFrame
- **THEN** total hours for PRD, SBY, UDT, SDT, EGT, NST SHALL be summed
- **THEN** OU% and AVAIL% SHALL be computed from the hour totals
- **THEN** machine count SHALL be the distinct count of HISTORYID in the cached data
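The KPI aggregation above might be sketched as follows. The OU% and AVAIL% formulas shown are assumptions (productive share and uptime share); the authoritative definitions live in `buildResourceKpiFromHours()` in `core/compute.js`:

```python
import pandas as pd

STATUS_COLS = ["PRD", "SBY", "UDT", "SDT", "EGT", "NST"]

def derive_kpi(df: pd.DataFrame) -> dict:
    totals = {c: float(df[c].sum()) for c in STATUS_COLS}
    all_hours = sum(totals.values()) or 1.0  # guard against empty data
    return {
        **totals,
        # Assumed formulas -- see buildResourceKpiFromHours() for the real ones
        "ou_pct": round(totals["PRD"] / all_hours * 100, 1),
        "avail_pct": round((all_hours - totals["UDT"] - totals["SDT"])
                           / all_hours * 100, 1),
        "machine_count": int(df["HISTORYID"].nunique()),
    }
```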
### Requirement: Resource dataset cache SHALL derive trend data from cached DataFrame
The module SHALL compute time-series aggregations from the cached fact set.
#### Scenario: Trend derivation
- **WHEN** summary view is derived with a given granularity (day/week/month/year)
- **THEN** the cached DataFrame SHALL be grouped by the granularity period
- **THEN** each period SHALL include PRD, SBY, UDT, SDT, EGT, NST hours and computed OU%, AVAIL%
### Requirement: Resource dataset cache SHALL derive heatmap from cached DataFrame
The module SHALL compute workcenter x date OU% matrix from the cached fact set.
#### Scenario: Heatmap derivation
- **WHEN** summary view is derived
- **THEN** the cached DataFrame SHALL be grouped by (workcenter, date)
- **THEN** each cell SHALL contain the OU% for that workcenter on that date
- **THEN** workcenters SHALL be sorted by workcenter_seq
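A pivot-based sketch of the heatmap derivation above. The `SHIFTDATE` column name and the OU% formula are assumptions, and the workcenter_seq ordering (which needs the dimension table) is omitted:

```python
import pandas as pd

def derive_heatmap(df: pd.DataFrame) -> pd.DataFrame:
    # One row per workcenter, one column per date, cell = OU%.
    hours = ["PRD", "SBY", "UDT", "SDT", "EGT", "NST"]
    g = df.groupby(["WORKCENTERNAME", "SHIFTDATE"], as_index=False)[hours].sum()
    total = g[hours].sum(axis=1)
    g["OU_PCT"] = g["PRD"] / total.where(total != 0) * 100  # NaN when no hours
    return g.pivot(index="WORKCENTERNAME", columns="SHIFTDATE", values="OU_PCT")
```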
### Requirement: Resource dataset cache SHALL derive workcenter comparison from cached DataFrame
The module SHALL compute per-workcenter aggregated metrics from the cached fact set.
#### Scenario: Comparison derivation
- **WHEN** summary view is derived
- **THEN** the cached DataFrame SHALL be grouped by workcenter
- **THEN** each workcenter SHALL include total hours and computed OU%
- **THEN** results SHALL be sorted by OU% descending, limited to top 15
### Requirement: Resource dataset cache SHALL derive paginated detail from cached DataFrame
The module SHALL provide hierarchical detail records from the cached fact set.
#### Scenario: Detail derivation and pagination
- **WHEN** detail view is requested with page and per_page parameters
- **THEN** the cached DataFrame SHALL be used to compute per-resource metrics
- **THEN** resource dimension data (WORKCENTERNAME, RESOURCEFAMILYNAME) SHALL be merged from resource_cache
- **THEN** results SHALL be structured as a hierarchical tree (workcenter -> family -> resource)
- **THEN** pagination SHALL apply to the flattened list
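The dimension merge and pagination steps can be sketched as below; the tree assembly (workcenter -> family -> resource) is elided, and all column names are illustrative assumptions:

```python
import pandas as pd

def derive_detail_page(per_resource: pd.DataFrame, resource_dims: pd.DataFrame,
                       page: int, per_page: int) -> dict:
    """Merge dimension columns from resource_cache, flatten, and paginate."""
    merged = per_resource.merge(
        resource_dims[["RESOURCENAME", "WORKCENTERNAME", "RESOURCEFAMILYNAME"]],
        on="RESOURCENAME", how="left",
    )
    rows = merged.to_dict("records")  # flattened list stands in for the tree
    start = (page - 1) * per_page
    return {"rows": rows[start:start + per_page], "total": len(rows),
            "page": page, "per_page": per_page}
```

Since pagination slices the in-memory list, page changes are served entirely from cache.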
### Requirement: Resource dataset cache SHALL handle cache expiry gracefully
The module SHALL return appropriate signals when cache has expired.
#### Scenario: Cache expired during view request
- **WHEN** a view is requested with a query_id whose cache has expired
- **THEN** the response SHALL return `{ success: false, error: "cache_expired" }`
- **THEN** the HTTP status SHALL be 410 (Gone)
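A framework-agnostic sketch of the expiry contract (handler and cache shapes are assumptions; the real route lives in the Flask view layer):

```python
HTTP_GONE = 410

def handle_view(query_id: str, cache: dict):
    """Return (body, status). `cache` maps query_id -> cached rows;
    a TTL-expired or unknown query_id simply misses the lookup."""
    rows = cache.get(query_id)
    if rows is None:
        # Signal the frontend to re-run POST /query (it auto-retries on 410)
        return {"success": False, "error": "cache_expired"}, HTTP_GONE
    return {"success": True, "rows": len(rows)}, 200
```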

## ADDED Requirements
### Requirement: Resource History page SHALL display KPI summary cards
The page SHALL show 9 KPI cards with aggregated performance metrics for the queried period.
#### Scenario: KPI cards rendering
- **WHEN** summary data is loaded from `GET /api/resource/history/summary`
- **THEN** 9 cards SHALL display: OU%, AVAIL%, PRD, SBY, UDT, SDT, EGT, NST, Machine Count
- **THEN** hour values SHALL format with "K" suffix for large numbers (e.g., 2.5K)
- **THEN** percentage values SHALL use `buildResourceKpiFromHours()` from `core/compute.js`
### Requirement: Resource History page SHALL display trend chart
The page SHALL show OU% and Availability% trends over time.
#### Scenario: Trend chart rendering
- **WHEN** summary data is loaded
- **THEN** a line chart with area fill SHALL display OU% and AVAIL% time series
- **THEN** the chart SHALL use vue-echarts with `autoresize` prop
- **THEN** smooth curves with 0.2 opacity area style SHALL render
### Requirement: Resource History page SHALL display stacked status distribution chart
The page SHALL show E10 status hour distribution over time.
#### Scenario: Stacked bar chart rendering
- **WHEN** summary data is loaded
- **THEN** a stacked bar chart SHALL display PRD, SBY, UDT, SDT, EGT, NST hours per period
- **THEN** each status SHALL use its designated color (PRD=green, SBY=blue, UDT=red, SDT=yellow, EGT=purple, NST=gray)
- **THEN** tooltips SHALL show percentages calculated dynamically
### Requirement: Resource History page SHALL display workcenter comparison chart
The page SHALL show top workcenters ranked by OU%.
#### Scenario: Comparison chart rendering
- **WHEN** summary data is loaded
- **THEN** a horizontal bar chart SHALL display top 15 workcenters by OU%
- **THEN** bars SHALL be color-coded: green (≥80%), yellow (≥50%), red (<50%)
- **THEN** data SHALL display in descending OU% order (top to bottom)
### Requirement: Resource History page SHALL display OU% heatmap
The page SHALL show a heatmap of OU% by workcenter and date.
#### Scenario: Heatmap chart rendering
- **WHEN** summary data is loaded
- **THEN** a 2D heatmap SHALL display: workcenters (Y-axis) × dates (X-axis)
- **THEN** color scale SHALL range from red (low OU%) through yellow to green (high OU%)
- **THEN** workcenters SHALL sort by `workcenter_seq` for consistent ordering
### Requirement: Resource History page SHALL display hierarchical detail table
The page SHALL show a three-level expandable table with per-resource performance metrics.
#### Scenario: Detail table rendering
- **WHEN** detail data is loaded from `GET /api/resource/history/detail`
- **THEN** a tree table SHALL display with columns: Name, OU%, AVAIL%, PRD, SBY, UDT, SDT, EGT, NST, Count
- **THEN** Level 0 rows SHALL show workcenter groups with aggregated metrics
- **THEN** Level 1 rows SHALL show resource families with aggregated metrics
- **THEN** Level 2 rows SHALL show individual resources
#### Scenario: Hour and percentage display
- **WHEN** detail data renders
- **THEN** status columns SHALL display hours with percentage: "10.5h (25%)"
- **THEN** KPI values SHALL be computed using `buildResourceKpiFromHours()` from `core/compute.js`
#### Scenario: Tree expand and collapse
- **WHEN** user clicks the expand button on a row
- **THEN** child rows SHALL toggle visibility
- **WHEN** user clicks "Expand All" or "Collapse All"
- **THEN** all rows SHALL expand or collapse accordingly
## MODIFIED Requirements
### Requirement: Resource History page SHALL support date range and granularity selection
The page SHALL allow users to specify time range and aggregation granularity. On query, the page SHALL use a two-phase flow: POST /query returns queryId, subsequent filter changes use GET /view.
#### Scenario: Date range selection
- **WHEN** the page loads
- **THEN** date inputs SHALL default to last 7 days (yesterday minus 6 days)
- **THEN** date range SHALL NOT exceed 730 days (2 years)
#### Scenario: Granularity buttons
- **WHEN** user clicks a granularity button (日/週/月/年)
- **THEN** the active button SHALL highlight
- **THEN** the next query SHALL use the selected granularity (day/week/month/year)
#### Scenario: Primary query via POST /query
- **WHEN** user clicks the query button
- **THEN** the page SHALL call `POST /api/resource/history/query` with date range, granularity, and resource filters
- **THEN** the response queryId SHALL be stored for subsequent view requests
- **THEN** summary (KPI, trend, heatmap, comparison) and detail page 1 SHALL all be populated from the single response
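The two-phase flow (including the 410 auto-retry specified below) can be sketched as plain control flow; `post_query` and `get_view` are injected transport callables standing in for the Vue page's HTTP calls:

```python
def run_view_with_retry(post_query, get_view, params: dict, view_args: dict) -> dict:
    """Two-phase flow: POST /query once to populate the cache, then GET /view.
    If the view reports cache_expired (HTTP 410), re-run the primary query
    with the last committed params and fetch the view again."""
    query_id = post_query(params)["queryId"]
    body, status = get_view(query_id, view_args)
    if status == 410 and body.get("error") == "cache_expired":
        query_id = post_query(params)["queryId"]  # cache repopulated
        body, status = get_view(query_id, view_args)
    return body
```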
### Requirement: Resource History page SHALL support multi-select filtering
The page SHALL provide multi-select dropdown filters for workcenter groups and families, and SHALL support interdependent narrowing with machine options and selected-value pruning.
#### Scenario: Filter change uses GET /view
- **WHEN** user changes supplementary filters (workcenter groups, families, machines, equipment type) while queryId exists
- **THEN** the page SHALL call `GET /api/resource/history/view?query_id=...&filters...`
- **THEN** no new Oracle query SHALL be triggered
- **THEN** all charts, KPI cards, and detail table SHALL update from the view response
#### Scenario: Multi-select dropdown
- **WHEN** user clicks a multi-select dropdown trigger
- **THEN** a dropdown SHALL display with checkboxes for each option
- **THEN** "Select All" and "Clear All" buttons SHALL be available
- **THEN** clicking outside the dropdown SHALL close it
#### Scenario: Pagination uses GET /view
- **WHEN** user navigates to a different page in the detail table
- **THEN** the page SHALL call `GET /api/resource/history/view?query_id=...&page=...`
#### Scenario: Filter options loading
- **WHEN** the page loads
- **THEN** workcenter groups and families SHALL load from `GET /api/resource/history/options`
- **THEN** machine candidates SHALL be derivable before first query from loaded option resources
#### Scenario: Upstream filters narrow downstream options
- **WHEN** user changes upstream filters (`workcenterGroups`, `families`, equipment-type flags)
- **THEN** machine options SHALL be recomputed to only include matching resources
- **THEN** narrowed options SHALL be reflected immediately in filter controls
#### Scenario: Invalid selected machines are pruned
- **WHEN** upstream filters change and selected machines are no longer valid
- **THEN** invalid selected machine values SHALL be removed automatically
- **THEN** remaining valid selected machine values SHALL be preserved
#### Scenario: Equipment type checkboxes
- **WHEN** user toggles a checkbox (生產設備, 重點設備, 監控設備)
- **THEN** the next query SHALL include the corresponding filter parameter
- **THEN** option narrowing SHALL also honor the same checkbox conditions
#### Scenario: Date range or granularity change triggers new primary query
- **WHEN** user changes date range or granularity and clicks query
- **THEN** the page SHALL call `POST /api/resource/history/query` with new params
- **THEN** a new queryId SHALL replace the old one
#### Scenario: Cache expired auto-retry
- **WHEN** GET /view returns `{ success: false, error: "cache_expired" }`
- **THEN** the page SHALL automatically re-execute `POST /api/resource/history/query` with the last committed filters
- **THEN** the view SHALL refresh with the new data
### Requirement: Resource History page SHALL support CSV export
The page SHALL allow users to export the current query results as CSV.
#### Scenario: CSV export
- **WHEN** user clicks the "匯出 CSV" button
- **THEN** the browser SHALL download a CSV file from `GET /api/resource/history/export` with current filters
- **THEN** the filename SHALL be `resource_history_{start_date}_to_{end_date}.csv`
### Requirement: Resource History page SHALL display KPI summary cards
The page SHALL show 9 KPI cards with aggregated performance metrics derived from the cached dataset.
#### Scenario: KPI cards from cached data
- **WHEN** summary data is derived from the cached DataFrame
- **THEN** 9 cards SHALL display: OU%, AVAIL%, PRD, SBY, UDT, SDT, EGT, NST, Machine Count
- **THEN** values SHALL be computed from the cached shift-status records, not from a separate Oracle query
### Requirement: Resource History page SHALL display hierarchical detail table
The page SHALL show a three-level expandable table derived from the cached dataset.
#### Scenario: Detail table from cached data
- **WHEN** detail data is derived from the cached DataFrame
- **THEN** a tree table SHALL display with the same columns and hierarchy as before
- **THEN** data SHALL be derived in-memory from the cached DataFrame, not from a separate Oracle query
### Requirement: Resource History page SHALL handle loading and error states
The page SHALL display appropriate feedback during API calls and on errors.
#### Scenario: Query loading state
- **WHEN** a query is executing
- **THEN** the query button SHALL be disabled
- **THEN** a loading indicator SHALL display
#### Scenario: API error handling
- **WHEN** an API call fails
- **THEN** a toast notification SHALL display the error message
- **THEN** the page SHALL NOT crash or become unresponsive
#### Scenario: No data placeholder
- **WHEN** query returns empty results
- **THEN** charts and table SHALL display "No data" placeholders