feat(admin-performance): Vue 3 SPA dashboard with metrics history trending

Rebuild /admin/performance from Jinja2 to Vue 3 SPA with ECharts, adding
cache telemetry infrastructure, connection pool monitoring, and SQLite-backed
historical metrics collection with trend chart visualization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: egg
Date: 2026-02-23 09:18:10 +08:00
Commit: 5d570ca7a2 (parent: 1c46f5eb69)
32 changed files with 2903 additions and 261 deletions


@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-22


@@ -0,0 +1,91 @@
## Context
The existing `/admin/performance` is a Jinja2 server-rendered page (vanilla JS + Chart.js) and the only frontend page not yet migrated to the Vue 3 SPA architecture. The backend already exposes rich monitoring data (connection pool `get_pool_status()`, the Redis client, LayeredCache `.telemetry()`), but the frontend shows only 4 status cards plus query performance, worker control, and logs; it lacks key panels such as Redis details, ProcessLevelCache statistics, and connection pool saturation.
## Goals / Non-Goals
**Goals:**
- Migrate the admin/performance page from Jinja2 to a Vue 3 SPA, consistent with the architecture of all report pages
- Add complete system monitoring panels: Redis cache details, ProcessLevelCache statistics, connection pool saturation, direct Oracle connection tracking
- Provide reusable gauge/stat card components to ease future monitoring additions
- Preserve all existing features: status cards, query performance, worker control, system logs
**Non-Goals:**
- No alerting/notification mechanism (future extension)
- No WebSocket real-time push (keep 30-second polling)
- No changes to existing API response formats (`system-status`, `metrics`, `logs` stay unchanged)
- No new user permission controls (reuse existing admin authentication)
## Decisions
### 1. Vue 3 SPA + ECharts replaces Jinja2 + Chart.js
**Choice**: Rebuild the page entirely as a Vue 3 SPA, using ECharts for charting.
**Rationale**: All report pages have completed the Vue SPA migration; admin/performance is the last Jinja2 page. A unified architecture reuses shared infrastructure such as `apiGet` and `useAutoRefresh`, reducing maintenance cost. ECharts is already the project's standard chart library (used by query-tool, reject-history, and others).
**Alternative**: Keep Jinja2 and only add APIs. This would keep accumulating technical debt and forgo reuse of the Vue ecosystem.
### 2. A single performance-detail API aggregates all new monitoring data
**Choice**: Add one endpoint, `GET /admin/api/performance-detail`, returning five sections: `redis`, `process_caches`, `route_cache`, `db_pool`, and `direct_connections`.
**Rationale**: Reduces the number of concurrent frontend requests (5 existing APIs plus 1 makes 6), and the backend can collect each subsystem's state sequentially within a single request, avoiding multiple round-trips.
**Alternative**: A separate endpoint per monitoring dimension. More RESTful, but it increases frontend complexity and network overhead.
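The aggregation shape described above can be sketched as follows. This is a hypothetical illustration (the function and collector names are invented, not from the actual implementation): each section is collected independently, so a failing subsystem degrades to a `null` section instead of failing the whole response.

```python
def _safe(collector):
    """Run one section collector; return None if it raises."""
    try:
        return collector()
    except Exception:
        return None

def build_performance_detail(collectors):
    """Assemble the five-section payload from named collectors."""
    return {name: _safe(fn) for name, fn in collectors.items()}

def _broken_pool():
    # Simulates a subsystem that is temporarily unavailable.
    raise RuntimeError("pool status unavailable")

detail = build_performance_detail({
    "redis": lambda: {"enabled": False},
    "process_caches": dict,
    "route_cache": lambda: {"mode": "layered"},
    "db_pool": _broken_pool,
    "direct_connections": lambda: {"total_since_start": 0},
})
```

Collecting sequentially in one request keeps the endpoint simple; the per-section error isolation is what makes "other sections SHALL still return normally" cheap to guarantee.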
### 3. Global registry pattern for ProcessLevelCache
**Choice**: Add a `_PROCESS_CACHE_REGISTRY` dict and a `register_process_cache()` function to `core/cache.py`; each service registers its cache at module load time.
**Rationale**: Avoids hard-coding each cache instance's import path in admin_routes. Adding a cache only requires a one-line `register_process_cache()` call in its own service to appear automatically in the monitoring panel.
**Alternative**: Have admin_routes import each cache instance directly. Tighter coupling; adding a cache would require changes in two places.
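A minimal sketch of this registry pattern, assuming the names from the decision above; `_DemoCache` stands in for a real `ProcessLevelCache` instance:

```python
import threading

_PROCESS_CACHE_REGISTRY = {}
_REGISTRY_LOCK = threading.Lock()

def register_process_cache(name, instance, description=""):
    """Called once by each service at module load time."""
    with _REGISTRY_LOCK:
        _PROCESS_CACHE_REGISTRY[name] = (description, instance)

def get_all_process_cache_stats():
    """Snapshot stats() from every registered cache for the monitoring panel."""
    with _REGISTRY_LOCK:
        items = list(_PROCESS_CACHE_REGISTRY.items())
    return {name: {**inst.stats(), "description": desc}
            for name, (desc, inst) in items}

class _DemoCache:
    """Stand-in for a real ProcessLevelCache."""
    def stats(self):
        return {"entries": 2, "max_size": 32, "ttl_seconds": 30}

register_process_cache("wip_df", _DemoCache(), "WIP dataframe cache")
```

An unregistered cache simply never appears in the stats dict, which is exactly the behavior the spec requires.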
### 4. Redis namespace monitoring uses SCAN, not KEYS
**Choice**: Use `SCAN` with a `MATCH` pattern to count the keys in each namespace.
**Rationale**: `KEYS *` blocks Redis in production; `SCAN` is a non-blocking iterator and is safer.
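The per-namespace count can be sketched with redis-py's `scan_iter` (which wraps the non-blocking `SCAN` command); `_StubRedis` below is an in-memory stand-in so the sketch runs without a server, and `count_namespace_keys` is an illustrative name, not the actual function:

```python
def count_namespace_keys(client, namespaces, count=100):
    """Count keys per namespace with non-blocking SCAN iteration (never KEYS)."""
    return [
        {"name": ns,
         "key_count": sum(1 for _ in client.scan_iter(match=f"{ns}:*", count=count))}
        for ns in namespaces
    ]

class _StubRedis:
    """In-memory stand-in for a redis-py client, for demonstration only."""
    def __init__(self, keys):
        self._keys = keys
    def scan_iter(self, match, count=100):
        prefix = match.rstrip("*")
        return iter(k for k in self._keys if k.startswith(prefix))

stub = _StubRedis(["data:a", "data:b", "lock:x", "meta:1"])
counts = count_namespace_keys(stub, ["data", "lock", "route_cache"])
```

With a real client, `count=100` bounds the work of each SCAN iteration, matching the mitigation listed under Risks.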
### 5. Direct Oracle connections use a thread-safe atomic counter
**Choice**: A global counter in `database.py` protected by a `threading.Lock`, incremented after `get_db_connection()` or `read_sql_df_slow()` establishes a connection.
**Rationale**: Tracks direct connection usage outside the pool, helping decide whether the pool size needs adjusting. The counter is monotonic (increment-only), recording the total since the worker started.
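A minimal sketch of the counter, with a concurrent-increment demo showing why the lock matters (variable names follow the task list; the demo threads are illustrative):

```python
import threading

_DIRECT_CONN_COUNTER = 0
_DIRECT_CONN_LOCK = threading.Lock()

def _record_direct_connection():
    """Called after a direct (non-pooled) Oracle connection is created."""
    global _DIRECT_CONN_COUNTER
    with _DIRECT_CONN_LOCK:
        _DIRECT_CONN_COUNTER += 1

def get_direct_connection_count():
    with _DIRECT_CONN_LOCK:
        return _DIRECT_CONN_COUNTER

# Concurrent increments stay exact because += is serialized by the lock.
workers = [
    threading.Thread(target=lambda: [_record_direct_connection() for _ in range(250)])
    for _ in range(4)
]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

After the four demo threads finish, the counter reads exactly 1000; an unlocked `+=` could lose increments under contention.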
### 6. Reusable frontend components: GaugeBar / StatCard / StatusDot
**Choice**: Add 3 small reusable components under `admin-performance/components/`.
**Rationale**: Redis memory, connection pool saturation, and ProcessLevelCache usage all need gauge visualization, and stat cards repeat across panels. Componentization unifies the visual style and reduces duplicated templates.
### 7. SQLite-persisted metrics history store
**Choice**: Add `core/metrics_history.py`, storing metrics snapshots in SQLite (modeled on the `LogStore` pattern in `core/log_store.py`), with a daemon thread sampling every 30 seconds.
**Rationale**: An in-memory deque cannot be shared across workers under gunicorn prefork and loses history on worker restart. SQLite provides cross-worker reads, persistence across restarts, and configurable retention (default 3 days / 50000 rows) without extra infrastructure.
**Alternatives**:
- In-memory deque: simple, but per-worker and lost on restart
- Redis TSDB: requires an extra module and adds load on Redis
- PostgreSQL: too heavy, and this data does not need ACID guarantees
**Schema**: `metrics_snapshots` table with timestamp, worker PID, and pool/redis/route_cache/latency columns; an `idx_metrics_ts` index speeds up time-range queries.
**Background collection**: a `MetricsHistoryCollector` daemon thread; the interval is configurable via the `METRICS_HISTORY_INTERVAL` environment variable. Started and stopped in the `app.py` lifecycle.
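The store can be sketched as below. This is a simplified illustration: the real class uses thread-local connections per the `LogStore` pattern, whereas this sketch shares one connection behind a write lock, and the column set is abbreviated to three fields:

```python
import sqlite3
import threading

class MetricsHistoryStore:
    """Abbreviated sketch of the SQLite-backed snapshot store."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS metrics_snapshots ("
                "ts TEXT NOT NULL, worker_pid INTEGER, pool_saturation REAL)")
            self._conn.execute(
                "CREATE INDEX IF NOT EXISTS idx_metrics_ts "
                "ON metrics_snapshots (ts)")
            self._conn.commit()

    def write_snapshot(self, ts, pid, saturation):
        with self._lock:
            self._conn.execute(
                "INSERT INTO metrics_snapshots VALUES (?, ?, ?)",
                (ts, pid, saturation))
            self._conn.commit()

    def query_snapshots(self, since_ts):
        """Return snapshots at or after since_ts, oldest first."""
        with self._lock:
            return self._conn.execute(
                "SELECT ts, worker_pid, pool_saturation FROM metrics_snapshots "
                "WHERE ts >= ? ORDER BY ts ASC", (since_ts,)).fetchall()

store = MetricsHistoryStore()
store.write_snapshot("2026-02-23T09:00:00", 1234, 12.5)
store.write_snapshot("2026-02-23T09:00:30", 1234, 40.0)
recent = store.query_snapshots("2026-02-23T09:00:00")
```

Because ISO 8601 timestamps sort lexicographically, plain string comparison in the `WHERE ts >= ?` clause gives correct time-range queries, which is what `idx_metrics_ts` accelerates.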
## Risks / Trade-offs
- **Redis SCAN performance**: SCAN can be slow with many keys → set `COUNT 100` to bound each iteration; with one scan every 30 seconds, this is acceptable
- **ProcessLevelCache registry depends on module load order**: a service that is never imported never registers → ensure all service modules are loaded in the app factory or gunicorn post_fork
- **Direct connection counter is not shared across workers**: under gunicorn prefork, each worker keeps its own count → the API returns the responding worker's PID for identification, which can be cross-checked against the worker info in `/admin/api/system-status`
- **Old Jinja2 template kept but unmaintained**: after the switch, the old template receives no updates → `rollbackStrategy: 'fallback_to_legacy_route'` in `routeContracts.js` preserves a rollback path
## Migration Plan
1. Backend first: add `stats()`, the registry, the direct connection counter, and the new APIs (no impact on existing features)
2. Frontend build: create the `admin-performance/` Vue SPA and register the Vite entry
3. Route switch: change `admin_routes.py` to `send_from_directory` and set `renderMode: 'native'` in `routeContracts.js`
4. Verify, then deploy: confirm all panels display correctly before going live
5. Rollback: revert `routeContracts.js` to `renderMode: 'external'` and `admin_routes.py` to `render_template`


@@ -0,0 +1,31 @@
## Why
The existing `/admin/performance` is the only page still using Jinja2 + vanilla JS + Chart.js, inconsistent with all report pages already migrated to Vue 3 SPAs. Meanwhile, as the reporting system has grown (L1/L2 cache layers, connection pool, direct Oracle connections), the backend now exposes rich telemetry, but the admin monitoring panels cover too little: Redis details, ProcessLevelCache statistics, connection pool saturation, and direct Oracle connection tracking are all missing.
## What Changes
- Rebuild `/admin/performance` from a Jinja2 server-rendered page into a Vue 3 SPA (ECharts replaces Chart.js)
- Add a `GET /admin/api/performance-detail` API aggregating complete monitoring data: Redis INFO/SCAN, the ProcessLevelCache registry, connection pool status, and the direct connection counter
- Backend: add a `stats()` method and a global registry to `ProcessLevelCache`, enabling dynamic collection of every cache instance's state
- Backend: add a direct Oracle connection counter to `database.py`, tracking non-pooled connection usage
- Frontend: add reusable GaugeBar / StatCard / StatusDot components, providing gauge-style saturation visualization
- Switch the portal-shell route from `renderMode: 'external'` to `'native'`
- Add an `admin-performance` entry point to the Vite build
## Capabilities
### New Capabilities
- `admin-performance-spa`: Vue 3 SPA rebuild of the admin performance dashboard, with complete panels for status cards, query performance, Redis cache, memory caches, connection pool, worker control, and system logs
- `cache-telemetry-api`: ProcessLevelCache `stats()` + global registry + performance-detail API, providing telemetry for all memory caches, the Redis cache, and the route cache
- `connection-pool-monitoring`: connection pool saturation tracking + direct Oracle connection counter, giving a complete picture of database connection usage
- `metrics-history-trending`: SQLite-persisted background collection + time-series trend charts, enabling lookback over pool saturation, query latency, Redis memory, cache hit rates, and other historical data
### Modified Capabilities
<!-- No existing spec-level requirements are changing -->
## Impact
- **Backend** (7 modified + 1 new): `core/cache.py`, `core/database.py`, `core/metrics_history.py` (NEW), `routes/admin_routes.py`, `services/resource_cache.py`, `services/realtime_equipment_cache.py`, `services/reject_dataset_cache.py`, `app.py`
- **Frontend** (8 new + 3 modified): new `admin-performance/` directory (index.html, main.js, App.vue, style.css, 4 components including TrendChart); modified `vite.config.js`, `package.json`, `routeContracts.js`
- **API**: 2 new endpoints (`/admin/api/performance-detail`, `/admin/api/performance-history`); the existing 5 endpoints are unchanged
- **Rollback**: the old Jinja2 template is kept; switch back via `renderMode: 'external'` in `routeContracts.js`


@@ -0,0 +1,100 @@
## ADDED Requirements
### Requirement: Vue 3 SPA page replaces Jinja2 template
The `/admin/performance` route SHALL serve a Vue 3 SPA page built by Vite, replacing the existing Jinja2 server-rendered template. The SPA SHALL be registered as a Vite entry point and integrated into the portal-shell navigation as a `renderMode: 'native'` route.
#### Scenario: Page loads as Vue SPA
- **WHEN** user navigates to `/admin/performance`
- **THEN** the server SHALL return the Vite-built `admin-performance.html` static file (not a Jinja2 rendered template)
#### Scenario: Portal-shell integration
- **WHEN** the portal-shell renders `/admin/performance`
- **THEN** it SHALL load the page as a native Vue SPA (not an external iframe)
### Requirement: Status cards display system health
The dashboard SHALL display 4 status cards in a horizontal grid: Database, Redis, Circuit Breaker, and Worker PID. Each card SHALL show a StatusDot indicator (healthy/degraded/error/disabled) with the current status value.
#### Scenario: All systems healthy
- **WHEN** all backend systems report healthy status via `/admin/api/system-status`
- **THEN** all 4 status cards SHALL display green StatusDot indicators with their respective values
#### Scenario: Redis disabled
- **WHEN** Redis is disabled (`REDIS_ENABLED=false`)
- **THEN** the Redis status card SHALL display a disabled StatusDot indicator and the Redis cache panel SHALL show a graceful degradation message
### Requirement: Query performance panel with ECharts
The dashboard SHALL display query performance metrics (P50, P95, P99 latencies, total queries, slow queries) and an ECharts latency distribution chart, replacing the existing Chart.js implementation.
#### Scenario: Metrics loaded successfully
- **WHEN** `/admin/api/metrics` returns valid performance data
- **THEN** the panel SHALL display P50/P95/P99 latency values and render an ECharts bar chart showing latency distribution
#### Scenario: No metrics data
- **WHEN** `/admin/api/metrics` returns empty or null metrics
- **THEN** the panel SHALL display placeholder text indicating no data available
### Requirement: Redis cache detail panel
The dashboard SHALL display a Redis cache detail panel showing memory usage (as a GaugeBar), connected clients, hit rate percentage, peak memory, and a namespace key distribution table.
#### Scenario: Redis active with data
- **WHEN** `/admin/api/performance-detail` returns Redis data with namespace key counts
- **THEN** the panel SHALL display a memory GaugeBar, hit rate, client count, and a table listing each namespace with its key count
#### Scenario: Redis disabled
- **WHEN** Redis is disabled
- **THEN** the Redis detail panel SHALL display a disabled state message without errors
### Requirement: Memory cache panel
The dashboard SHALL display ProcessLevelCache statistics as grid cards (showing entries/max_size as a mini gauge and TTL) plus Route Cache telemetry (L1 hit rate, L2 hit rate, miss rate, total reads).
#### Scenario: Multiple caches registered
- **WHEN** `/admin/api/performance-detail` returns process_caches with multiple entries
- **THEN** the panel SHALL render one card per cache instance showing entries, max_size, TTL, and description
#### Scenario: Route cache telemetry
- **WHEN** `/admin/api/performance-detail` returns route_cache data
- **THEN** the panel SHALL display L1 hit rate, L2 hit rate, miss rate, and total reads
### Requirement: Connection pool panel
The dashboard SHALL display connection pool saturation as a GaugeBar and stat cards showing checked_out, checked_in, overflow, max_capacity, pool_size, pool_recycle, pool_timeout, and direct connection count.
#### Scenario: Pool under normal load
- **WHEN** pool saturation is below 80%
- **THEN** the GaugeBar SHALL display in a normal color (green/blue)
#### Scenario: Pool near saturation
- **WHEN** pool saturation exceeds 80%
- **THEN** the GaugeBar SHALL display in a warning color (yellow/orange/red)
### Requirement: Worker control panel
The dashboard SHALL display worker PID, uptime, cooldown status, and provide a restart button with a confirmation modal.
#### Scenario: Restart worker
- **WHEN** user clicks the restart button and confirms in the modal
- **THEN** the system SHALL POST to `/admin/api/worker/restart` and display the result
#### Scenario: Restart during cooldown
- **WHEN** worker is in cooldown period
- **THEN** the restart button SHALL be disabled with a cooldown indicator
### Requirement: System logs panel with filtering and pagination
The dashboard SHALL display system logs with level filtering, text search, and pagination controls.
#### Scenario: Filter by log level
- **WHEN** user selects a specific log level filter
- **THEN** only logs matching that level SHALL be displayed
#### Scenario: Paginate logs
- **WHEN** logs exceed the page size
- **THEN** pagination controls SHALL allow navigating between pages
### Requirement: Auto-refresh with toggle
The dashboard SHALL auto-refresh all panels every 30 seconds using `useAutoRefresh`. The user SHALL be able to toggle auto-refresh on/off and manually trigger a refresh.
#### Scenario: Auto-refresh enabled
- **WHEN** auto-refresh is enabled (default)
- **THEN** all panels SHALL refresh their data every 30 seconds via `Promise.all` parallel fetch
#### Scenario: Manual refresh
- **WHEN** user clicks the manual refresh button
- **THEN** all panels SHALL immediately refresh their data


@@ -0,0 +1,56 @@
## ADDED Requirements
### Requirement: ProcessLevelCache stats method
Every `ProcessLevelCache` instance SHALL expose a `stats()` method that returns a dict containing `entries` (live entries count), `max_size`, and `ttl_seconds`.
#### Scenario: Stats on active cache
- **WHEN** `stats()` is called on a ProcessLevelCache with 5 live entries (max_size=32, ttl=30s)
- **THEN** it SHALL return `{"entries": 5, "max_size": 32, "ttl_seconds": 30}`
#### Scenario: Stats with expired entries
- **WHEN** `stats()` is called and some entries have exceeded TTL
- **THEN** `entries` SHALL only count entries where `now - timestamp <= ttl`
#### Scenario: Thread safety
- **WHEN** `stats()` is called concurrently with cache writes
- **THEN** it SHALL acquire the cache lock and return consistent data without races
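The three scenarios above can be sketched in one minimal `ProcessLevelCache` focused on `stats()`: count only non-expired entries, under the lock (the `set` method and internal layout are illustrative assumptions, not the real implementation):

```python
import threading
import time

class ProcessLevelCache:
    """Sketch focused on stats(); internal layout is assumed."""

    def __init__(self, max_size=32, ttl_seconds=30):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._data = {}  # key -> (inserted_at, value)
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = (time.monotonic(), value)

    def stats(self):
        now = time.monotonic()
        with self._lock:
            # Count only entries still within TTL (live entries).
            live = sum(1 for ts, _ in self._data.values()
                       if now - ts <= self.ttl_seconds)
        return {"entries": live, "max_size": self.max_size,
                "ttl_seconds": self.ttl_seconds}

cache = ProcessLevelCache(max_size=32, ttl_seconds=30)
for i in range(5):
    cache.set(f"k{i}", i)
# Backdate one entry to simulate TTL expiry; it must not be counted.
cache._data["stale"] = (time.monotonic() - 100, "expired")
```

The lock acquisition in `stats()` is what makes concurrent reads consistent with in-flight writes.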
### Requirement: ProcessLevelCache global registry
The system SHALL maintain a module-level registry in `core/cache.py` that maps cache names to `(description, instance)` tuples. Services SHALL register their cache instances at module load time via `register_process_cache(name, instance, description)`.
#### Scenario: Register and retrieve all caches
- **WHEN** multiple services register their caches and `get_all_process_cache_stats()` is called
- **THEN** it SHALL return a dict of `{name: {entries, max_size, ttl_seconds, description}}` for all registered caches
#### Scenario: Cache not registered
- **WHEN** a service's ProcessLevelCache is not registered
- **THEN** it SHALL NOT appear in `get_all_process_cache_stats()` output
### Requirement: Performance detail API endpoint
The system SHALL expose `GET /admin/api/performance-detail` that returns a JSON object with sections: `redis`, `process_caches`, `route_cache`, `db_pool`, and `direct_connections`.
#### Scenario: All systems available
- **WHEN** the API is called and all subsystems are healthy
- **THEN** it SHALL return all 5 sections with current telemetry data
#### Scenario: Redis disabled
- **WHEN** Redis is disabled (`REDIS_ENABLED=false`)
- **THEN** the `redis` section SHALL be `null` or contain `{"enabled": false}`, and other sections SHALL still return normally
### Requirement: Redis namespace key distribution
The performance-detail API SHALL scan Redis keys by namespace prefix and return key counts per namespace. Namespaces SHALL include: `data`, `route_cache`, `equipment_status`, `reject_dataset`, `meta`, `lock`, `scrap_exclusion`.
#### Scenario: Keys exist across namespaces
- **WHEN** Redis contains keys across multiple namespaces
- **THEN** the `redis.namespaces` array SHALL list each namespace with its `name` and `key_count`
#### Scenario: SCAN safety
- **WHEN** scanning Redis keys
- **THEN** the system SHALL use `SCAN` (not `KEYS`) to avoid blocking Redis
### Requirement: Route cache telemetry in performance detail
The performance-detail API SHALL include route cache telemetry from `get_route_cache_status()`, providing `mode`, `l1_size`, `l1_hit_rate`, `l2_hit_rate`, `miss_rate`, and `reads_total`.
#### Scenario: LayeredCache active
- **WHEN** route cache is in layered mode
- **THEN** the `route_cache` section SHALL include L1 and L2 hit rates from telemetry


@@ -0,0 +1,27 @@
## ADDED Requirements
### Requirement: Connection pool status in performance detail
The performance-detail API SHALL include `db_pool` section with `status` (checked_out, checked_in, overflow, max_capacity, saturation) from `get_pool_status()` and `config` (pool_size, max_overflow, pool_timeout, pool_recycle) from `get_pool_runtime_config()`.
#### Scenario: Pool status retrieved
- **WHEN** the API is called
- **THEN** `db_pool.status` SHALL contain current pool utilization metrics and `db_pool.config` SHALL contain the pool configuration values
#### Scenario: Saturation calculation
- **WHEN** the pool has 8 checked_out connections and max_capacity is 30
- **THEN** saturation SHALL be reported as approximately 26.7%
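The saturation arithmetic in this scenario is a straightforward percentage; a small sketch (the function name is illustrative, and the zero-capacity guard is an assumption):

```python
def pool_saturation(checked_out, max_capacity):
    """Saturation as a percentage of max capacity (pool_size + max_overflow)."""
    if max_capacity <= 0:
        return 0.0  # guard against division by zero (assumed behavior)
    return round(checked_out / max_capacity * 100, 1)
```

For the scenario above: 8 / 30 * 100 = 26.666..., reported as 26.7%.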
### Requirement: Direct Oracle connection counter
The system SHALL maintain a thread-safe monotonic counter in `database.py` that increments each time `get_db_connection()` or `read_sql_df_slow()` successfully creates a direct (non-pooled) Oracle connection.
#### Scenario: Counter increments on direct connection
- **WHEN** `get_db_connection()` successfully creates a connection
- **THEN** the direct connection counter SHALL increment by 1
#### Scenario: Counter in performance detail
- **WHEN** the performance-detail API is called
- **THEN** `direct_connections` SHALL contain `total_since_start` (counter value) and `worker_pid` (current process PID)
#### Scenario: Counter is per-worker
- **WHEN** multiple gunicorn workers are running
- **THEN** each worker SHALL maintain its own independent counter, and the API SHALL return the counter for the responding worker


@@ -0,0 +1,65 @@
## ADDED Requirements
### Requirement: SQLite metrics history store
The system SHALL provide a `MetricsHistoryStore` class in `core/metrics_history.py` that persists metrics snapshots to a SQLite database (`logs/metrics_history.sqlite` by default). The store SHALL use thread-local connections and a write lock, following the `LogStore` pattern in `core/log_store.py`.
#### Scenario: Write and query snapshots
- **WHEN** `write_snapshot(data)` is called with pool/redis/route_cache/latency metrics
- **THEN** a row SHALL be inserted into `metrics_snapshots` with the current ISO 8601 timestamp and worker PID
#### Scenario: Query by time range
- **WHEN** `query_snapshots(minutes=30)` is called
- **THEN** it SHALL return all rows from the last 30 minutes, ordered by timestamp ascending
#### Scenario: Retention cleanup
- **WHEN** `cleanup()` is called
- **THEN** rows older than `METRICS_HISTORY_RETENTION_DAYS` (default 3) SHALL be deleted, and total rows SHALL be capped at `METRICS_HISTORY_MAX_ROWS` (default 50000)
#### Scenario: Thread safety
- **WHEN** multiple threads write snapshots concurrently
- **THEN** the write lock SHALL serialize writes and prevent database corruption
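The retention cleanup scenario combines two rules: an age cutoff, then a row cap. A self-contained sketch (schema reduced to one column; the function name follows the spec but the SQL is an illustrative assumption):

```python
import sqlite3

def cleanup(conn, cutoff_ts, max_rows=50000):
    """Drop rows older than the cutoff, then cap total rows by deleting
    the oldest beyond max_rows."""
    conn.execute("DELETE FROM metrics_snapshots WHERE ts < ?", (cutoff_ts,))
    conn.execute(
        "DELETE FROM metrics_snapshots WHERE rowid NOT IN "
        "(SELECT rowid FROM metrics_snapshots ORDER BY ts DESC LIMIT ?)",
        (max_rows,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics_snapshots (ts TEXT NOT NULL)")
conn.executemany("INSERT INTO metrics_snapshots VALUES (?)",
                 [("2026-02-19T00:00:00",), ("2026-02-22T00:00:00",),
                  ("2026-02-23T00:00:00",)])
cleanup(conn, cutoff_ts="2026-02-20T00:00:00", max_rows=1)
remaining = [row[0] for row in conn.execute(
    "SELECT ts FROM metrics_snapshots ORDER BY ts")]
```

With a cutoff of 2026-02-20 and a cap of 1 row, only the newest snapshot survives.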
### Requirement: Background metrics collector
The system SHALL provide a `MetricsHistoryCollector` class that runs a daemon thread collecting metrics snapshots at a configurable interval (default 30 seconds, via `METRICS_HISTORY_INTERVAL` env var).
#### Scenario: Automatic collection
- **WHEN** the collector is started via `start_metrics_history(app)`
- **THEN** it SHALL collect pool status, Redis info, route cache status, and query latency metrics every interval and write them to the store
#### Scenario: Graceful shutdown
- **WHEN** `stop_metrics_history()` is called
- **THEN** the collector thread SHALL stop within one interval period
#### Scenario: Subsystem unavailability
- **WHEN** a subsystem (e.g., Redis) is unavailable during collection
- **THEN** the collector SHALL write null/0 for those fields and continue collecting other metrics
### Requirement: Performance history API endpoint
The system SHALL expose `GET /admin/api/performance-history` that returns historical metrics snapshots.
#### Scenario: Query with time range
- **WHEN** the API is called with `?minutes=30`
- **THEN** it SHALL return `{"success": true, "data": {"snapshots": [...], "count": N}}`
#### Scenario: Time range bounds
- **WHEN** `minutes` is less than 1 or greater than 180
- **THEN** it SHALL be clamped to the range [1, 180]
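The clamping rule can be sketched in a few lines; the fallback default of 30 on unparsable input is an assumption, not stated by the spec:

```python
def clamp_minutes(raw, default=30):
    """Parse and clamp the ?minutes= query param to [1, 180].
    The fallback default is an assumed value, not from the spec."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        value = default
    return max(1, min(180, value))
```

This keeps out-of-range and malformed inputs from producing unbounded history queries.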
#### Scenario: Admin authentication
- **WHEN** the API is called without admin authentication
- **THEN** it SHALL be rejected by the `@admin_required` decorator
### Requirement: Frontend trend charts
The system SHALL display 4 trend chart panels in the admin performance dashboard using vue-echarts VChart line/area charts.
#### Scenario: Trend charts with data
- **WHEN** historical snapshots contain more than 1 data point
- **THEN** the dashboard SHALL display trend charts for: connection pool saturation, query latency (P50/P95/P99), Redis memory, and cache hit rates
#### Scenario: Trend charts without data
- **WHEN** historical snapshots are empty or contain only 1 data point
- **THEN** the trend charts SHALL NOT be displayed (hidden via `v-if`)
#### Scenario: Auto-refresh
- **WHEN** the dashboard auto-refreshes
- **THEN** historical data SHALL also be refreshed alongside real-time metrics


@@ -0,0 +1,80 @@
## 1. Backend — Cache Telemetry Infrastructure
- [x] 1.1 Add `stats()` method to `ProcessLevelCache` in `core/cache.py` (returns entries/max_size/ttl_seconds with lock)
- [x] 1.2 Add `_PROCESS_CACHE_REGISTRY`, `register_process_cache()`, and `get_all_process_cache_stats()` to `core/cache.py`
- [x] 1.3 Register `_wip_df_cache` in `core/cache.py`
- [x] 1.4 Add `stats()` + `register_process_cache()` to `services/resource_cache.py`
- [x] 1.5 Add `stats()` + `register_process_cache()` to `services/realtime_equipment_cache.py`
- [x] 1.6 Add `register_process_cache()` to `services/reject_dataset_cache.py`
## 2. Backend — Direct Connection Counter
- [x] 2.1 Add `_DIRECT_CONN_COUNTER`, `_DIRECT_CONN_LOCK`, and `get_direct_connection_count()` to `core/database.py`
- [x] 2.2 Increment counter in `get_db_connection()` and `read_sql_df_slow()` after successful connection creation
## 3. Backend — Performance Detail API
- [x] 3.1 Add `GET /admin/api/performance-detail` endpoint in `routes/admin_routes.py` returning redis, process_caches, route_cache, db_pool, and direct_connections sections
- [x] 3.2 Implement Redis INFO + SCAN namespace key distribution (data, route_cache, equipment_status, reject_dataset, meta, lock, scrap_exclusion) with graceful degradation when Redis is disabled
## 4. Frontend — Page Scaffolding
- [x] 4.1 Create `frontend/src/admin-performance/index.html` and `main.js` (standard Vue SPA entry)
- [x] 4.2 Register `admin-performance` entry in `vite.config.js`
- [x] 4.3 Add `cp` command for `admin-performance.html` in `package.json` build script
## 5. Frontend — Reusable Components
- [x] 5.1 Create `GaugeBar.vue` — horizontal gauge bar with label, value, max, and color threshold props
- [x] 5.2 Create `StatCard.vue` — mini card with numeric value, label, and optional unit/icon
- [x] 5.3 Create `StatusDot.vue` — colored dot indicator (healthy/degraded/error/disabled) with label
## 6. Frontend — App.vue Main Dashboard
- [x] 6.1 Implement data fetching layer: `loadSystemStatus()`, `loadMetrics()`, `loadPerformanceDetail()`, `loadLogs()`, `loadWorkerStatus()` with `Promise.all` parallel fetch and `useAutoRefresh` (30s)
- [x] 6.2 Build header section with gradient background, title, auto-refresh toggle, and manual refresh button
- [x] 6.3 Build status cards section (Database / Redis / Circuit Breaker / Worker PID) using StatusDot
- [x] 6.4 Build query performance panel with P50/P95/P99 stat cards and ECharts latency distribution chart
- [x] 6.5 Build Redis cache detail panel with memory GaugeBar, hit rate, client count, peak memory, and namespace key distribution table
- [x] 6.6 Build memory cache panel with ProcessLevelCache grid cards (entries/max gauge + TTL) and route cache telemetry (L1/L2 hit rate, miss rate, total reads)
- [x] 6.7 Build connection pool panel with saturation GaugeBar and stat card grid (checked_out, checked_in, overflow, max_capacity, pool_size, pool_recycle, pool_timeout, direct connections)
- [x] 6.8 Build worker control panel with PID/uptime/cooldown display, restart button, and confirmation modal
- [x] 6.9 Build system logs panel with level filter, text search, pagination, and log clearing
- [x] 6.10 Create `style.css` with all panel, grid, gauge, card, and responsive layout styles
## 7. Route Integration
- [x] 7.1 Change `/admin/performance` route handler in `admin_routes.py` from `render_template` to `send_from_directory` serving the Vue SPA
- [x] 7.2 Update `routeContracts.js`: change renderMode to `'native'`, rollbackStrategy to `'fallback_to_legacy_route'`, compatibilityPolicy to `'redirect_to_shell_when_spa_enabled'`
## 8. Verification (Phase 1)
- [x] 8.1 Run `cd frontend && npx vite build` — confirm no compilation errors and `admin-performance.html` is produced
- [x] 8.2 Verify all dashboard panels render correctly with live data after service restart
## 9. Backend — Metrics History Store
- [x] 9.1 Create `core/metrics_history.py` with `MetricsHistoryStore` class (SQLite schema, thread-local connections, write_lock, write_snapshot, query_snapshots, cleanup)
- [x] 9.2 Add `MetricsHistoryCollector` class (daemon thread, configurable interval, collect pool/redis/route_cache/latency)
- [x] 9.3 Add module-level `get_metrics_history_store()`, `start_metrics_history(app)`, `stop_metrics_history()` functions
## 10. Backend — Lifecycle Integration
- [x] 10.1 Call `start_metrics_history(app)` in `app.py` after other background services
- [x] 10.2 Call `stop_metrics_history()` in `_shutdown_runtime_resources()` in `app.py`
## 11. Backend — Performance History API
- [x] 11.1 Add `GET /admin/api/performance-history` endpoint in `admin_routes.py` (minutes param, clamped 1-180, returns snapshots array)
## 12. Frontend — Trend Charts
- [x] 12.1 Create `TrendChart.vue` component using vue-echarts VChart (line/area chart, dual yAxis support, time labels, autoresize)
- [x] 12.2 Add `loadPerformanceHistory()` fetch to `App.vue` and integrate into `refreshAll()`
- [x] 12.3 Add 4 TrendChart panels to `App.vue` template (pool saturation, query latency, Redis memory, cache hit rates)
- [x] 12.4 Add trend chart styles to `style.css`
## 13. Verification (Phase 2)
- [x] 13.1 Run `cd frontend && npm run build` — confirm no compilation errors
- [x] 13.2 Verify trend charts render with historical data after service restart + 60s collection

View File

@@ -0,0 +1,100 @@
## ADDED Requirements
### Requirement: Vue 3 SPA page replaces Jinja2 template
The `/admin/performance` route SHALL serve a Vue 3 SPA page built by Vite, replacing the existing Jinja2 server-rendered template. The SPA SHALL be registered as a Vite entry point and integrated into the portal-shell navigation as a `renderMode: 'native'` route.
#### Scenario: Page loads as Vue SPA
- **WHEN** user navigates to `/admin/performance`
- **THEN** the server SHALL return the Vite-built `admin-performance.html` static file (not a Jinja2 rendered template)
#### Scenario: Portal-shell integration
- **WHEN** the portal-shell renders `/admin/performance`
- **THEN** it SHALL load the page as a native Vue SPA (not an external iframe)
### Requirement: Status cards display system health
The dashboard SHALL display 4 status cards in a horizontal grid: Database, Redis, Circuit Breaker, and Worker PID. Each card SHALL show a StatusDot indicator (healthy/degraded/error/disabled) with the current status value.
#### Scenario: All systems healthy
- **WHEN** all backend systems report healthy status via `/admin/api/system-status`
- **THEN** all 4 status cards SHALL display green StatusDot indicators with their respective values
#### Scenario: Redis disabled
- **WHEN** Redis is disabled (`REDIS_ENABLED=false`)
- **THEN** the Redis status card SHALL display a disabled StatusDot indicator and the Redis cache panel SHALL show a graceful degradation message
### Requirement: Query performance panel with ECharts
The dashboard SHALL display query performance metrics (P50, P95, P99 latencies, total queries, slow queries) and an ECharts latency distribution chart, replacing the existing Chart.js implementation.
#### Scenario: Metrics loaded successfully
- **WHEN** `/admin/api/metrics` returns valid performance data
- **THEN** the panel SHALL display P50/P95/P99 latency values and render an ECharts bar chart showing latency distribution
#### Scenario: No metrics data
- **WHEN** `/admin/api/metrics` returns empty or null metrics
- **THEN** the panel SHALL display placeholder text indicating no data available
### Requirement: Redis cache detail panel
The dashboard SHALL display a Redis cache detail panel showing memory usage (as a GaugeBar), connected clients, hit rate percentage, peak memory, and a namespace key distribution table.
#### Scenario: Redis active with data
- **WHEN** `/admin/api/performance-detail` returns Redis data with namespace key counts
- **THEN** the panel SHALL display a memory GaugeBar, hit rate, client count, and a table listing each namespace with its key count
#### Scenario: Redis disabled
- **WHEN** Redis is disabled
- **THEN** the Redis detail panel SHALL display a disabled state message without errors
### Requirement: Memory cache panel
The dashboard SHALL display ProcessLevelCache statistics as grid cards (showing entries/max_size as a mini gauge and TTL) plus Route Cache telemetry (L1 hit rate, L2 hit rate, miss rate, total reads).
#### Scenario: Multiple caches registered
- **WHEN** `/admin/api/performance-detail` returns process_caches with multiple entries
- **THEN** the panel SHALL render one card per cache instance showing entries, max_size, TTL, and description
#### Scenario: Route cache telemetry
- **WHEN** `/admin/api/performance-detail` returns route_cache data
- **THEN** the panel SHALL display L1 hit rate, L2 hit rate, miss rate, and total reads
### Requirement: Connection pool panel
The dashboard SHALL display connection pool saturation as a GaugeBar and stat cards showing checked_out, checked_in, overflow, max_capacity, pool_size, pool_recycle, pool_timeout, and direct connection count.
#### Scenario: Pool under normal load
- **WHEN** pool saturation is below 80%
- **THEN** the GaugeBar SHALL display in a normal color (green/blue)
#### Scenario: Pool near saturation
- **WHEN** pool saturation exceeds 80%
- **THEN** the GaugeBar SHALL display in a warning color (yellow/orange/red)
### Requirement: Worker control panel
The dashboard SHALL display worker PID, uptime, cooldown status, and provide a restart button with a confirmation modal.
#### Scenario: Restart worker
- **WHEN** user clicks the restart button and confirms in the modal
- **THEN** the system SHALL POST to `/admin/api/worker/restart` and display the result
#### Scenario: Restart during cooldown
- **WHEN** worker is in cooldown period
- **THEN** the restart button SHALL be disabled with a cooldown indicator
### Requirement: System logs panel with filtering and pagination
The dashboard SHALL display system logs with level filtering, text search, and pagination controls.
#### Scenario: Filter by log level
- **WHEN** user selects a specific log level filter
- **THEN** only logs matching that level SHALL be displayed
#### Scenario: Paginate logs
- **WHEN** logs exceed the page size
- **THEN** pagination controls SHALL allow navigating between pages
### Requirement: Auto-refresh with toggle
The dashboard SHALL auto-refresh all panels every 30 seconds using `useAutoRefresh`. The user SHALL be able to toggle auto-refresh on/off and manually trigger a refresh.
#### Scenario: Auto-refresh enabled
- **WHEN** auto-refresh is enabled (default)
- **THEN** all panels SHALL refresh their data every 30 seconds via `Promise.all` parallel fetch
#### Scenario: Manual refresh
- **WHEN** user clicks the manual refresh button
- **THEN** all panels SHALL immediately refresh their data

View File

@@ -0,0 +1,56 @@
## ADDED Requirements
### Requirement: ProcessLevelCache stats method
Every `ProcessLevelCache` instance SHALL expose a `stats()` method that returns a dict containing `entries` (live entries count), `max_size`, and `ttl_seconds`.
#### Scenario: Stats on active cache
- **WHEN** `stats()` is called on a ProcessLevelCache with 5 live entries (max_size=32, ttl=30s)
- **THEN** it SHALL return `{"entries": 5, "max_size": 32, "ttl_seconds": 30}`
#### Scenario: Stats with expired entries
- **WHEN** `stats()` is called and some entries have exceeded TTL
- **THEN** `entries` SHALL only count entries where `now - timestamp <= ttl`
#### Scenario: Thread safety
- **WHEN** `stats()` is called concurrently with cache writes
- **THEN** it SHALL acquire the cache lock and return consistent data without races
### Requirement: ProcessLevelCache global registry
The system SHALL maintain a module-level registry in `core/cache.py` that maps cache names to `(description, instance)` tuples. Services SHALL register their cache instances at module load time via `register_process_cache(name, instance, description)`.
#### Scenario: Register and retrieve all caches
- **WHEN** multiple services register their caches and `get_all_process_cache_stats()` is called
- **THEN** it SHALL return a dict of `{name: {entries, max_size, ttl_seconds, description}}` for all registered caches
#### Scenario: Cache not registered
- **WHEN** a service's ProcessLevelCache is not registered
- **THEN** it SHALL NOT appear in `get_all_process_cache_stats()` output
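A registry satisfying both scenarios can be sketched in a few lines; the tuple layout `(description, instance)` follows the requirement, while everything else here is illustrative:

```python
# Module-level registry in core/cache.py: name -> (description, instance).
_PROCESS_CACHE_REGISTRY = {}

def register_process_cache(name, instance, description=""):
    # Called once per cache at module import time by the owning service.
    _PROCESS_CACHE_REGISTRY[name] = (description, instance)

def get_all_process_cache_stats():
    # Unregistered caches simply never appear here.
    out = {}
    for name, (description, instance) in _PROCESS_CACHE_REGISTRY.items():
        entry = dict(instance.stats())
        entry["description"] = description
        out[name] = entry
    return out
```

Registration at import time means the admin dashboard only ever reports caches that opted in, which keeps the registry free of internal or short-lived caches.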
### Requirement: Performance detail API endpoint
The system SHALL expose `GET /admin/api/performance-detail` that returns a JSON object with sections: `redis`, `process_caches`, `route_cache`, `db_pool`, and `direct_connections`.
#### Scenario: All systems available
- **WHEN** the API is called and all subsystems are healthy
- **THEN** it SHALL return all 5 sections with current telemetry data
#### Scenario: Redis disabled
- **WHEN** Redis is disabled (`REDIS_ENABLED=false`)
- **THEN** the `redis` section SHALL be `null` or contain `{"enabled": false}`, and other sections SHALL still return normally
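The degraded-subsystem behaviour can be isolated into a small aggregation helper that the Flask view would call. This is a sketch under stated assumptions: the real endpoint wires in `get_pool_status()`, the Redis client, and `.telemetry()` as the per-section collectors, and may choose `{"enabled": false}` rather than `null` for a disabled subsystem.

```python
def collect_performance_detail(sections):
    """Assemble the performance-detail payload from per-section collectors.

    ``sections`` maps section name -> zero-arg callable. A failing or
    disabled subsystem yields None for its section instead of failing
    the whole request, so the other sections still return normally.
    """
    payload = {}
    for name, collect in sections.items():
        try:
            payload[name] = collect()
        except Exception:
            payload[name] = None
    return payload
```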
### Requirement: Redis namespace key distribution
The performance-detail API SHALL scan Redis keys by namespace prefix and return key counts per namespace. Namespaces SHALL include: `data`, `route_cache`, `equipment_status`, `reject_dataset`, `meta`, `lock`, `scrap_exclusion`.
#### Scenario: Keys exist across namespaces
- **WHEN** Redis contains keys across multiple namespaces
- **THEN** the `redis.namespaces` array SHALL list each namespace with its `name` and `key_count`
#### Scenario: SCAN safety
- **WHEN** scanning Redis keys
- **THEN** the system SHALL use `SCAN` (not `KEYS`) to avoid blocking Redis
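A sketch of the namespace scan, using redis-py's `scan_iter` (which wraps the cursor-based `SCAN` command) rather than `KEYS`. The `<namespace>:<suffix>` key layout and the `count` hint are assumptions:

```python
NAMESPACES = ["data", "route_cache", "equipment_status", "reject_dataset",
              "meta", "lock", "scrap_exclusion"]

def namespace_key_counts(client, namespaces=NAMESPACES):
    """Count keys per namespace prefix with cursor-based SCAN, never KEYS,
    so a large keyspace does not block the Redis server."""
    counts = []
    for ns in namespaces:
        n = sum(1 for _ in client.scan_iter(match=f"{ns}:*", count=500))
        counts.append({"name": ns, "key_count": n})
    return counts
```

Note that `SCAN` is O(total keys) per namespace, so on very large keyspaces the counts are best collected once per refresh interval rather than per request.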
### Requirement: Route cache telemetry in performance detail
The performance-detail API SHALL include route cache telemetry from `get_route_cache_status()`, providing `mode`, `l1_size`, `l1_hit_rate`, `l2_hit_rate`, `miss_rate`, and `reads_total`.
#### Scenario: LayeredCache active
- **WHEN** route cache is in layered mode
- **THEN** the `route_cache` section SHALL include L1 and L2 hit rates from telemetry

## ADDED Requirements
### Requirement: Connection pool status in performance detail
The performance-detail API SHALL include `db_pool` section with `status` (checked_out, checked_in, overflow, max_capacity, saturation) from `get_pool_status()` and `config` (pool_size, max_overflow, pool_timeout, pool_recycle) from `get_pool_runtime_config()`.
#### Scenario: Pool status retrieved
- **WHEN** the API is called
- **THEN** `db_pool.status` SHALL contain current pool utilization metrics and `db_pool.config` SHALL contain the pool configuration values
#### Scenario: Saturation calculation
- **WHEN** the pool has 8 checked_out connections and max_capacity is 30
- **THEN** saturation SHALL be reported as approximately 26.7%
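The saturation figure in the scenario is simply checked-out connections over maximum capacity; a sketch with a zero-capacity guard (the one-decimal rounding is an assumption):

```python
def pool_saturation(checked_out, max_capacity):
    """Pool saturation as a percentage: checked_out / max_capacity * 100,
    rounded to one decimal, with a guard for a zero-capacity pool."""
    if max_capacity <= 0:
        return 0.0
    return round(checked_out / max_capacity * 100, 1)
```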
### Requirement: Direct Oracle connection counter
The system SHALL maintain a thread-safe monotonic counter in `database.py` that increments each time `get_db_connection()` or `read_sql_df_slow()` successfully creates a direct (non-pooled) Oracle connection.
#### Scenario: Counter increments on direct connection
- **WHEN** `get_db_connection()` successfully creates a connection
- **THEN** the direct connection counter SHALL increment by 1
#### Scenario: Counter in performance detail
- **WHEN** the performance-detail API is called
- **THEN** `direct_connections` SHALL contain `total_since_start` (counter value) and `worker_pid` (current process PID)
#### Scenario: Counter is per-worker
- **WHEN** multiple gunicorn workers are running
- **THEN** each worker SHALL maintain its own independent counter, and the API SHALL return the counter for the responding worker
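The per-worker property falls out naturally from using module-level state: each gunicorn worker is a separate process with its own copy of `database.py`'s globals. A sketch of the counter and its read side (names are assumptions):

```python
import os
import threading

_direct_conn_lock = threading.Lock()
_direct_conn_count = 0  # module-level, so inherently per-worker-process

def record_direct_connection():
    """Called after get_db_connection() / read_sql_df_slow() successfully
    opens a direct (non-pooled) Oracle connection."""
    global _direct_conn_count
    with _direct_conn_lock:
        _direct_conn_count += 1

def direct_connection_stats():
    with _direct_conn_lock:
        total = _direct_conn_count
    return {"total_since_start": total, "worker_pid": os.getpid()}
```

Reporting `worker_pid` alongside the count lets the dashboard make clear that the number reflects only the worker that happened to serve the request.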

## ADDED Requirements
### Requirement: SQLite metrics history store
The system SHALL provide a `MetricsHistoryStore` class in `core/metrics_history.py` that persists metrics snapshots to a SQLite database (`logs/metrics_history.sqlite` by default). The store SHALL use thread-local connections and a write lock, following the `LogStore` pattern in `core/log_store.py`.
#### Scenario: Write and query snapshots
- **WHEN** `write_snapshot(data)` is called with pool/redis/route_cache/latency metrics
- **THEN** a row SHALL be inserted into `metrics_snapshots` with the current ISO 8601 timestamp and worker PID
#### Scenario: Query by time range
- **WHEN** `query_snapshots(minutes=30)` is called
- **THEN** it SHALL return all rows from the last 30 minutes, ordered by timestamp ascending
#### Scenario: Retention cleanup
- **WHEN** `cleanup()` is called
- **THEN** rows older than `METRICS_HISTORY_RETENTION_DAYS` (default 3) SHALL be deleted, and total rows SHALL be capped at `METRICS_HISTORY_MAX_ROWS` (default 50000)
#### Scenario: Thread safety
- **WHEN** multiple threads write snapshots concurrently
- **THEN** the write lock SHALL serialize writes and prevent database corruption
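The thread-local-connection-plus-write-lock pattern borrowed from `LogStore` can be sketched as below. Column names and the JSON payload column are assumptions, and the retention `cleanup()` is omitted for brevity:

```python
import json
import os
import sqlite3
import threading
from datetime import datetime, timedelta, timezone

class MetricsHistoryStore:
    """Sketch of the LogStore pattern: one SQLite connection per thread,
    writes serialized by a single lock."""

    def __init__(self, path="logs/metrics_history.sqlite"):
        self._path = path
        self._local = threading.local()
        self._write_lock = threading.Lock()
        self._conn().execute(
            "CREATE TABLE IF NOT EXISTS metrics_snapshots ("
            "ts TEXT NOT NULL, worker_pid INTEGER NOT NULL, data TEXT NOT NULL)")

    def _conn(self):
        # Each thread lazily opens its own connection (sqlite3 connections
        # are not safe to share across threads by default).
        if not hasattr(self._local, "conn"):
            self._local.conn = sqlite3.connect(self._path)
        return self._local.conn

    def write_snapshot(self, data):
        ts = datetime.now(timezone.utc).isoformat()
        with self._write_lock:  # serialize writers
            self._conn().execute(
                "INSERT INTO metrics_snapshots VALUES (?, ?, ?)",
                (ts, os.getpid(), json.dumps(data)))
            self._conn().commit()

    def query_snapshots(self, minutes=30):
        # ISO 8601 UTC timestamps compare correctly as strings.
        cutoff = (datetime.now(timezone.utc)
                  - timedelta(minutes=minutes)).isoformat()
        cur = self._conn().execute(
            "SELECT ts, worker_pid, data FROM metrics_snapshots "
            "WHERE ts >= ? ORDER BY ts ASC", (cutoff,))
        return [{"ts": t, "worker_pid": pid, **json.loads(d)}
                for t, pid, d in cur]
```

Note that with thread-local connections an `":memory:"` path gives each thread an independent database, so that mode is only suitable for single-threaded tests.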
### Requirement: Background metrics collector
The system SHALL provide a `MetricsHistoryCollector` class that runs a daemon thread collecting metrics snapshots at a configurable interval (default 30 seconds, via `METRICS_HISTORY_INTERVAL` env var).
#### Scenario: Automatic collection
- **WHEN** the collector is started via `start_metrics_history(app)`
- **THEN** it SHALL collect pool status, Redis info, route cache status, and query latency metrics every interval and write them to the store
#### Scenario: Graceful shutdown
- **WHEN** `stop_metrics_history()` is called
- **THEN** the collector thread SHALL stop within one interval period
#### Scenario: Subsystem unavailability
- **WHEN** a subsystem (e.g., Redis) is unavailable during collection
- **THEN** the collector SHALL write null/0 for those fields and continue collecting other metrics
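A daemon-thread collector satisfying the shutdown and resilience scenarios can be built around `threading.Event.wait(interval)`, which doubles as the sleep and the stop signal, so `stop()` takes effect within one interval. The `collect`/`sink` callables are stand-ins for the real pool/Redis/route-cache gatherers and the store:

```python
import threading

class MetricsHistoryCollector:
    """Sketch: daemon thread calling ``collect`` every ``interval`` seconds
    and passing each snapshot to ``sink``."""

    def __init__(self, collect, sink, interval=30.0):
        self._collect = collect
        self._sink = sink
        self._interval = interval
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # wait() returns True once stop() sets the event, ending the loop
        # within at most one interval.
        while not self._stop.wait(self._interval):
            try:
                self._sink(self._collect())
            except Exception:
                pass  # one failed cycle must not kill the collector thread

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout=self._interval)
```

The null/0 handling for an unavailable subsystem would live inside `collect`, which catches per-subsystem errors and still emits a snapshot.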
### Requirement: Performance history API endpoint
The system SHALL expose `GET /admin/api/performance-history` that returns historical metrics snapshots.
#### Scenario: Query with time range
- **WHEN** the API is called with `?minutes=30`
- **THEN** it SHALL return `{"success": true, "data": {"snapshots": [...], "count": N}}`
#### Scenario: Time range bounds
- **WHEN** `minutes` is less than 1 or greater than 180
- **THEN** it SHALL be clamped to the range [1, 180]
#### Scenario: Admin authentication
- **WHEN** the API is called without admin authentication
- **THEN** it SHALL be rejected by the `@admin_required` decorator
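The clamping rule is a one-liner worth pinning down, since `?minutes=` arrives as an untrusted string. A sketch, where the 30-minute fallback for non-numeric input is an assumption:

```python
def clamp_minutes(raw, lo=1, hi=180):
    """Parse the ?minutes= query parameter and clamp it to [lo, hi];
    non-numeric or missing input falls back to a 30-minute default."""
    try:
        minutes = int(raw)
    except (TypeError, ValueError):
        minutes = 30
    return max(lo, min(hi, minutes))
```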
### Requirement: Frontend trend charts
The system SHALL display 4 trend chart panels in the admin performance dashboard using vue-echarts VChart line/area charts.
#### Scenario: Trend charts with data
- **WHEN** historical snapshots contain more than 1 data point
- **THEN** the dashboard SHALL display trend charts for: connection pool saturation, query latency (P50/P95/P99), Redis memory, and cache hit rates
#### Scenario: Trend charts without data
- **WHEN** historical snapshots are empty or contain only 1 data point
- **THEN** the trend charts SHALL NOT be displayed (hidden via `v-if`)
#### Scenario: Auto-refresh
- **WHEN** the dashboard auto-refreshes
- **THEN** historical data SHALL also be refreshed alongside real-time metrics