chore: finalize vite migration hardening and archive openspec changes

2026-02-08 20:03:36 +08:00
parent b56e80381b
commit c8e225101e
119 changed files with 6547 additions and 1301 deletions
--- a/openspec/specs/api-safety-hygiene/spec.md
+++ b/openspec/specs/api-safety-hygiene/spec.md
@@ -0,0 +1,33 @@
+# api-safety-hygiene Specification
+
+## Purpose
+TBD - created by archiving change residual-hardening-round3. Update Purpose after archive.
+## Requirements
+### Requirement: Recursive Payload Cleaning MUST Enforce Depth Safety
+Routes that normalize nested payloads MUST prevent unbounded recursion depth.
+
+#### Scenario: Deeply nested response object
+- **WHEN** NaN-cleaning helper receives deeply nested list/dict payload
+- **THEN** cleaning logic MUST enforce max depth or iterative traversal and return safely without recursion failure
+
+### Requirement: Filter Source Names MUST Be Configurable
+Filter cache query sources MUST NOT rely on hardcoded view names only.
+
+#### Scenario: Environment-specific view names
+- **WHEN** deployment sets custom filter-source environment variables
+- **THEN** filter cache loader MUST resolve and query configured view names
+
+### Requirement: High-Cost APIs SHALL Apply Basic Rate Guardrails
+High-cost read endpoints SHALL apply configurable request-rate guardrails to reduce abuse and accidental bursts.
+
+#### Scenario: Burst traffic from same client
+- **WHEN** a client exceeds configured request budget for guarded endpoints
+- **THEN** endpoint SHALL return throttled response with clear retry guidance
+
+### Requirement: Common Boolean Query Parsing SHALL Be Shared
+Boolean query parsing in routes SHALL use shared helper behavior.
+
+#### Scenario: Different routes parse include flags
+- **WHEN** routes parse common boolean query parameters
+- **THEN** parsing behavior MUST be consistent across routes via shared utility
+
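Illustrative note on the depth-safety requirement in `api-safety-hygiene` (not part of the committed spec): the scenario allows either a max-depth guard or iterative traversal, and the sketch below combines both. The helper name `clean_nan`, the `MAX_DEPTH` value, and the choice to replace over-deep containers with `None` are assumptions made for illustration; the actual helper in the codebase may behave differently.

```python
import math
from typing import Any

MAX_DEPTH = 64  # hypothetical bound; the spec only requires that a bound exists


def clean_nan(payload: Any, max_depth: int = MAX_DEPTH) -> Any:
    """Return a copy of payload with float NaN values replaced by None.

    Traversal uses an explicit stack instead of recursion, so deeply nested
    input cannot raise RecursionError. Containers nested deeper than
    max_depth are replaced with None instead of being traversed.
    """

    def convert(value: Any) -> Any:
        if isinstance(value, float) and math.isnan(value):
            return None
        return value

    if not isinstance(payload, (dict, list)):
        return convert(payload)

    root: Any = {} if isinstance(payload, dict) else []
    stack = [(payload, root, 1)]  # (source container, copy being filled, depth)
    while stack:
        src, dst, depth = stack.pop()
        items = src.items() if isinstance(src, dict) else enumerate(src)
        for key, value in items:
            if isinstance(value, (dict, list)):
                if depth >= max_depth:
                    new = None  # assumption: truncate rather than raise
                else:
                    new = {} if isinstance(value, dict) else []
                    stack.append((value, new, depth + 1))
            else:
                new = convert(value)
            if isinstance(dst, dict):
                dst[key] = new
            else:
                dst.append(new)
    return root
```

Truncating to `None` keeps the response serializable; a route that prefers to reject over-deep payloads could raise an error at the same point instead.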
--- a/openspec/specs/cache-indexed-query-acceleration/spec.md
+++ b/openspec/specs/cache-indexed-query-acceleration/spec.md
@@ -0,0 +1,26 @@
+# cache-indexed-query-acceleration Specification
+
+## Purpose
+TBD - created by archiving change p1-cache-query-efficiency. Update Purpose after archive.
+## Requirements
+### Requirement: Incremental Synchronization SHALL Use Versioned Watermarks
+For heavy non-full-snapshot datasets, cache refresh SHALL support incremental synchronization keyed by stable version or watermark boundaries.
+
+#### Scenario: Incremental refresh cycle
+- **WHEN** source data version indicates partial changes since last sync
+- **THEN** cache update logic MUST fetch and merge only changed partitions while preserving correctness guarantees
+
+### Requirement: Query Paths SHALL Use Indexed Access for High-Frequency Filters
+Query execution over cached data SHALL use prebuilt indexes for known high-frequency filter columns.
+
+#### Scenario: Filtered report query
+- **WHEN** request filters target indexed fields
+- **THEN** result selection MUST avoid full dataset scans and maintain existing response contract
+
+### Requirement: Business-Mandated Full-Table Caches SHALL Be Preserved for Resource and WIP
+The system SHALL continue to maintain full-table cache behavior for `resource` and `wip` domains.
+
+#### Scenario: Resource or WIP cache refresh
+- **WHEN** cache update runs for `resource` or `wip`
+- **THEN** the updater MUST retain full-table snapshot semantics and MUST NOT switch these domains to partial-only cache mode
+
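Illustrative note on the versioned-watermark requirement in `cache-indexed-query-acceleration` (not part of the committed spec): one way to read "fetch and merge only changed partitions" is sketched below. `Partition`, `CacheState`, `incremental_refresh`, and the integer version scheme are all hypothetical names and assumptions; the spec does not say how versions are represented. The `resource` and `wip` domains stay on full-table snapshots per the last requirement, so this pattern would not apply to them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Partition:
    key: str
    version: int  # assumption: a monotonically increasing source version
    rows: Tuple[str, ...]


@dataclass
class CacheState:
    watermark: int = 0
    partitions: Dict[str, Partition] = field(default_factory=dict)


def incremental_refresh(
    state: CacheState,
    fetch_changed: Callable[[int], Iterable[Partition]],
) -> List[str]:
    """Merge partitions newer than the watermark; return the keys that changed."""
    changed = list(fetch_changed(state.watermark))
    updated: List[str] = []
    for part in changed:
        current = state.partitions.get(part.key)
        # Never let an older version overwrite a newer cached partition.
        if current is None or part.version > current.version:
            state.partitions[part.key] = part
            updated.append(part.key)
    if changed:
        state.watermark = max(state.watermark, max(p.version for p in changed))
    return updated
```

Unchanged partitions are never re-fetched or rewritten, which is the property the scenario asks for; correctness still depends on the source reporting versions reliably.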
--- a/openspec/specs/cache-observability-hardening/spec.md
+++ b/openspec/specs/cache-observability-hardening/spec.md
@@ -36,3 +36,53 @@ The system MUST define alert thresholds for sustained degraded state, repeated w
 - **WHEN** degraded status persists beyond configured duration
 - **THEN** the monitoring contract MUST classify the service as alert-worthy with actionable context

+### Requirement: Cache Telemetry SHALL Include Memory Amplification Signals
+Operational telemetry MUST expose cache-domain memory usage indicators and representation amplification factors, and MUST differentiate between authoritative data payload and derived/index helper structures.
+
+#### Scenario: Deep health telemetry request after representation normalization
+- **WHEN** operators inspect cache telemetry for resource or WIP domains
+- **THEN** telemetry MUST include per-domain memory footprint, amplification indicators, and enough structure detail to verify that full-record duplication is not reintroduced
+
+### Requirement: Efficiency Benchmarks SHALL Gate Cache Refactor Rollout
+Cache/query efficiency changes MUST be validated against baseline latency and memory benchmarks before rollout.
+
+#### Scenario: Pre-release validation
+- **WHEN** cache refactor changes are prepared for deployment
+- **THEN** benchmark results MUST demonstrate no regression beyond configured thresholds for P95 latency and memory usage
+
+### Requirement: Process-Level Cache SHALL Use Bounded Capacity with Deterministic Eviction
+Process-level parsed-data caches MUST enforce a configurable maximum key capacity and use deterministic eviction behavior when capacity is exceeded.
+
+#### Scenario: Cache capacity reached
+- **WHEN** a new cache entry is inserted and key capacity is at limit
+- **THEN** cache MUST evict entries according to defined policy before storing the new key
+
+#### Scenario: Repeated access updates recency
+- **WHEN** an existing cache key is read or overwritten
+- **THEN** eviction order MUST reflect recency semantics so hot keys are retained preferentially
+
+### Requirement: Cache Publish MUST Preserve Previous Readable Snapshot on Failure
+When refreshing full-table cache payloads, the system MUST avoid exposing partially published states to readers.
+
+#### Scenario: Publish fails after payload serialization
+- **WHEN** a cache refresh has prepared new payload but publish operation fails
+- **THEN** previously published cache keys MUST remain readable and metadata MUST remain consistent with old snapshot
+
+#### Scenario: Publish succeeds
+- **WHEN** publish operation completes successfully
+- **THEN** data payload and metadata keys MUST be visible as one coherent new snapshot
+
+### Requirement: Process-Level Cache Slow Path SHALL Minimize Lock Hold Time
+Large payload parsing MUST NOT happen inside long-held process cache locks.
+
+#### Scenario: Cache miss under concurrent requests
+- **WHEN** multiple requests hit process cache miss
+- **THEN** parsing work SHALL happen outside lock-protected mutation section, and lock scope SHALL be limited to consistency check + commit
+
+### Requirement: Process-Level Cache Policies MUST Stay Consistent Across Services
+All service-local process caches MUST support bounded capacity with deterministic eviction.
+
+#### Scenario: Realtime equipment cache growth
+- **WHEN** realtime equipment process cache reaches configured capacity
+- **THEN** entries MUST be evicted according to deterministic LRU behavior
+
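Illustrative note on the process-level cache requirements in `cache-observability-hardening` (not part of the committed spec): the sketch below shows bounded capacity, LRU eviction before a new key is stored, recency updates on read and overwrite, and a slow path that parses outside the lock. The class name `BoundedProcessCache`, its method names, and the default capacity are assumptions, not the project's actual implementation.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable, Hashable, List

_MISSING = object()


class BoundedProcessCache:
    """Process-local LRU cache with a hard cap on the number of keys."""

    def __init__(self, max_keys: int = 8) -> None:
        if max_keys < 1:
            raise ValueError("max_keys must be at least 1")
        self._max_keys = max_keys
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._lock = threading.Lock()

    def _store_locked(self, key: Hashable, value: Any) -> None:
        # Caller must hold self._lock.
        if key in self._entries:
            self._entries.move_to_end(key)  # overwrite refreshes recency
        else:
            while len(self._entries) >= self._max_keys:
                self._entries.popitem(last=False)  # evict least recently used first
        self._entries[key] = value

    def get(self, key: Hashable, default: Any = None) -> Any:
        with self._lock:
            value = self._entries.get(key, _MISSING)
            if value is _MISSING:
                return default
            self._entries.move_to_end(key)  # read refreshes recency
            return value

    def set(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._store_locked(key, value)

    def get_or_parse(self, key: Hashable, parse: Callable[[], Any]) -> Any:
        cached = self.get(key, _MISSING)
        if cached is not _MISSING:
            return cached
        parsed = parse()  # slow path: runs with no lock held
        with self._lock:
            # Consistency check + commit: another thread may have won the race.
            existing = self._entries.get(key, _MISSING)
            if existing is not _MISSING:
                self._entries.move_to_end(key)
                return existing
            self._store_locked(key, parsed)
            return parsed

    def keys(self) -> List[Hashable]:
        with self._lock:
            return list(self._entries)
```

Under concurrent misses, several threads may parse the same payload and all but one result is discarded. That trades some duplicate work for short lock hold times, which is the direction the slow-path requirement points in.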
--- a/openspec/specs/conda-systemd-runtime-alignment/spec.md
+++ b/openspec/specs/conda-systemd-runtime-alignment/spec.md
@@ -24,3 +24,17 @@ Runbooks and deployment documentation MUST describe the same conda/systemd/watch
 - **WHEN** an operator performs deploy, health check, and rollback from documentation
 - **THEN** documented commands and paths MUST work without requiring venv-specific assumptions

+### Requirement: Runtime Path Drift SHALL Be Detectable Before Service Start
+Service startup checks MUST validate configured conda runtime paths across app, watchdog, and worker control scripts.
+
+#### Scenario: Conda path mismatch detected
+- **WHEN** startup validation finds runtime path inconsistency between configured units and scripts
+- **THEN** service start MUST fail with actionable diagnostics instead of running with partial mismatch
+
+### Requirement: Conda/Systemd Contract SHALL Be Versioned in Operations Docs
+The documented runtime contract MUST include versioned path assumptions and verification commands.
+
+#### Scenario: Operator verifies deployment contract
+- **WHEN** operator follows runbook validation steps
+- **THEN** commands MUST confirm active runtime paths match documented conda/systemd contract
+
--- a/openspec/specs/frontend-compute-shift/spec.md
+++ b/openspec/specs/frontend-compute-shift/spec.md
@@ -50,3 +50,17 @@ Frontend matrix/filter computations SHALL produce deterministic selection and fi
 - **WHEN** users toggle matrix cells across group, family, and resource rows
 - **THEN** selected-state rendering and filtered equipment result sets MUST remain level-correct and reversible

+### Requirement: Reusable Browser Compute Modules SHALL Power Report Derivations
+Derived computations for report filters, KPI cards, chart series, and table projections SHALL be implemented through reusable frontend modules.
+
+#### Scenario: Shared report derivation logic
+- **WHEN** multiple report pages require equivalent data-shaping behavior
+- **THEN** pages MUST consume shared compute modules instead of duplicating transformation logic per page
+
+### Requirement: Browser Compute Shift SHALL Preserve Export and Field Contracts
+Moving computations to frontend MUST preserve existing field naming and export column contracts.
+
+#### Scenario: User exports report after frontend-side derivation
+- **WHEN** transformed data is rendered and exported
+- **THEN** exported field names and ordering MUST remain consistent with governed field contract definitions
+
--- a/openspec/specs/maintainability-type-and-constant-hygiene/spec.md
+++ b/openspec/specs/maintainability-type-and-constant-hygiene/spec.md
@@ -0,0 +1,19 @@
+# maintainability-type-and-constant-hygiene Specification
+
+## Purpose
+TBD - created by archiving change residual-hardening-round4. Update Purpose after archive.
+## Requirements
+### Requirement: Core Cache and Service Boundaries MUST Use Consistent Type Annotation Style
+Core cache/service modules touched by this change SHALL use a consistent and explicit type-annotation style for public and internal helper boundaries.
+
+#### Scenario: Reviewing updated cache/service modules
+- **WHEN** maintainers inspect function signatures in affected modules
+- **THEN** optional and collection types MUST follow a single consistent style and remain compatible with the project Python baseline
+
+### Requirement: High-Frequency Magic Numbers MUST Be Replaced by Named Constants
+Cache, throttling, and index-related numeric literals that control behavior MUST be extracted to named constants or env-configurable settings.
+
+#### Scenario: Tuning cache/index behavior
+- **WHEN** operators need to tune cache/index thresholds
+- **THEN** they MUST find values in named constants or environment variables rather than scattered inline literals
+
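Illustrative note on the magic-number requirement in `maintainability-type-and-constant-hygiene` (not part of the committed spec): a named constant with an environment override might look like the sketch below. The helper `env_int`, the variable name `PROCESS_CACHE_MAX_KEYS`, and the default of 8 are hypothetical; the spec does not list the actual setting names.

```python
import os
from typing import Mapping, Optional


def env_int(
    name: str,
    default: int,
    *,
    minimum: int = 1,
    environ: Optional[Mapping[str, str]] = None,
) -> int:
    """Read an integer setting, falling back to default on missing or invalid input."""
    source = os.environ if environ is None else environ
    raw = source.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return max(minimum, value)  # clamp so a bad override cannot disable the bound


# One named default, one override point, no inline literal at the call sites.
DEFAULT_PROCESS_CACHE_MAX_KEYS = 8
PROCESS_CACHE_MAX_KEYS = env_int("PROCESS_CACHE_MAX_KEYS", DEFAULT_PROCESS_CACHE_MAX_KEYS)
```

Falling back silently on invalid input is a design choice; a deployment that prefers to fail fast could raise instead of returning the default.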
--- a/openspec/specs/oracle-query-fragment-governance/spec.md
+++ b/openspec/specs/oracle-query-fragment-governance/spec.md
@@ -0,0 +1,19 @@
+# oracle-query-fragment-governance Specification
+
+## Purpose
+TBD - created by archiving change residual-hardening-round4. Update Purpose after archive.
+## Requirements
+### Requirement: Shared Oracle Query Fragments SHALL Have a Single Source of Truth
+Cross-service Oracle query fragments for resource and equipment cache loading MUST be defined in a shared module and imported by service implementations.
+
+#### Scenario: Update common table/view reference
+- **WHEN** a common table or view name changes
+- **THEN** operators and developers MUST be able to update one shared definition without editing duplicated SQL literals across services
+
+### Requirement: Service Queries MUST Preserve Existing Columns and Semantics
+Services consuming shared Oracle query fragments SHALL preserve existing selected columns, filters, and downstream payload behavior.
+
+#### Scenario: Resource and equipment cache refresh after refactor
+- **WHEN** cache services execute queries via shared fragments
+- **THEN** resulting payload structure MUST remain compatible with existing aggregation and API contracts
+
--- a/openspec/specs/resource-cache-representation-normalization/spec.md
+++ b/openspec/specs/resource-cache-representation-normalization/spec.md
@@ -0,0 +1,26 @@
+# resource-cache-representation-normalization Specification
+
+## Purpose
+TBD - created by archiving change residual-hardening-round4. Update Purpose after archive.
+## Requirements
+### Requirement: Resource Derived Index MUST Avoid Full Record Duplication
+Resource derived index SHALL use lightweight row-position references instead of storing full duplicated record payloads alongside the process DataFrame cache.
+
+#### Scenario: Build index from cached DataFrame
+- **WHEN** resource cache data is parsed from Redis into process-level DataFrame
+- **THEN** the derived index MUST store position-based references and metadata without a second full records copy
+
+### Requirement: Resource Query APIs SHALL Preserve Existing Response Contract
+Resource query APIs MUST keep existing output fields and semantics after index representation normalization.
+
+#### Scenario: Read all resources after normalization
+- **WHEN** callers request all resources or filtered resource lists
+- **THEN** the returned payload MUST remain field-compatible with pre-normalization responses
+
+### Requirement: Cache Invalidation MUST Keep Index/Data Coherent
+The system SHALL invalidate and rebuild DataFrame/index representations atomically at cache refresh boundaries.
+
+#### Scenario: Redis-backed cache refresh completes
+- **WHEN** a new resource cache snapshot is published
+- **THEN** stale index references MUST be invalidated before subsequent reads use refreshed DataFrame data
+
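Illustrative note on `resource-cache-representation-normalization` (not part of the committed spec): the sketch below stores only row positions in the derived index and swaps data and index together as one snapshot object, so a reader never pairs new rows with stale positions. It uses plain tuples and dicts in place of the process-level DataFrame to stay dependency-free; `ResourceSnapshot`, `build_snapshot`, and `SnapshotHolder` are hypothetical names.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class ResourceSnapshot:
    version: str
    rows: Tuple[Row, ...]
    # column -> value -> row positions; no second copy of the records
    positions: Dict[str, Dict[Any, Tuple[int, ...]]]

    def lookup(self, column: str, value: Any) -> List[Row]:
        """Resolve positions back to rows; raises KeyError for unindexed columns."""
        return [self.rows[i] for i in self.positions[column].get(value, ())]


def build_snapshot(
    version: str, rows: Sequence[Row], indexed_columns: Sequence[str]
) -> ResourceSnapshot:
    positions: Dict[str, Dict[Any, Tuple[int, ...]]] = {}
    for column in indexed_columns:
        buckets: Dict[Any, List[int]] = defaultdict(list)
        for i, row in enumerate(rows):
            buckets[row.get(column)].append(i)
        positions[column] = {value: tuple(hits) for value, hits in buckets.items()}
    return ResourceSnapshot(version, tuple(rows), positions)


class SnapshotHolder:
    """Publishes data and index as a single reference swap."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current: Optional[ResourceSnapshot] = None

    def publish(self, snapshot: ResourceSnapshot) -> None:
        with self._lock:
            self._current = snapshot  # the old index goes away with the old rows

    def current(self) -> Optional[ResourceSnapshot]:
        with self._lock:
            return self._current
```

A reader that calls `current()` once per request keeps a coherent view even if a refresh publishes a new snapshot midway through the request.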
--- a/openspec/specs/runtime-resilience-recovery/spec.md
+++ b/openspec/specs/runtime-resilience-recovery/spec.md
@@ -48,3 +48,47 @@ The system MUST expose machine-readable resilience thresholds, restart-churn ind
 #### Scenario: Admin status includes restart churn summary
 - **WHEN** operators call `/admin/api/system-status` or `/admin/api/worker/status`
 - **THEN** responses MUST include bounded restart history summary within a configured time window and indicate whether churn threshold is exceeded
+
+### Requirement: Recovery Recommendations SHALL Reflect Self-Healing Policy State
+Health and admin resilience payloads MUST expose whether automated recovery is allowed, cooling down, or blocked by churn policy.
+
+#### Scenario: Operator inspects degraded state
+- **WHEN** `/health` or `/admin/api/worker/status` is requested during degradation
+- **THEN** response MUST include policy state, cooldown remaining time, and next recommended action
+
+### Requirement: Manual Recovery Override SHALL Be Explicit and Controlled
+Manual restart actions MUST bypass automatic block only through authenticated operator pathways with explicit acknowledgement.
+
+#### Scenario: Churn-blocked state with manual override request
+- **WHEN** authorized admin requests manual restart while auto-recovery is blocked
+- **THEN** system MUST execute controlled restart path and log the override context for auditability
+
+### Requirement: Circuit Breaker State Transitions SHALL Avoid Lock-Held Logging
+Circuit breaker state transitions MUST avoid executing logger I/O while internal state locks are held.
+
+#### Scenario: State transition occurs
+- **WHEN** circuit breaker transitions between CLOSED, OPEN, or HALF_OPEN
+- **THEN** lock-protected section MUST complete state mutation before emitting transition log output
+
+#### Scenario: Slow log handler under load
+- **WHEN** logger handlers are slow or blocked
+- **THEN** circuit breaker lock contention MUST remain bounded and MUST NOT serialize unrelated request paths behind logging latency
+
+### Requirement: Health Endpoints SHALL Use Short Internal Memoization
+Health and deep-health computation SHALL use a short-lived internal cache to prevent probe storms from amplifying backend load.
+
+#### Scenario: Frequent monitor scrapes
+- **WHEN** health endpoints are called repeatedly within a small window
+- **THEN** service SHALL return memoized payload for up to 5 seconds in non-testing environments
+
+#### Scenario: Testing mode
+- **WHEN** app is running in testing mode
+- **THEN** health endpoint memoization MUST be bypassed to preserve deterministic tests
+
+### Requirement: Logs MUST Redact Connection Secrets
+Runtime logs MUST avoid exposing DB connection credentials.
+
+#### Scenario: Connection string appears in log message
+- **WHEN** a log message contains DB URL credentials
+- **THEN** logger output MUST redact password and sensitive userinfo before emission
+
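Illustrative note on the health-memoization requirement in `runtime-resilience-recovery` (not part of the committed spec): the sketch below caches a health payload for up to 5 seconds and bypasses the cache in testing mode, as the two scenarios describe. The class name `HealthMemo`, the injected `clock`, and the `testing` flag are assumptions about how this could be wired, not the project's actual code.

```python
import threading
import time
from typing import Any, Callable, Optional, Tuple

HEALTH_MEMO_TTL_SECONDS = 5.0  # the window named in the spec scenario


class HealthMemo:
    """Short-lived memo for an expensive health computation."""

    def __init__(
        self,
        compute: Callable[[], Any],
        ttl_seconds: float = HEALTH_MEMO_TTL_SECONDS,
        testing: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._testing = testing
        self._clock = clock
        self._cached: Optional[Tuple[float, Any]] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        if self._testing:
            return self._compute()  # deterministic tests: never serve a memo
        now = self._clock()
        with self._lock:
            cached = self._cached
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        payload = self._compute()  # computed outside the lock
        with self._lock:
            self._cached = (self._clock(), payload)
        return payload
```

Concurrent misses may each recompute the payload once; that keeps the lock scope small at the cost of occasional duplicate work when the memo expires.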
--- a/openspec/specs/security-surface-hardening/spec.md
+++ b/openspec/specs/security-surface-hardening/spec.md
@@ -0,0 +1,38 @@
+# security-surface-hardening Specification
+
+## Purpose
+TBD - created by archiving change security-stability-hardening-round2. Update Purpose after archive.
+## Requirements
+### Requirement: LDAP Authentication Endpoint Configuration SHALL Be Strictly Validated
+The system MUST validate LDAP authentication endpoint configuration before use, including HTTPS scheme enforcement and host allowlist checks.
+
+#### Scenario: Invalid LDAP URL configuration detected
+- **WHEN** `LDAP_API_URL` is missing, non-HTTPS, or points to a host outside the configured allowlist
+- **THEN** the service MUST reject LDAP authentication calls and emit actionable diagnostics without sending credentials to that endpoint
+
+#### Scenario: Valid LDAP URL configuration accepted
+- **WHEN** `LDAP_API_URL` uses HTTPS and host is allowlisted
+- **THEN** LDAP authentication requests MAY proceed with normal timeout and error handling behavior
+
+### Requirement: Security Response Headers SHALL Be Applied Globally
+All HTTP responses MUST include baseline security headers suitable for dashboard and API traffic.
+
+#### Scenario: Standard response emitted
+- **WHEN** any route returns a response
+- **THEN** response MUST include `Content-Security-Policy`, `X-Frame-Options`, `X-Content-Type-Options`, and `Referrer-Policy`
+
+#### Scenario: Production transport hardening
+- **WHEN** runtime environment is production
+- **THEN** response MUST include `Strict-Transport-Security`
+
+### Requirement: Pagination Input Boundaries SHALL Be Enforced
+Endpoints accepting pagination parameters MUST enforce lower and upper bounds before query execution.
+
+#### Scenario: Negative or zero pagination inputs
+- **WHEN** client sends `page <= 0` or `page_size <= 0`
+- **THEN** server MUST normalize values to minimum supported bounds
+
+#### Scenario: Excessive page size requested
+- **WHEN** client sends `page_size` above configured maximum
+- **THEN** server MUST clamp to maximum supported page size
+
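Illustrative note on the pagination requirement in `security-surface-hardening` (not part of the committed spec): both scenarios reduce to clamping before the query runs. In the sketch below, `normalize_pagination`, the default page size of 50, and the maximum of 500 are hypothetical values chosen for the example; the spec only says the maximum is configured.

```python
from typing import Any, Tuple

DEFAULT_PAGE_SIZE = 50  # hypothetical default
MAX_PAGE_SIZE = 500     # hypothetical configured maximum


def _to_int(raw: Any, default: int) -> int:
    try:
        return int(raw)
    except (TypeError, ValueError):
        return default


def normalize_pagination(
    page: Any,
    page_size: Any,
    *,
    default_page_size: int = DEFAULT_PAGE_SIZE,
    max_page_size: int = MAX_PAGE_SIZE,
) -> Tuple[int, int]:
    """Clamp raw query values to [1, inf) for page and [1, max_page_size] for size."""
    page_value = max(1, _to_int(page, 1))
    size_value = min(max(1, _to_int(page_size, default_page_size)), max_page_size)
    return page_value, size_value
```

For example, `normalize_pagination("0", "-5")` yields `(1, 1)` and `normalize_pagination("3", "100000")` yields `(3, 500)`. Missing or non-numeric input falls back to the defaults, which is an assumption beyond what the scenarios state.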
--- a/openspec/specs/worker-self-healing-governance/spec.md
+++ b/openspec/specs/worker-self-healing-governance/spec.md
@@ -0,0 +1,26 @@
+# worker-self-healing-governance Specification
+
+## Purpose
+TBD - created by archiving change p2-ops-self-healing-runbook. Update Purpose after archive.
+## Requirements
+### Requirement: Automated Worker Recovery SHALL Use Bounded Policy Guards
+Automated worker restart behavior MUST enforce cooldown periods and bounded restart attempts within a configurable time window.
+
+#### Scenario: Repeated worker degradation within short window
+- **WHEN** degradation events exceed configured restart-attempt budget
+- **THEN** automated restarts MUST pause and surface a blocked-recovery signal for operator intervention
+
+### Requirement: Restart-Churn Protection SHALL Prevent Recovery Storms
+The runtime MUST classify restart churn and prevent uncontrolled restart loops.
+
+#### Scenario: Churn threshold exceeded
+- **WHEN** restart count crosses churn threshold in active window
+- **THEN** watchdog MUST enter guarded mode and require explicit manual override before further restart attempts
+
+### Requirement: Recovery Decisions SHALL Be Audit-Ready
+Every auto-recovery decision and manual override action MUST be recorded with structured metadata.
+
+#### Scenario: Worker restart decision emitted
+- **WHEN** system executes or denies a restart action
+- **THEN** structured logs/events MUST include reason, thresholds, actor/source, and resulting state
+
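Illustrative note on the bounded-policy requirements in `worker-self-healing-governance` (not part of the committed spec): the sketch below models a restart budget inside a sliding window plus a cooldown between attempts, returning one of three policy states. `RestartPolicy`, its default thresholds, and the state strings are assumptions; manual override and structured audit logging are left out to keep the example short.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class RestartPolicy:
    max_restarts: int = 3           # hypothetical budget per window
    window_seconds: float = 600.0   # hypothetical churn window
    cooldown_seconds: float = 60.0  # hypothetical gap between restarts
    _history: Deque[float] = field(default_factory=deque, init=False, repr=False)

    def decide(self, now: float) -> str:
        """Return 'allowed', 'cooldown', or 'blocked' for a restart requested at now."""
        # Drop restarts that have aged out of the active window.
        while self._history and now - self._history[0] >= self.window_seconds:
            self._history.popleft()
        if len(self._history) >= self.max_restarts:
            return "blocked"  # churn threshold reached: wait for an operator
        if self._history and now - self._history[-1] < self.cooldown_seconds:
            return "cooldown"
        return "allowed"

    def record_restart(self, now: float) -> None:
        self._history.append(now)
```

A watchdog would call `decide()` before each automated restart and `record_restart()` only after actually restarting. The returned state, together with the thresholds, is the kind of context the audit requirement expects in each log event.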