Implement phased modernization infrastructure for transitioning from multi-page legacy routing to SPA portal-shell architecture, plus post-delivery hardening fixes for policy loading, fallback consistency, and governance drift detection.

Key changes:
- Add route contract enrichment with scope/visibility/compatibility policies
- Canonical 302 redirects from legacy direct-entry to /portal-shell/ routes
- Asset readiness enforcement and runtime fallback retirement for in-scope routes
- Shared feature-flag helpers (env > config > default) replacing duplicated _to_bool
- Defensive copy for lru_cached policy payloads preventing mutation corruption
- Unified retired-fallback response helper across app and blueprint routes
- Frontend/backend route-contract cross-validation in governance gates
- Shell CSS token fallback values for routes rendered outside shell scope
- Local-safe .env.example defaults with production recommendation comments
- Legacy contract fallback warning logging and single-hop redirect optimization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Context
Phase 1 modernization was completed and archived, but post-delivery review identified several hardening gaps across policy loading, fallback behavior consistency, feature-flag ergonomics, and route-governance drift detection. The gaps are cross-cutting: backend route hosts, frontend contract metadata, CI governance checks, and operator onboarding defaults.
Current risks include:
- shared mutable cached policy payloads via `lru_cache` return values,
- inconsistent retired-fallback 503 response surfaces between app routes and blueprint routes,
- local bootstrap failures caused by strict `.env.example` defaults,
- duplicated boolean feature-flag parsing with slightly different precedence logic,
- missing frontend/backend route-inventory cross-validation.
Goals / Non-Goals
Goals:
- Make modernization policy loading deterministic, mutation-safe, and explicitly documented.
- Standardize retired-fallback error response behavior across all in-scope route hosts.
- Keep `.env.example` local-safe while documenting production hardening expectations.
- Centralize feature-flag resolution semantics in shared helpers.
- Enforce route-contract parity across backend artifacts and frontend shell contracts.
- Improve operational observability for legacy contract fallback loading.
Non-Goals:
- No new route migrations in this change.
- No redesign of shell navigation UX.
- No deferred-route modernization implementation (`/tables`, `/excel-query`, `/query-tool`, `/mid-section-defect`).
- No runtime hot-reload framework for all policy artifacts.
Decisions
Decision 1: Introduce shared feature-flag helpers for bool parsing and env/config precedence
- Choice: Add shared helpers (for example `parse_bool`, `resolve_bool_flag`) used by app/policy/runtime modules.
- Rationale: eliminates duplicated parsing and precedence drift.
- Alternative considered: keep local `_to_bool` and alignment-by-convention.
- Why not alternative: high regression risk from future divergence and incomplete audits.
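As an illustration of Decision 1, a minimal sketch of what such helpers could look like. The names `parse_bool` and `resolve_bool_flag` come from the decision text; the accepted truthy/falsey tokens, the `environ` parameter, and the choice to let a malformed value fall through to the next source are assumptions, not the project's confirmed behavior.

```python
import os
from typing import Any, Mapping, Optional

# Assumed token sets; the real helpers may accept a different vocabulary.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _try_parse(value: Any) -> Optional[bool]:
    """Return True/False for a recognized value, or None if unrecognized."""
    if isinstance(value, bool):
        return value
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    return None


def parse_bool(value: Any, default: bool = False) -> bool:
    """Parse a loosely typed value, using `default` for missing or malformed input."""
    parsed = _try_parse(value)
    return default if parsed is None else parsed


def resolve_bool_flag(
    name: str,
    config: Optional[Mapping[str, Any]] = None,
    default: bool = False,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Resolve a flag with precedence env > config > default."""
    env = os.environ if environ is None else environ
    for candidate in (env.get(name), (config or {}).get(name)):
        parsed = _try_parse(candidate)
        if parsed is not None:
            return parsed
    return default
```

Passing `environ` explicitly keeps unit tests independent of the process environment, which helps when covering canonical and malformed values before call sites are migrated.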
Decision 2: Protect cached policy payloads from cross-caller mutation
- Choice: keep internal cache, but return defensive copies to callers; document refresh semantics and expose explicit cache-clear helper for tests/controlled refresh points.
- Rationale: avoids shared-reference corruption without changing call sites.
- Alternative considered: return `MappingProxyType`.
- Why not alternative: nested list/dict payloads still remain mutable unless deeply frozen.
Decision 3: Unify retired-fallback response generation
- Choice: move to a shared fallback-retirement response helper callable from both app-level and blueprint-level routes.
- Rationale: consistent status/template/body contract and easier testing.
- Alternative considered: leave blueprint-specific inline HTML responses.
- Why not alternative: inconsistent user/operator behavior and duplicated logic.
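A minimal sketch of a shared helper for Decision 3, kept framework-agnostic by returning a `(body, status, headers)` tuple, a shape that Flask view functions may return directly. The function name, the HTML body, and the `Cache-Control` header are assumptions; only the 503 status comes from this document.

```python
import html
from typing import Dict, Tuple

ResponseTuple = Tuple[str, int, Dict[str, str]]


def retired_fallback_response(route: str) -> ResponseTuple:
    """Build the single response contract used when a runtime fallback is retired."""
    body = (
        "<!doctype html><title>Service unavailable</title>"
        # Escape the route so user-controlled path text cannot inject markup.
        f"<p>The legacy fallback for {html.escape(route)} has been retired.</p>"
    )
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Cache-Control": "no-store",  # assumed: avoid caching a transient error page
    }
    return body, 503, headers
```

Because both app-level and blueprint-level handlers would call the same function, a single unit test can pin the status, headers, and body shape for every in-scope route host.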
Decision 4: Rebalance .env.example for safe local onboarding
- Choice: set strict modernization toggles to local-safe defaults and annotate production-recommended values inline.
- Rationale: avoid spurious startup failures in local/test environments while preserving explicit production guidance.
- Alternative considered: keep strict defaults and require all local users to override manually.
- Why not alternative: unnecessary onboarding friction and frequent bootstrap failures.
Decision 5: Add governance parity checks across frontend and backend route contracts
- Choice: extend governance checks/tests to compare backend route contract artifacts with frontend route inventory/scope metadata.
- Rationale: catches silent drift before release.
- Alternative considered: rely only on backend JSON consistency.
- Why not alternative: frontend contract drift can still break runtime behavior silently.
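The parity check in Decision 5 could be sketched as a pure function over two route maps. The data shape (route path mapped to a metadata dict with a `scope` field) and the function name are assumptions; the real artifacts may be structured differently.

```python
from typing import Any, Dict, Iterable, List


def find_route_contract_drift(
    backend: Dict[str, Dict[str, Any]],
    frontend: Dict[str, Dict[str, Any]],
    fields: Iterable[str] = ("scope",),
) -> List[str]:
    """Return human-readable drift messages; an empty list means parity."""
    problems: List[str] = []
    for route in sorted(set(backend) - set(frontend)):
        problems.append(f"{route}: missing from frontend inventory")
    for route in sorted(set(frontend) - set(backend)):
        problems.append(f"{route}: missing from backend contract")
    for route in sorted(set(backend) & set(frontend)):
        for field in fields:
            b, f = backend[route].get(field), frontend[route].get(field)
            if b != f:
                problems.append(f"{route}: {field} mismatch (backend={b!r}, frontend={f!r})")
    return problems
```

A governance test would then assert that the returned list is empty, and the messages give a reviewer the exact routes that drifted.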
Decision 6: Emit explicit warning when legacy contract source is used
- Choice: log warning when loader falls back from primary contract artifact to legacy path.
- Rationale: improves observability during migration tail.
- Alternative considered: silent fallback.
- Why not alternative: hard to detect stale-source dependency in production.
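For Decision 6, a sketch of a loader that warns when it falls back to the legacy source. The logger name, the function signature, and the assumption that both artifacts are JSON files are illustrative only.

```python
import json
import logging
from pathlib import Path
from typing import Any, Dict

logger = logging.getLogger("route_contract_loader")  # hypothetical logger name


def load_route_contract(primary: Path, legacy: Path) -> Dict[str, Any]:
    """Load the primary contract artifact, warning if the legacy path is used."""
    if primary.is_file():
        return json.loads(primary.read_text(encoding="utf-8"))
    # Make the stale-source dependency visible to operators.
    logger.warning(
        "Primary route contract %s not found; falling back to legacy source %s",
        primary,
        legacy,
    )
    return json.loads(legacy.read_text(encoding="utf-8"))
```

If the loader sits behind a cache, the warning fires once per cache fill rather than once per request, which keeps the log signal readable.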
Decision 7: Reduce unnecessary redirect hops in /hold-detail missing-reason flow
- Choice: when SPA shell mode is enabled, redirect directly to canonical shell overview path.
- Rationale: reduces redirect chain complexity and improves deterministic route tracing.
- Alternative considered: keep current two-hop behavior.
- Why not alternative: no benefit, adds trace/debug noise.
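The single-hop choice in Decision 7 amounts to picking the final target up front instead of redirecting to a legacy path that itself redirects. A sketch, where both path constants and the function name are hypothetical placeholders rather than the project's confirmed routes:

```python
# Hypothetical paths; substitute the project's actual route constants.
LEGACY_OVERVIEW_PATH = "/hold-overview"
SHELL_OVERVIEW_PATH = "/portal-shell/hold-overview"


def missing_reason_redirect_target(spa_shell_enabled: bool) -> str:
    """Pick the redirect target for a /hold-detail request with no reason.

    Under SPA shell mode, go straight to the canonical shell path so the
    client is not bounced through the legacy overview route first.
    """
    return SHELL_OVERVIEW_PATH if spa_shell_enabled else LEGACY_OVERVIEW_PATH
```

Keeping the decision in one small function makes the redirect target easy to assert in tests for both modes.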
Decision 8: Add token fallbacks for shell-dependent route styles
- Choice: where route-local CSS consumes shell variables, include fallback values in `var(--token, fallback)` form.
- Rationale: prevents degraded rendering when route is rendered outside shell variable scope.
- Alternative considered: assume shell-only render path.
- Why not alternative: fallback/compatibility entry paths still exist in this phase.
Risks / Trade-offs
- [Risk] Shared helper refactor may alter existing truthy/falsey behavior in edge env values.
- Mitigation: add unit tests covering canonical and malformed env values before replacing call sites.
- [Risk] Contract parity gate can fail current CI if artifacts are already drifted.
- Mitigation: land parity test with synchronized artifacts in same change.
- [Risk] Defensive-copy strategy adds minor per-call overhead.
- Mitigation: policy payloads are small and low-frequency; prioritize correctness over micro-optimization.
- [Risk] `.env.example` default changes may be interpreted as weaker production stance.
- Mitigation: add explicit production recommendation comments next to each local-safe default.
Migration Plan
- Add shared feature-flag helpers and migrate existing bool parsing call sites.
- Refactor modernization policy cache-return behavior to mutation-safe contract and document refresh semantics.
- Introduce shared retired-fallback response helper and migrate hold-overview/hold-history/hold-detail route handlers.
- Update `.env.example` defaults and production guidance comments.
- Extend governance script/tests for frontend/backend route-contract parity.
- Add warning log on legacy contract-source fallback.
- Update `/hold-detail` missing-reason redirect to single-hop canonical target under SPA mode.
- Add fallback values for QC-GATE shell-derived CSS variables.
- Run targeted unit/integration/e2e + governance checks.
Rollback strategy:
- Changes are config/code-level and can be reverted by standard git rollback.
- If parity gate causes unexpected release blocking, gate can temporarily run in warning mode while drift is fixed in same release window.
Open Questions
- Should policy cache refresh be strictly restart-based, or do we want an operator-triggered cache-clear hook in production later?
- Do we want a single centralized governance artifact as source-of-truth long-term, with generated frontend/backend contract outputs?