Chunk failures in BatchQueryEngine were silently discarded — `has_partial_failure` was tracked
in Redis but never surfaced to the API response or frontend. Users could see incomplete data
without any warning. This commit closes the gap end-to-end:
Backend:
- Track failed chunk time ranges (`failed_ranges`) in batch engine progress metadata
- Add single retry for transient Oracle errors (timeout, connection) in `_execute_single_chunk`
- Read `get_batch_progress()` after merge but before `redis_clear_batch()` cleanup
- Inject `has_partial_failure`, `failed_chunk_count`, `failed_ranges` into API response meta
- Persist partial failure flag to independent Redis key with TTL aligned to data storage layer
- Add shared container-resolution policy module with wildcard/expansion guardrails
- Refactor reason filter from single-value to multi-select (`reason` → `reasons`)
Frontend:
- Add client-side date range validation (730-day limit) before API submission
- Display amber warning banner on partial failure with specific failed date ranges
- Support generic fallback message for container-mode queries without date ranges
- Update FilterPanel to support multi-select reason chips
Specs & tests:
- Create batch-query-resilience spec; update reject-history-api and reject-history-page specs
- Add 7 new tests for retry, memory guard, failed ranges, partial failure propagation, TTL
- Cross-service regression verified (hold, resource, job, msd — 411 tests pass)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two changes combined:
1. historical-query-slow-connection: Migrate all historical query pages
to read_sql_df_slow with semaphore concurrency control (max 3),
raise DB slow timeout to 300s, gunicorn timeout to 360s, and
unify frontend timeouts to 360s for all historical pages.
2. hold-resource-history-dataset-cache: Convert hold-history and
resource-history from multi-query to single-query + dataset cache
pattern (L1 ProcessLevelCache + L2 Redis parquet/base64, TTL=900s).
Replace old GET endpoints with POST /query + GET /view two-phase
API. Frontend auto-retries on 410 cache_expired.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>