feat(reject-history): fix silent data loss by propagating partial failure metadata to frontend

Chunk failures in BatchQueryEngine were silently discarded: `has_partial_failure` was tracked in Redis but never surfaced to the API response or frontend. Users could see incomplete data without any warning. This commit closes the gap end-to-end.

Backend:
- Track failed chunk time ranges (`failed_ranges`) in batch engine progress metadata
- Add single retry for transient Oracle errors (timeout, connection) in `_execute_single_chunk`
- Read `get_batch_progress()` after merge but before `redis_clear_batch()` cleanup
- Inject `has_partial_failure`, `failed_chunk_count`, `failed_ranges` into API response meta
- Persist partial failure flag to independent Redis key with TTL aligned to data storage layer
- Add shared container-resolution policy module with wildcard/expansion guardrails
- Refactor reason filter from single-value to multi-select (`reason` → `reasons`)

Frontend:
- Add client-side date range validation (730-day limit) before API submission
- Display amber warning banner on partial failure with specific failed date ranges
- Support generic fallback message for container-mode queries without date ranges
- Update FilterPanel to support multi-select reason chips

Specs & tests:
- Create batch-query-resilience spec; update reject-history-api and reject-history-page specs
- Add 7 new tests for retry, memory guard, failed ranges, partial failure propagation, TTL
- Cross-service regression verified (hold, resource, job, msd; 411 tests pass)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
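The backend flow described above (retry once, record the failed range, surface it in the response meta) can be sketched roughly as follows. This is an illustration only, not the code from this commit: `TransientQueryError`, `BatchProgress`, `execute_single_chunk`, `run_batch`, `build_meta`, and the `start`/`end` shape of each `failed_ranges` entry are hypothetical names and assumptions. Only the three meta keys come from the commit message.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Tuple

DateRange = Tuple[date, date]


class TransientQueryError(Exception):
    """Hypothetical stand-in for a transient Oracle error (timeout, connection)."""


@dataclass
class BatchProgress:
    """Hypothetical progress record; the real engine keeps this in Redis."""
    failed_ranges: List[DateRange] = field(default_factory=list)

    @property
    def has_partial_failure(self) -> bool:
        return bool(self.failed_ranges)


def execute_single_chunk(run_query: Callable[[DateRange], list],
                         chunk: DateRange, max_retries: int = 1) -> list:
    """Run one chunk, retrying transient errors up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return run_query(chunk)
        except TransientQueryError:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller record the failure
    return []  # unreachable; keeps type checkers satisfied


def run_batch(run_query: Callable[[DateRange], list],
              chunks: List[DateRange]) -> Tuple[list, BatchProgress]:
    """Merge rows from successful chunks and remember which ranges failed."""
    progress = BatchProgress()
    rows: list = []
    for chunk in chunks:
        try:
            rows.extend(execute_single_chunk(run_query, chunk))
        except Exception:
            # Assumption: any chunk error after retry counts as a partial failure.
            progress.failed_ranges.append(chunk)
    return rows, progress


def build_meta(progress: BatchProgress) -> Dict[str, object]:
    """Build the response meta fields named in the commit message."""
    return {
        "has_partial_failure": progress.has_partial_failure,
        "failed_chunk_count": len(progress.failed_ranges),
        "failed_ranges": [
            {"start": start.isoformat(), "end": end.isoformat()}
            for start, end in progress.failed_ranges
        ],
    }
```

In this sketch the progress object is read before any cleanup happens, mirroring the ordering the commit calls out (`get_batch_progress()` after merge, before `redis_clear_batch()`).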
22
.env.example
@@ -59,6 +59,16 @@ QUERY_TOOL_MAX_CONTAINER_IDS=200
 RESOURCE_DETAIL_DEFAULT_LIMIT=500
 RESOURCE_DETAIL_MAX_LIMIT=500
 
+# Shared container-resolution guardrails
+# 0 = disable raw input count cap (recommended: rely on expansion limits instead)
+CONTAINER_RESOLVE_INPUT_MAX_VALUES=0
+# Wildcard pattern must include this many literal-prefix chars before %/_ (e.g., GA%)
+CONTAINER_RESOLVE_PATTERN_MIN_PREFIX_LEN=4
+# Per-token expansion guard (avoid one wildcard exploding into too many container IDs)
+CONTAINER_RESOLVE_MAX_EXPANSION_PER_TOKEN=2000
+# Total resolved container-ID guard for a single resolve request
+CONTAINER_RESOLVE_MAX_CONTAINER_IDS=30000
+
 # Trust boundary for forwarded headers (safe default: false)
 # Direct-exposure deployment (no reverse proxy): keep this false
 TRUST_PROXY_HEADERS=false
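Note on the guardrail settings in this hunk: one possible reading of how they combine is sketched below. The function and class names (`ResolveLimits`, `load_limits`, `literal_prefix_len`, `validate_tokens`, `merge_expansions`) are hypothetical, and the sketch assumes that both `%` and `_` always count as wildcards. Only the variable names and default values are taken from `.env.example`. Under this reading, a pattern such as `GA%` has a literal prefix of 2 and would be rejected at the configured minimum of 4.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping

WILDCARDS = "%_"  # assumption: SQL LIKE wildcards, both treated as wildcards


@dataclass(frozen=True)
class ResolveLimits:
    input_max_values: int          # 0 disables the raw input count cap
    pattern_min_prefix_len: int
    max_expansion_per_token: int
    max_container_ids: int


def load_limits(env: Mapping[str, str] = os.environ) -> ResolveLimits:
    """Read the guardrails, falling back to the values shown in .env.example."""
    return ResolveLimits(
        int(env.get("CONTAINER_RESOLVE_INPUT_MAX_VALUES", "0")),
        int(env.get("CONTAINER_RESOLVE_PATTERN_MIN_PREFIX_LEN", "4")),
        int(env.get("CONTAINER_RESOLVE_MAX_EXPANSION_PER_TOKEN", "2000")),
        int(env.get("CONTAINER_RESOLVE_MAX_CONTAINER_IDS", "30000")),
    )


def literal_prefix_len(pattern: str) -> int:
    """Number of characters before the first wildcard."""
    for index, char in enumerate(pattern):
        if char in WILDCARDS:
            return index
    return len(pattern)


def validate_tokens(tokens: List[str], limits: ResolveLimits) -> None:
    """Reject oversized input lists and wildcards with too short a prefix."""
    if limits.input_max_values > 0 and len(tokens) > limits.input_max_values:
        raise ValueError("too many input values")
    for token in tokens:
        has_wildcard = any(char in WILDCARDS for char in token)
        if has_wildcard and literal_prefix_len(token) < limits.pattern_min_prefix_len:
            raise ValueError(f"wildcard prefix too short: {token!r}")


def merge_expansions(expanded: Dict[str, List[str]],
                     limits: ResolveLimits) -> List[str]:
    """Apply the per-token and total expansion guards, then deduplicate."""
    merged: Dict[str, None] = {}  # dict keeps first-seen order
    for token, container_ids in expanded.items():
        if len(container_ids) > limits.max_expansion_per_token:
            raise ValueError(f"token expands to too many IDs: {token!r}")
        for container_id in container_ids:
            merged[container_id] = None
    if len(merged) > limits.max_container_ids:
        raise ValueError("too many resolved container IDs")
    return list(merged)
```

Checking the prefix before running the expansion query keeps the cheap validation ahead of the expensive one; the per-token and total guards then bound whatever passes.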
@@ -101,14 +111,14 @@ GUNICORN_WORKERS=2
 GUNICORN_THREADS=4
 
 # Worker timeout (seconds): should stay above DB/query-tool slow paths
-GUNICORN_TIMEOUT=130
+GUNICORN_TIMEOUT=360
 
 # Graceful shutdown timeout for worker reloads (seconds)
-GUNICORN_GRACEFUL_TIMEOUT=60
+GUNICORN_GRACEFUL_TIMEOUT=300
 
 # Worker recycle policy (set 0 to disable)
-GUNICORN_MAX_REQUESTS=5000
-GUNICORN_MAX_REQUESTS_JITTER=500
+GUNICORN_MAX_REQUESTS=1200
+GUNICORN_MAX_REQUESTS_JITTER=300
 
 # ============================================================
 # Redis Configuration (for WIP cache)
@@ -201,6 +211,8 @@ TRACE_EVENTS_MAX_WORKERS=2
 # Max parallel workers for EventFetcher batch queries (per domain)
 # Recommend: 2 (peak concurrent slow queries = TRACE_EVENTS_MAX_WORKERS × this)
 EVENT_FETCHER_MAX_WORKERS=2
+# false = any failed batch raises error (avoid silent partial data)
+EVENT_FETCHER_ALLOW_PARTIAL_RESULTS=false
 
 # Max parallel workers for forward pipeline WIP+rejects fetching
 FORWARD_PIPELINE_MAX_WORKERS=2
@@ -351,7 +363,7 @@ REJECT_ENGINE_SPOOL_CLEANUP_INTERVAL_SECONDS=300
 REJECT_ENGINE_SPOOL_ORPHAN_GRACE_SECONDS=600
 
 # Batch query engine thresholds
-BATCH_QUERY_TIME_THRESHOLD_DAYS=60
+BATCH_QUERY_TIME_THRESHOLD_DAYS=10
 BATCH_QUERY_ID_THRESHOLD=1000
 BATCH_CHUNK_MAX_MEMORY_MB=256
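Note on `BATCH_QUERY_TIME_THRESHOLD_DAYS`: lowering it from 60 to 10 presumably means more queries are routed through the chunked batch path. The sketch below shows one way such a threshold could drive chunking. It assumes that ranges longer than the threshold are split into consecutive chunks of at most threshold-many days and that both endpoints are inclusive; the diff confirms neither, and `span_days`, `split_date_range`, and `plan_query` are hypothetical helpers.

```python
from datetime import date, timedelta
from typing import List, Tuple

DateRange = Tuple[date, date]


def span_days(start: date, end: date) -> int:
    """Inclusive number of days covered by [start, end]."""
    return (end - start).days + 1


def split_date_range(start: date, end: date, chunk_days: int) -> List[DateRange]:
    """Split [start, end] into consecutive chunks of at most chunk_days days."""
    if chunk_days <= 0:
        raise ValueError("chunk_days must be positive")
    if end < start:
        raise ValueError("end must not be before start")
    chunks: List[DateRange] = []
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=chunk_days - 1), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks


def plan_query(start: date, end: date, threshold_days: int = 10) -> List[DateRange]:
    """Return one range for short queries, several chunks for long ones."""
    if span_days(start, end) <= threshold_days:
        return [(start, end)]
    # Assumption: chunk size equals the threshold; the real engine may differ.
    return split_date_range(start, end, threshold_days)
```

Each tuple returned for a long range is the kind of unit that a `failed_ranges` entry would report when a chunk fails, which is why smaller chunks also make the frontend warning more precise.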