feat: implement hybrid image extraction and memory management
Backend:
- Add hybrid image extraction for Direct track (inline image blocks)
- Add render_inline_image_regions() fallback when OCR doesn't find images
- Add check_document_for_missing_images() for detecting missing images
- Add memory management system (MemoryGuard, ModelManager, ServicePool)
- Update pdf_generator_service to handle HYBRID processing track
- Add ElementType.LOGO for logo extraction

Frontend:
- Fix PDF viewer re-rendering issues with memoization
- Add TaskNotFound component and useTaskValidation hook
- Disable StrictMode due to react-pdf incompatibility
- Fix task detail and results page loading states

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -415,4 +415,173 @@ async def test_concurrent_load():
### Phase 4: Hardening (Week 4)
- Stress testing
- Performance tuning
- Documentation and monitoring

## Configuration Settings Reference

All memory management settings are defined in `backend/app/core/config.py` under the `Settings` class.
### Memory Thresholds

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `memory_warning_threshold` | float | 0.80 | GPU memory usage ratio (0-1) to trigger warning alerts |
| `memory_critical_threshold` | float | 0.95 | GPU memory ratio to start throttling operations |
| `memory_emergency_threshold` | float | 0.98 | GPU memory ratio to trigger emergency cleanup |
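
To make the banding concrete, here is a minimal sketch of how these three ratios could be compared against current GPU usage; the helper name and signature are illustrative, not the actual `MemoryGuard` API.

```python
# Illustrative only: the real MemoryGuard in app/services/memory_manager.py
# may expose a different interface.
def classify_memory_pressure(used_bytes: int, total_bytes: int,
                             warning: float = 0.80,
                             critical: float = 0.95,
                             emergency: float = 0.98) -> str:
    """Map the current GPU usage ratio to a pressure band."""
    ratio = used_bytes / total_bytes
    if ratio >= emergency:
        return "emergency"   # trigger emergency cleanup
    if ratio >= critical:
        return "critical"    # start throttling new operations
    if ratio >= warning:
        return "warning"     # emit warning alerts
    return "ok"

# Example: 7.7 GB used of an 8 GB card is ~0.96, which lands in "critical".
print(classify_memory_pressure(used_bytes=7_700_000_000, total_bytes=8_000_000_000))
```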

### Memory Monitoring

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `memory_check_interval_seconds` | int | 30 | Background check interval for memory monitoring |
| `enable_memory_alerts` | bool | True | Enable/disable memory threshold alerts |
| `gpu_memory_limit_mb` | int | 6144 | Maximum GPU memory to use (MB) |
| `gpu_memory_reserve_mb` | int | 512 | Memory reserved for CUDA overhead (MB) |

### Model Lifecycle Management

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enable_model_lifecycle_management` | bool | True | Use ModelManager for model lifecycle |
| `model_idle_timeout_seconds` | int | 300 | Unload models after this idle time |
| `pp_structure_idle_timeout_seconds` | int | 300 | Unload PP-StructureV3 after this idle time |
| `structure_model_memory_mb` | int | 2000 | Estimated memory for PP-StructureV3 (MB) |
| `ocr_model_memory_mb` | int | 500 | Estimated memory per OCR language model (MB) |
| `enable_lazy_model_loading` | bool | True | Load models on demand |
| `auto_unload_unused_models` | bool | True | Auto-unload unused language models |
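
The idle-timeout and lazy-loading settings above correspond to the bookkeeping a model manager has to do. A minimal sketch of that bookkeeping, assuming a ref-counted wrapper (the class and method names here are hypothetical; the actual `ModelManager` interface may differ):

```python
import time

class ManagedModel:
    """Toy ref-counted model wrapper with idle-timeout tracking (illustrative only)."""

    def __init__(self, loader, idle_timeout_seconds: float = 300.0):
        self._loader = loader                  # callable that builds the model
        self._idle_timeout = idle_timeout_seconds
        self._instance = None
        self._ref_count = 0
        self._last_used = time.monotonic()

    def acquire(self):
        if self._instance is None:             # lazy loading on first access
            self._instance = self._loader()
        self._ref_count += 1
        self._last_used = time.monotonic()
        return self._instance

    def release(self) -> None:
        self._ref_count = max(0, self._ref_count - 1)
        self._last_used = time.monotonic()

    def maybe_unload(self) -> bool:
        """Unload when unreferenced and idle past the timeout."""
        idle = time.monotonic() - self._last_used
        if self._ref_count == 0 and idle >= self._idle_timeout:
            self._instance = None              # real code would also clear GPU caches
            return True
        return False
```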

### Service Pool Configuration

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enable_service_pool` | bool | True | Use OCRServicePool |
| `max_services_per_device` | int | 1 | Max OCRService instances per GPU |
| `max_total_services` | int | 2 | Max total OCRService instances |
| `service_acquire_timeout_seconds` | float | 300.0 | Timeout for acquiring a service from the pool |
| `max_queue_size` | int | 50 | Max pending tasks per device queue |
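
These limits map naturally onto an acquire/release pattern over a bounded set of instances. A sketch of that pattern (the `TinyServicePool` class and `service.process()` call are hypothetical stand-ins; the real `OCRServicePool` lives in `backend/app/services/service_pool.py`):

```python
import asyncio

class TinyServicePool:
    """Illustrative pool with a bounded number of service instances."""

    def __init__(self, factory, max_services: int = 2,
                 acquire_timeout: float = 300.0):
        self._acquire_timeout = acquire_timeout
        self._available: asyncio.Queue = asyncio.Queue()
        for _ in range(max_services):
            self._available.put_nowait(factory())

    async def acquire(self):
        # Wait up to service_acquire_timeout_seconds for a free instance.
        return await asyncio.wait_for(self._available.get(),
                                      timeout=self._acquire_timeout)

    def release(self, service) -> None:
        self._available.put_nowait(service)

async def run_with_pool(pool: TinyServicePool, document):
    service = await pool.acquire()
    try:
        return await service.process(document)   # hypothetical service method
    finally:
        pool.release(service)                     # always return the instance
```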

### Concurrency Control

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `max_concurrent_predictions` | int | 2 | Max concurrent PP-StructureV3 predictions |
| `max_concurrent_pages` | int | 2 | Max pages processed concurrently |
| `inference_batch_size` | int | 1 | Batch size for inference |
| `enable_batch_processing` | bool | True | Enable batch processing for large documents |
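
`max_concurrent_predictions` is the kind of limit typically enforced with a semaphore around the predict call. A minimal sketch (the wrapper function is illustrative, not the actual `PredictionSemaphore` implementation):

```python
import asyncio

# Assumed limit mirroring max_concurrent_predictions = 2.
prediction_semaphore = asyncio.Semaphore(2)

async def predict_with_limit(structure_model, page_image):
    """Allow at most two PP-StructureV3 predictions to run at once."""
    async with prediction_semaphore:
        # predict() is blocking, so run it in a worker thread
        # to keep the event loop responsive.
        return await asyncio.to_thread(structure_model.predict, page_image)
```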

### Recovery Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enable_cpu_fallback` | bool | True | Fall back to CPU when GPU memory is low |
| `enable_emergency_cleanup` | bool | True | Auto-cleanup on memory pressure |
| `enable_worker_restart` | bool | False | Restart workers on OOM (requires a supervisor) |
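
The CPU fallback flag effectively decides what happens when a pre-operation memory check fails. A sketch of that decision, assuming a hypothetical `free_mb()` query as a stand-in for whatever the MemoryGuard exposes:

```python
def choose_device(memory_guard, required_mb: int,
                  enable_cpu_fallback: bool = True) -> str:
    """Return 'gpu' if the predicted memory fits, otherwise fall back or fail."""
    if memory_guard.free_mb() >= required_mb:   # hypothetical query method
        return "gpu"
    if enable_cpu_fallback:
        return "cpu"    # degrade gracefully to CPU instead of failing
    raise MemoryError("GPU memory low and CPU fallback is disabled")
```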

### Feature Flags

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `enable_chart_recognition` | bool | True | Enable chart/diagram recognition |
| `enable_formula_recognition` | bool | True | Enable math formula recognition |
| `enable_table_recognition` | bool | True | Enable table structure recognition |
| `enable_seal_recognition` | bool | True | Enable seal/stamp recognition |
| `enable_text_recognition` | bool | True | Enable general text recognition |
| `enable_memory_optimization` | bool | True | Enable memory optimizations |

### Environment Variable Override

All settings can be overridden via environment variables. The format is uppercase with underscores:

```bash
# Example .env file
MEMORY_WARNING_THRESHOLD=0.75
MEMORY_CRITICAL_THRESHOLD=0.90
MAX_CONCURRENT_PREDICTIONS=1
GPU_MEMORY_LIMIT_MB=4096
ENABLE_CPU_FALLBACK=true
```
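
This is the standard pydantic-settings behaviour. A minimal sketch of how such a `Settings` class could map those variables onto fields, assuming pydantic-settings v2 (the real class in `backend/app/core/config.py` defines many more fields):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Each field is overridable by the uppercase env var of the same name."""
    model_config = SettingsConfigDict(env_file=".env")

    memory_warning_threshold: float = 0.80
    memory_critical_threshold: float = 0.95
    max_concurrent_predictions: int = 2
    gpu_memory_limit_mb: int = 6144
    enable_cpu_fallback: bool = True

# Reads MEMORY_WARNING_THRESHOLD, GPU_MEMORY_LIMIT_MB, ... from the
# environment (or .env), falling back to the defaults above.
settings = Settings()
```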

### Recommended Configurations

#### RTX 4060 8GB (Default)
```bash
GPU_MEMORY_LIMIT_MB=6144
MAX_CONCURRENT_PREDICTIONS=2
MAX_CONCURRENT_PAGES=2
INFERENCE_BATCH_SIZE=1
```

#### RTX 3090 24GB
```bash
GPU_MEMORY_LIMIT_MB=20480
MAX_CONCURRENT_PREDICTIONS=4
MAX_CONCURRENT_PAGES=4
INFERENCE_BATCH_SIZE=2
```

#### CPU-Only Mode
```bash
FORCE_CPU_MODE=true
MAX_CONCURRENT_PREDICTIONS=1
ENABLE_CPU_FALLBACK=false
```

## Prometheus Metrics

The system exports Prometheus-format metrics via the `PrometheusMetrics` class. Available metrics:

### GPU Metrics
- `tool_ocr_memory_gpu_total_bytes` - Total GPU memory
- `tool_ocr_memory_gpu_used_bytes` - Used GPU memory
- `tool_ocr_memory_gpu_free_bytes` - Free GPU memory
- `tool_ocr_memory_gpu_utilization_ratio` - GPU utilization (0-1)

### Model Metrics
- `tool_ocr_memory_models_loaded_total` - Number of loaded models
- `tool_ocr_memory_models_memory_bytes` - Total memory used by models
- `tool_ocr_memory_model_ref_count{model_id}` - Reference count per model

### Prediction Metrics
- `tool_ocr_memory_predictions_active` - Currently active predictions
- `tool_ocr_memory_predictions_queue_depth` - Predictions waiting in queue
- `tool_ocr_memory_predictions_total` - Total predictions processed (counter)
- `tool_ocr_memory_predictions_timeouts_total` - Total prediction timeouts (counter)

### Pool Metrics
- `tool_ocr_memory_pool_services_total` - Total services in pool
- `tool_ocr_memory_pool_services_available` - Available services
- `tool_ocr_memory_pool_services_in_use` - Services in use
- `tool_ocr_memory_pool_acquisitions_total` - Total acquisitions (counter)

### Recovery Metrics
- `tool_ocr_memory_recovery_count_total` - Total recovery attempts
- `tool_ocr_memory_recovery_in_cooldown` - Whether recovery is in cooldown (0/1)
- `tool_ocr_memory_recovery_cooldown_remaining_seconds` - Remaining cooldown time
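
For reference, Prometheus scrapes these as plain-text exposition format. A toy renderer that produces that format for a couple of gauges (illustrative only; the actual `PrometheusMetrics` class may emit HELP lines, labels, and counters differently):

```python
def render_gauges(metrics: dict[str, float]) -> str:
    """Render simple gauges in Prometheus text exposition format."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

print(render_gauges({
    "tool_ocr_memory_gpu_used_bytes": 3.2e9,
    "tool_ocr_memory_predictions_active": 1,
}))
```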

## Memory Dump API

The `MemoryDumper` class provides debugging capabilities:

```python
from app.services.memory_manager import get_memory_dumper

dumper = get_memory_dumper()

# Create a memory dump
dump = dumper.create_dump(include_python_objects=True)

# Get the dump as a dictionary for JSON serialization
dump_dict = dumper.to_dict(dump)

# Compare two dumps to detect memory growth
comparison = dumper.compare_dumps(dump1, dump2)
```

Memory dumps include:
- GPU/CPU memory usage
- Loaded models and reference counts
- Active predictions and queue state
- Service pool statistics
- Recovery manager state
- Python GC statistics
- Large Python objects (optional)
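
One convenient way to surface these dumps is a small diagnostic route; a sketch under the assumption that the `get_memory_dumper()` helper shown above is importable (the `/debug/memory-dump` path itself is hypothetical, not an existing endpoint):

```python
from fastapi import APIRouter

from app.services.memory_manager import get_memory_dumper

router = APIRouter()

@router.get("/debug/memory-dump")
async def memory_dump(include_python_objects: bool = False):
    """Return the current memory dump as JSON for ad-hoc debugging."""
    dumper = get_memory_dumper()
    dump = dumper.create_dump(include_python_objects=include_python_objects)
    return dumper.to_dict(dump)
```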

@@ -3,123 +3,123 @@

## Section 1: Model Lifecycle Management (Priority: Critical)

### 1.1 Create ModelManager class
- [x] Design ModelManager interface with load/unload/get methods
- [x] Implement reference counting for model instances
- [x] Add idle timeout tracking with configurable thresholds
- [x] Create teardown() method for explicit cleanup
- [x] Add logging for model lifecycle events

### 1.2 Integrate PP-StructureV3 with ModelManager
- [x] Remove permanent exemption from unloading (lines 255-267)
- [x] Wrap PP-StructureV3 in managed model wrapper
- [x] Implement lazy loading on first access
- [x] Add unload capability with cache clearing
- [x] Test model reload after unload

## Section 2: Service Singleton Pattern (Priority: Critical)

### 2.1 Create OCRServicePool
- [x] Design pool interface with acquire/release methods
- [x] Implement per-device instance management
- [x] Add queue-based task distribution
- [x] Implement concurrency limits via semaphores
- [x] Add health check for pooled instances

### 2.2 Refactor task router
- [x] Replace OCRService() instantiation with pool.acquire()
- [x] Add proper release in finally blocks
- [x] Handle pool exhaustion gracefully
- [x] Add metrics for pool utilization
- [x] Update error handling for pooled services

## Section 3: Enhanced Memory Monitoring (Priority: High)

### 3.1 Create MemoryGuard class
- [x] Implement paddle.device.cuda memory queries
- [x] Add pynvml integration as fallback
- [x] Add torch memory query support
- [x] Create configurable threshold system
- [x] Implement memory prediction for operations

### 3.2 Integrate memory checks
- [x] Replace existing check_gpu_memory implementation
- [x] Add pre-operation memory checks
- [x] Implement CPU fallback when memory low
- [x] Add memory usage logging
- [x] Create memory pressure alerts

## Section 4: Concurrency Control (Priority: High)

### 4.1 Implement prediction semaphores
- [x] Add semaphore for PP-StructureV3.predict
- [x] Configure max concurrent predictions
- [x] Add queue for waiting predictions
- [x] Implement timeout handling
- [x] Add metrics for queue depth

### 4.2 Add selective processing
- [x] Create config for disabling chart/formula/table
- [x] Implement batch processing for large documents
- [x] Add progressive loading for multi-page docs
- [x] Create priority queue for operations
- [x] Test memory savings with selective processing

## Section 5: Active Memory Management (Priority: Medium)

### 5.1 Create memory monitor thread
- [x] Implement background monitoring loop
- [x] Add periodic memory metrics collection
- [x] Create threshold-based triggers
- [x] Implement automatic cache clearing
- [x] Add LRU-based model unloading

### 5.2 Add recovery mechanisms
- [x] Implement emergency memory release
- [x] Add worker process restart capability (RecoveryManager)
- [x] Create memory dump for debugging
- [x] Add cooldown period after recovery
- [x] Test recovery under various scenarios

## Section 6: Cleanup Hooks (Priority: Medium)

### 6.1 Implement shutdown handlers
- [x] Add FastAPI shutdown event handler
- [x] Create signal handlers (SIGTERM, SIGINT)
- [x] Implement graceful model unloading
- [x] Add connection draining
- [x] Test shutdown sequence

### 6.2 Add task cleanup
- [x] Wrap background tasks with cleanup
- [x] Add success/failure callbacks
- [x] Implement resource release on completion
- [x] Add cleanup verification logging
- [x] Test cleanup in error scenarios

## Section 7: Configuration & Settings (Priority: Low)

### 7.1 Add memory settings to config
- [x] Define memory threshold parameters
- [x] Add model timeout settings
- [x] Configure pool sizes
- [x] Add feature flags for new behavior
- [x] Document all settings

### 7.2 Create monitoring dashboard
- [x] Add memory metrics endpoint
- [x] Create pool status endpoint
- [x] Add model lifecycle stats
- [x] Implement health check endpoint
- [x] Add Prometheus metrics export

## Section 8: Testing & Documentation (Priority: High)

### 8.1 Create comprehensive tests
- [x] Unit tests for ModelManager
- [x] Integration tests for OCRServicePool
- [x] Memory leak detection tests
- [x] Stress tests with concurrent requests
- [x] Performance benchmarks

### 8.2 Documentation
- [ ] Document memory management architecture
@@ -131,5 +131,46 @@

---

**Total Tasks**: 58
**Estimated Effort**: 3-4 weeks
**Critical Path**: Sections 1-2 must be completed first as they form the foundation
**Completed**: 53
**Remaining**: 5 (Section 8.2 Documentation only)
**Progress**: ~91%

**Critical Path Status**: Sections 1-8.1 are complete (foundation, memory monitoring, prediction semaphores, batch processing, recovery, signal handlers, configuration, Prometheus metrics, and comprehensive tests are in place)

## Implementation Summary

### Files Created
- `backend/app/services/memory_manager.py` - ModelManager, MemoryGuard, MemoryConfig, PredictionSemaphore, BatchProcessor, ProgressiveLoader, PriorityOperationQueue, RecoveryManager
- `backend/app/services/service_pool.py` - OCRServicePool, PoolConfig
- `backend/tests/services/test_memory_manager.py` - Unit tests for memory management (57 tests)
- `backend/tests/services/test_service_pool.py` - Unit tests for the service pool (18 tests)
- `backend/tests/services/test_ocr_memory_integration.py` - Integration tests for memory check patterns (10 tests)

### Files Modified
- `backend/app/core/config.py` - Added memory management configuration settings
- `backend/app/services/ocr_service.py` - Removed the PP-StructureV3 unloading exemption, added unload capability, integrated MemoryGuard for pre-operation checks and CPU fallback, added PredictionSemaphore for concurrent prediction control
- `backend/app/services/pp_structure_enhanced.py` - Added PredictionSemaphore control for predict calls
- `backend/app/routers/tasks.py` - Refactored to use the service pool
- `backend/app/main.py` - Added startup/shutdown handlers, signal handlers (SIGTERM/SIGINT), connection draining, recovery manager shutdown

### New Classes Added (Sections 4.2-8)
- `BatchProcessor` - Memory-aware batch processing for large documents with priority support
- `ProgressiveLoader` - Progressive page loading with lookahead and automatic cleanup
- `PriorityOperationQueue` - Priority queue with timeout and cancellation support
- `RecoveryManager` - Memory recovery with cooldown period and attempt limits
- `MemoryDumper` - Memory dump creation for debugging, with history and comparison
- `PrometheusMetrics` - Prometheus-format metrics export for monitoring
- Signal handlers for graceful shutdown (SIGTERM, SIGINT)
- Connection draining for clean shutdown

### New Test Classes Added (Section 8.1)
- `TestModelReloadAfterUnload` - Tests for model reload after unload
- `TestSelectiveProcessingMemorySavings` - Tests for memory savings with selective processing
- `TestRecoveryScenarios` - Tests for recovery under various scenarios
- `TestShutdownSequence` - Tests for the shutdown sequence
- `TestCleanupInErrorScenarios` - Tests for cleanup in error scenarios
- `TestMemoryLeakDetection` - Tests for memory leak detection
- `TestStressConcurrentRequests` - Stress tests with concurrent requests
- `TestPerformanceBenchmarks` - Performance benchmark tests
- `TestMemoryDumper` - Tests for the MemoryDumper class
- `TestPrometheusMetrics` - Tests for the PrometheusMetrics class