# Tasks for Enhanced Memory Management

## Section 1: Model Lifecycle Management (Priority: Critical)

### 1.1 Create ModelManager class
- [x] Design ModelManager interface with load/unload/get methods
- [x] Implement reference counting for model instances
- [x] Add idle timeout tracking with configurable thresholds
- [x] Create teardown() method for explicit cleanup
- [x] Add logging for model lifecycle events

### 1.2 Integrate PP-StructureV3 with ModelManager
- [x] Remove permanent exemption from unloading (lines 255-267)
- [x] Wrap PP-StructureV3 in managed model wrapper
- [x] Implement lazy loading on first access
- [x] Add unload capability with cache clearing
- [x] Test model reload after unload

## Section 2: Service Singleton Pattern (Priority: Critical)

### 2.1 Create OCRServicePool
- [x] Design pool interface with acquire/release methods
- [x] Implement per-device instance management
- [x] Add queue-based task distribution
- [x] Implement concurrency limits via semaphores
- [x] Add health check for pooled instances

### 2.2 Refactor task router
- [x] Replace OCRService() instantiation with pool.acquire()
- [x] Add proper release in finally blocks
- [x] Handle pool exhaustion gracefully
- [x] Add metrics for pool utilization
- [x] Update error handling for pooled services

## Section 3: Enhanced Memory Monitoring (Priority: High)

### 3.1 Create MemoryGuard class
- [x] Implement paddle.device.cuda memory queries
- [x] Add pynvml integration as fallback
- [x] Add torch memory query support
- [x] Create configurable threshold system
- [x] Implement memory prediction for operations

### 3.2 Integrate memory checks
- [x] Replace existing check_gpu_memory implementation
- [x] Add pre-operation memory checks
- [x] Implement CPU fallback when memory is low
- [x] Add memory usage logging
- [x] Create memory pressure alerts

## Section 4: Concurrency Control (Priority: High)

### 4.1 Implement prediction semaphores
- [x] Add semaphore for PP-StructureV3.predict
- [x] Configure max concurrent predictions
- [x] Add queue for waiting predictions
- [x] Implement timeout handling
- [x] Add metrics for queue depth

### 4.2 Add selective processing
- [x] Create config for disabling chart/formula/table
- [x] Implement batch processing for large documents
- [x] Add progressive loading for multi-page docs
- [x] Create priority queue for operations
- [x] Test memory savings with selective processing

## Section 5: Active Memory Management (Priority: Medium)

### 5.1 Create memory monitor thread
- [x] Implement background monitoring loop
- [x] Add periodic memory metrics collection
- [x] Create threshold-based triggers
- [x] Implement automatic cache clearing
- [x] Add LRU-based model unloading

### 5.2 Add recovery mechanisms
- [x] Implement emergency memory release
- [x] Add worker process restart capability (RecoveryManager)
- [x] Create memory dump for debugging
- [x] Add cooldown period after recovery
- [x] Test recovery under various scenarios

## Section 6: Cleanup Hooks (Priority: Medium)

### 6.1 Implement shutdown handlers
- [x] Add FastAPI shutdown event handler
- [x] Create signal handlers (SIGTERM, SIGINT)
- [x] Implement graceful model unloading
- [x] Add connection draining
- [x] Test shutdown sequence

### 6.2 Add task cleanup
- [x] Wrap background tasks with cleanup
- [x] Add success/failure callbacks
- [x] Implement resource release on completion
- [x] Add cleanup verification logging
- [x] Test cleanup in error scenarios

## Section 7: Configuration & Settings (Priority: Low)

### 7.1 Add memory settings to config
- [x] Define memory threshold parameters
- [x] Add model timeout settings
- [x] Configure pool sizes
- [x] Add feature flags for new behavior
- [x] Document all settings

### 7.2 Create monitoring dashboard
- [x] Add memory metrics endpoint
- [x] Create pool status endpoint
- [x] Add model lifecycle stats
- [x] Implement health check endpoint
- [x] Add Prometheus metrics export

## Section 8: Testing & Documentation (Priority: High)

### 8.1 Create comprehensive tests
- [x] Unit tests for ModelManager
- [x] Integration tests for OCRServicePool
- [x] Memory leak detection tests
- [x] Stress tests with concurrent requests
- [x] Performance benchmarks

### 8.2 Documentation
- [ ] Document memory management architecture
- [ ] Create tuning guide
- [ ] Add troubleshooting section
- [ ] Document monitoring setup
- [ ] Create migration guide

---

- **Total Tasks**: 80
- **Completed**: 75
- **Remaining**: 5 (Section 8.2 Documentation only)
- **Progress**: ~94%
- **Critical Path Status**: Sections 1 through 8.1 are complete (foundation, memory monitoring, prediction semaphores, batch processing, recovery, signal handlers, configuration, Prometheus metrics, and comprehensive tests are in place)

## Implementation Summary

### Files Created
- `backend/app/services/memory_manager.py` - ModelManager, MemoryGuard, MemoryConfig, PredictionSemaphore, BatchProcessor, ProgressiveLoader, PriorityOperationQueue, RecoveryManager
- `backend/app/services/service_pool.py` - OCRServicePool, PoolConfig
- `backend/tests/services/test_memory_manager.py` - Unit tests for memory management (57 tests)
- `backend/tests/services/test_service_pool.py` - Unit tests for service pool (18 tests)
- `backend/tests/services/test_ocr_memory_integration.py` - Integration tests for memory check patterns (10 tests)

### Files Modified
- `backend/app/core/config.py` - Added memory management configuration settings
- `backend/app/services/ocr_service.py` - Removed PP-StructureV3 exemption, added unload capability, integrated MemoryGuard for pre-operation checks and CPU fallback, added PredictionSemaphore for concurrent prediction control
- `backend/app/services/pp_structure_enhanced.py` - Added PredictionSemaphore control for predict calls
- `backend/app/routers/tasks.py` - Refactored to use service pool
- `backend/app/main.py` - Added startup/shutdown handlers, signal handlers (SIGTERM/SIGINT), connection draining, recovery manager shutdown

### New Classes and Components Added (Sections 4.2-8)
- `BatchProcessor` - Memory-aware batch processing for large documents with priority support
- `ProgressiveLoader` - Progressive page loading with lookahead and automatic cleanup
- `PriorityOperationQueue` - Priority queue with timeout and cancellation support
- `RecoveryManager` - Memory recovery with cooldown period and attempt limits
- `MemoryDumper` - Memory dump creation for debugging with history and comparison
- `PrometheusMetrics` - Prometheus-format metrics export for monitoring
- Signal handlers for graceful shutdown (SIGTERM, SIGINT)
- Connection draining for clean shutdown

### New Test Classes Added (Section 8.1)
- `TestModelReloadAfterUnload` - Tests for model reload after unload
- `TestSelectiveProcessingMemorySavings` - Tests for memory savings with selective processing
- `TestRecoveryScenarios` - Tests for recovery under various scenarios
- `TestShutdownSequence` - Tests for shutdown sequence
- `TestCleanupInErrorScenarios` - Tests for cleanup in error scenarios
- `TestMemoryLeakDetection` - Tests for memory leak detection
- `TestStressConcurrentRequests` - Stress tests with concurrent requests
- `TestPerformanceBenchmarks` - Performance benchmark tests
- `TestMemoryDumper` - Tests for MemoryDumper class
- `TestPrometheusMetrics` - Tests for PrometheusMetrics class
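
### Example Sketch: Reference Counting and Idle Unload (Section 1.1)

The Section 1.1 tasks (lazy loading, reference counting, idle timeout tracking, and `teardown()`) can be illustrated with a minimal sketch. This is not the code in `backend/app/services/memory_manager.py`; the method names `register`, `get`, `release`, and `unload_idle`, the `ManagedModel` helper, and the `idle_timeout_s` parameter are all assumptions made for illustration, and the real class may differ.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Optional


class ManagedModel:
    """Bookkeeping for one lazily loaded model (hypothetical helper)."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self.loader = loader
        self.instance: Optional[Any] = None
        self.ref_count = 0
        self.last_used = time.monotonic()


class ModelManager:
    """Illustrative sketch only: lazy load, ref counting, idle unload."""

    def __init__(self, idle_timeout_s: float = 300.0) -> None:
        self._idle_timeout_s = idle_timeout_s
        self._models: Dict[str, ManagedModel] = {}
        self._lock = threading.Lock()

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        """Record how to build a model without loading it yet."""
        with self._lock:
            self._models[name] = ManagedModel(loader)

    def get(self, name: str) -> Any:
        """Load on first access, then hand out the shared instance."""
        with self._lock:
            entry = self._models[name]
            if entry.instance is None:
                # Loading under the lock keeps the sketch simple; a real
                # implementation may prefer a per-model lock.
                entry.instance = entry.loader()
            entry.ref_count += 1
            entry.last_used = time.monotonic()
            return entry.instance

    def release(self, name: str) -> None:
        """Drop one reference; the model stays loaded until it goes idle."""
        with self._lock:
            entry = self._models[name]
            if entry.ref_count > 0:
                entry.ref_count -= 1
            entry.last_used = time.monotonic()

    def unload_idle(self, now: Optional[float] = None) -> List[str]:
        """Unload models with no references that exceeded the idle timeout."""
        now = time.monotonic() if now is None else now
        unloaded: List[str] = []
        with self._lock:
            for name, entry in self._models.items():
                idle_for = now - entry.last_used
                if (
                    entry.instance is not None
                    and entry.ref_count == 0
                    and idle_for >= self._idle_timeout_s
                ):
                    entry.instance = None  # drop the reference so it can be freed
                    unloaded.append(name)
        return unloaded

    def teardown(self) -> None:
        """Explicit cleanup: drop every instance regardless of references."""
        with self._lock:
            for entry in self._models.values():
                entry.instance = None
                entry.ref_count = 0
```

In this sketch a caller would pair `get()` with `release()` in a `try`/`finally` block, and a background thread such as the monitor described in Section 5.1 would call `unload_idle()` periodically. Framework-specific cache clearing is deliberately left out because it depends on the model backend.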