# Tasks for Enhanced Memory Management

## Section 1: Model Lifecycle Management (Priority: Critical)

### 1.1 Create ModelManager class
- [x] Design ModelManager interface with load/unload/get methods
- [x] Implement reference counting for model instances
- [x] Add idle timeout tracking with configurable thresholds
- [x] Create teardown() method for explicit cleanup
- [x] Add logging for model lifecycle events
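
For orientation, the reference counting and idle timeout tasks above might fit together roughly as follows. This is a minimal sketch, not the code in `memory_manager.py`: the `register`, `release`, `unload_idle`, and `is_loaded` names, the `_Entry` helper, and the 300-second default are assumptions made for illustration.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Optional


class _Entry:
    """Bookkeeping for one loaded model (hypothetical helper)."""

    def __init__(self, instance: Any) -> None:
        self.instance = instance
        self.ref_count = 0
        self.last_used = time.monotonic()


class ModelManager:
    """Illustrative manager: lazy load, reference counting, idle unload."""

    def __init__(self, idle_timeout_s: float = 300.0) -> None:
        self._idle_timeout_s = idle_timeout_s
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._entries: Dict[str, _Entry] = {}
        self._lock = threading.Lock()

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        with self._lock:
            self._loaders[name] = loader

    def get(self, name: str) -> Any:
        """Load the model on first access and take a reference to it."""
        with self._lock:
            entry = self._entries.get(name)
            if entry is None:
                entry = _Entry(self._loaders[name]())
                self._entries[name] = entry
            entry.ref_count += 1
            entry.last_used = time.monotonic()
            return entry.instance

    def release(self, name: str) -> None:
        """Drop one reference; the model stays loaded until it goes idle."""
        with self._lock:
            entry = self._entries[name]
            entry.ref_count = max(0, entry.ref_count - 1)
            entry.last_used = time.monotonic()

    def is_loaded(self, name: str) -> bool:
        with self._lock:
            return name in self._entries

    def unload_idle(self, now: Optional[float] = None) -> List[str]:
        """Unload unreferenced models that have been idle past the timeout."""
        now = time.monotonic() if now is None else now
        unloaded: List[str] = []
        with self._lock:
            for name, entry in list(self._entries.items()):
                idle_for = now - entry.last_used
                if entry.ref_count == 0 and idle_for >= self._idle_timeout_s:
                    del self._entries[name]
                    unloaded.append(name)
        return unloaded

    def teardown(self) -> None:
        """Explicit cleanup: forget every model regardless of references."""
        with self._lock:
            self._entries.clear()
```

The sketch only drops its own references; clearing framework-level GPU caches on unload is not shown. It also calls the loader while holding the lock, which keeps the example short but would serialize slow loads.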

### 1.2 Integrate PP-StructureV3 with ModelManager
- [x] Remove permanent exemption from unloading (lines 255-267 of `backend/app/services/ocr_service.py`)
- [x] Wrap PP-StructureV3 in managed model wrapper
- [x] Implement lazy loading on first access
- [x] Add unload capability with cache clearing
- [x] Test model reload after unload

## Section 2: Service Singleton Pattern (Priority: Critical)

### 2.1 Create OCRServicePool
- [x] Design pool interface with acquire/release methods
- [x] Implement per-device instance management
- [x] Add queue-based task distribution
- [x] Implement concurrency limits via semaphores
- [x] Add health check for pooled instances

### 2.2 Refactor task router
- [x] Replace OCRService() instantiation with pool.acquire()
- [x] Add proper release in finally blocks
- [x] Handle pool exhaustion gracefully
- [x] Add metrics for pool utilization
- [x] Update error handling for pooled services
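
The acquire/release pattern these tasks describe can be sketched with a bounded queue. The `ServicePool` class, `PoolExhaustedError`, the `borrowed` context manager, and the timeout default are hypothetical names chosen for this example; the actual `OCRServicePool` interface may differ.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class PoolExhaustedError(RuntimeError):
    """Raised when no instance becomes free within the timeout."""


class ServicePool(Generic[T]):
    """Illustrative fixed-size pool backed by a thread-safe queue."""

    def __init__(self, factory: Callable[[], T], size: int) -> None:
        self._free: "queue.Queue[T]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._free.put(factory())

    def acquire(self, timeout_s: float = 5.0) -> T:
        try:
            return self._free.get(timeout=timeout_s)
        except queue.Empty:
            # Surface exhaustion as a domain error the router can map
            # to a "busy, retry later" response.
            raise PoolExhaustedError(
                f"no service free after {timeout_s}s"
            ) from None

    def release(self, service: T) -> None:
        self._free.put(service)

    def available(self) -> int:
        return self._free.qsize()

    @contextmanager
    def borrowed(self, timeout_s: float = 5.0) -> Iterator[T]:
        """Acquire a service and always release it, even on error."""
        service = self.acquire(timeout_s)
        try:
            yield service
        finally:
            self.release(service)
```

Wrapping the release in `finally` is what guarantees the instance returns to the pool when a task raises, which is the failure mode the router refactor guards against.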

## Section 3: Enhanced Memory Monitoring (Priority: High)

### 3.1 Create MemoryGuard class
- [x] Implement paddle.device.cuda memory queries
- [x] Add pynvml integration as fallback
- [x] Add torch memory query support
- [x] Create configurable threshold system
- [x] Implement memory prediction for operations

### 3.2 Integrate memory checks
- [x] Replace existing check_gpu_memory implementation
- [x] Add pre-operation memory checks
- [x] Implement CPU fallback when memory low
- [x] Add memory usage logging
- [x] Create memory pressure alerts
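
A pre-operation check with CPU fallback reduces to comparing projected usage against configured thresholds. The sketch below assumes the used and total byte counts have already been obtained from whichever backend is available; the function names and the 0.80/0.95 thresholds are illustrative assumptions, not the values in `config.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryThresholds:
    """Fractions of total GPU memory (assumed defaults, for illustration)."""

    warning: float = 0.80
    critical: float = 0.95


def pressure_level(
    used_bytes: int,
    total_bytes: int,
    thresholds: MemoryThresholds = MemoryThresholds(),
) -> str:
    """Classify current usage as 'normal', 'warning', or 'critical'."""
    if total_bytes <= 0:
        return "critical"  # no usable GPU memory reported
    ratio = used_bytes / total_bytes
    if ratio >= thresholds.critical:
        return "critical"
    if ratio >= thresholds.warning:
        return "warning"
    return "normal"


def choose_device(
    used_bytes: int,
    total_bytes: int,
    needed_bytes: int,
    thresholds: MemoryThresholds = MemoryThresholds(),
) -> str:
    """Return 'cpu' when the operation would push usage past critical."""
    projected = used_bytes + needed_bytes
    if pressure_level(projected, total_bytes, thresholds) == "critical":
        return "cpu"
    return "gpu"
```

For example, with 8 GiB total and 4 GiB in use, a 1 GiB operation stays on the GPU, while the same operation with 7 GiB in use falls back to the CPU.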

## Section 4: Concurrency Control (Priority: High)

### 4.1 Implement prediction semaphores
- [x] Add semaphore for PP-StructureV3.predict
- [x] Configure max concurrent predictions
- [x] Add queue for waiting predictions
- [x] Implement timeout handling
- [x] Add metrics for queue depth
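
One way to bound concurrent predictions, time out waiters, and expose queue depth is a semaphore wrapped in a context manager. This is a sketch under assumptions: `PredictionSlots`, `slot`, and `queue_depth` are hypothetical names and do not describe the real `PredictionSemaphore` API.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class PredictionSlots:
    """Illustrative limiter for concurrent predict() calls."""

    def __init__(self, max_concurrent: int, timeout_s: float) -> None:
        self._sem = threading.BoundedSemaphore(max_concurrent)
        self._timeout_s = timeout_s
        self._waiting = 0
        self._lock = threading.Lock()

    @property
    def queue_depth(self) -> int:
        """Number of callers currently waiting for a slot."""
        with self._lock:
            return self._waiting

    @contextmanager
    def slot(self) -> Iterator[None]:
        with self._lock:
            self._waiting += 1
        try:
            acquired = self._sem.acquire(timeout=self._timeout_s)
        finally:
            with self._lock:
                self._waiting -= 1
        if not acquired:
            raise TimeoutError("no prediction slot became free in time")
        try:
            yield
        finally:
            # Release even if the prediction raises.
            self._sem.release()
```

A caller would wrap the prediction in `with slots.slot():` and treat `TimeoutError` as a signal to reject or requeue the request.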

### 4.2 Add selective processing
- [x] Create config for disabling chart/formula/table
- [x] Implement batch processing for large documents
- [x] Add progressive loading for multi-page docs
- [x] Create priority queue for operations
- [x] Test memory savings with selective processing

## Section 5: Active Memory Management (Priority: Medium)

### 5.1 Create memory monitor thread
- [x] Implement background monitoring loop
- [x] Add periodic memory metrics collection
- [x] Create threshold-based triggers
- [x] Implement automatic cache clearing
- [x] Add LRU-based model unloading
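
LRU-based unloading can be sketched with an ordered mapping that evicts the least recently used model while a pressure check keeps reporting a problem. The `LruModels` class and the `under_pressure` callback are hypothetical; in the real monitor thread the callback would presumably be a threshold check on current GPU memory.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class LruModels:
    """Illustrative registry that tracks model usage order."""

    def __init__(self) -> None:
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def touch(self, name: str, model: Any) -> None:
        """Record a use; the most recently used model moves to the end."""
        self._models[name] = model
        self._models.move_to_end(name)

    def evict_while(self, under_pressure: Callable[[], bool]) -> List[str]:
        """Drop least recently used models until pressure clears."""
        evicted: List[str] = []
        while self._models and under_pressure():
            name, _model = self._models.popitem(last=False)
            evicted.append(name)
        return evicted

    def __len__(self) -> int:
        return len(self._models)
```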

### 5.2 Add recovery mechanisms
- [x] Implement emergency memory release
- [x] Add worker process restart capability (RecoveryManager)
- [x] Create memory dump for debugging
- [x] Add cooldown period after recovery
- [x] Test recovery under various scenarios
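
The cooldown and attempt-limit behaviour can be illustrated with a small gate that decides whether a recovery may start. `RecoveryGate`, `try_begin`, and the injectable `clock` parameter are assumptions made for this example and are not taken from `RecoveryManager`.

```python
import time
from typing import Callable, Optional


class RecoveryGate:
    """Illustrative guard: enforce a cooldown and a cap on attempts."""

    def __init__(
        self,
        cooldown_s: float,
        max_attempts: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown_s = cooldown_s
        self._max_attempts = max_attempts
        self._clock = clock
        self._attempts = 0
        self._last_start: Optional[float] = None

    def try_begin(self) -> bool:
        """Return True and record the attempt if recovery may start now."""
        now = self._clock()
        if self._attempts >= self._max_attempts:
            return False
        in_cooldown = (
            self._last_start is not None
            and now - self._last_start < self._cooldown_s
        )
        if in_cooldown:
            return False
        self._attempts += 1
        self._last_start = now
        return True

    def reset(self) -> None:
        """Clear the attempt history, e.g. after a sustained healthy period."""
        self._attempts = 0
        self._last_start = None
```

Injecting the clock keeps the cooldown testable without sleeping, which helps when exercising recovery under several scenarios.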

## Section 6: Cleanup Hooks (Priority: Medium)

### 6.1 Implement shutdown handlers
- [x] Add FastAPI shutdown event handler
- [x] Create signal handlers (SIGTERM, SIGINT)
- [x] Implement graceful model unloading
- [x] Add connection draining
- [x] Test shutdown sequence

### 6.2 Add task cleanup
- [x] Wrap background tasks with cleanup
- [x] Add success/failure callbacks
- [x] Implement resource release on completion
- [x] Add cleanup verification logging
- [x] Test cleanup in error scenarios

## Section 7: Configuration & Settings (Priority: Low)

### 7.1 Add memory settings to config
- [x] Define memory threshold parameters
- [x] Add model timeout settings
- [x] Configure pool sizes
- [x] Add feature flags for new behavior
- [x] Document all settings

### 7.2 Create monitoring dashboard
- [x] Add memory metrics endpoint
- [x] Create pool status endpoint
- [x] Add model lifecycle stats
- [x] Implement health check endpoint
- [x] Add Prometheus metrics export

## Section 8: Testing & Documentation (Priority: High)

### 8.1 Create comprehensive tests
- [x] Unit tests for ModelManager
- [x] Integration tests for OCRServicePool
- [x] Memory leak detection tests
- [x] Stress tests with concurrent requests
- [x] Performance benchmarks

### 8.2 Documentation
- [ ] Document memory management architecture
- [ ] Create tuning guide
- [ ] Add troubleshooting section
- [ ] Document monitoring setup
- [ ] Create migration guide

---

- **Total Tasks**: 80
- **Completed**: 75
- **Remaining**: 5 (Section 8.2 Documentation only)
- **Progress**: ~94%

**Critical Path Status**: Sections 1 through 8.1 are complete. The foundation, memory monitoring, prediction semaphores, batch processing, recovery, signal handlers, configuration, Prometheus metrics, and comprehensive tests are all in place.

## Implementation Summary

### Files Created
- `backend/app/services/memory_manager.py` - ModelManager, MemoryGuard, MemoryConfig, PredictionSemaphore, BatchProcessor, ProgressiveLoader, PriorityOperationQueue, RecoveryManager
- `backend/app/services/service_pool.py` - OCRServicePool, PoolConfig
- `backend/tests/services/test_memory_manager.py` - Unit tests for memory management (57 tests)
- `backend/tests/services/test_service_pool.py` - Unit tests for service pool (18 tests)
- `backend/tests/services/test_ocr_memory_integration.py` - Integration tests for memory check patterns (10 tests)

### Files Modified
- `backend/app/core/config.py` - Added memory management configuration settings
- `backend/app/services/ocr_service.py` - Removed PP-StructureV3 exemption, added unload capability, integrated MemoryGuard for pre-operation checks and CPU fallback, added PredictionSemaphore for concurrent prediction control
- `backend/app/services/pp_structure_enhanced.py` - Added PredictionSemaphore control for predict calls
- `backend/app/routers/tasks.py` - Refactored to use service pool
- `backend/app/main.py` - Added startup/shutdown handlers, signal handlers (SIGTERM/SIGINT), connection draining, recovery manager shutdown

### New Classes and Handlers Added (Sections 4.2-8)
- `BatchProcessor` - Memory-aware batch processing for large documents with priority support
- `ProgressiveLoader` - Progressive page loading with lookahead and automatic cleanup
- `PriorityOperationQueue` - Priority queue with timeout and cancellation support
- `RecoveryManager` - Memory recovery with cooldown period and attempt limits
- `MemoryDumper` - Memory dump creation for debugging with history and comparison
- `PrometheusMetrics` - Prometheus-format metrics export for monitoring
- Signal handlers for graceful shutdown (SIGTERM, SIGINT)
- Connection draining for clean shutdown

### New Test Classes Added (Section 8.1)
- `TestModelReloadAfterUnload` - Tests for model reload after unload
- `TestSelectiveProcessingMemorySavings` - Tests for memory savings with selective processing
- `TestRecoveryScenarios` - Tests for recovery under various scenarios
- `TestShutdownSequence` - Tests for shutdown sequence
- `TestCleanupInErrorScenarios` - Tests for cleanup in error scenarios
- `TestMemoryLeakDetection` - Tests for memory leak detection
- `TestStressConcurrentRequests` - Stress tests with concurrent requests
- `TestPerformanceBenchmarks` - Performance benchmark tests
- `TestMemoryDumper` - Tests for MemoryDumper class
- `TestPrometheusMetrics` - Tests for PrometheusMetrics class