# Tasks for Enhanced Memory Management

## Section 1: Model Lifecycle Management (Priority: Critical)

### 1.1 Create ModelManager class
- [x] Design ModelManager interface with load/unload/get methods
- [x] Implement reference counting for model instances
- [x] Add idle timeout tracking with configurable thresholds
- [x] Create teardown() method for explicit cleanup
- [x] Add logging for model lifecycle events

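The reference counting, idle timeout, and teardown items above can be sketched roughly as follows. This is a minimal illustration, not the actual `ModelManager` in `memory_manager.py`: the `register`/`release`/`unload_idle` method names, the `loader` callables, and the `idle_timeout_s` parameter are all assumptions.

```python
import logging
import threading
import time
from typing import Any, Callable, Dict, List

logger = logging.getLogger("model_manager_sketch")


class ModelManager:
    """Hypothetical manager: reference-counted models with idle unloading."""

    def __init__(self, idle_timeout_s: float = 300.0) -> None:
        self._idle_timeout_s = idle_timeout_s
        self._lock = threading.Lock()
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._models: Dict[str, Any] = {}
        self._refs: Dict[str, int] = {}
        self._last_used: Dict[str, float] = {}

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        """Remember how to build a model without loading it yet."""
        with self._lock:
            self._loaders[name] = loader

    def get(self, name: str) -> Any:
        """Load the model on first use and take one reference to it."""
        with self._lock:
            if name not in self._models:
                logger.info("loading model %s", name)
                self._models[name] = self._loaders[name]()
                self._refs[name] = 0
            self._refs[name] += 1
            self._last_used[name] = time.monotonic()
            return self._models[name]

    def release(self, name: str) -> None:
        """Give one reference back; the model stays loaded until it idles out."""
        with self._lock:
            if self._refs.get(name, 0) > 0:
                self._refs[name] -= 1
            self._last_used[name] = time.monotonic()

    def unload_idle(self) -> List[str]:
        """Unload unreferenced models that were idle for at least the timeout."""
        now = time.monotonic()
        unloaded: List[str] = []
        with self._lock:
            for name in list(self._models):
                idle_for = now - self._last_used[name]
                if self._refs[name] == 0 and idle_for >= self._idle_timeout_s:
                    logger.info("unloading idle model %s", name)
                    del self._models[name]
                    unloaded.append(name)
        return unloaded

    def teardown(self) -> None:
        """Explicit cleanup: drop every model regardless of references."""
        with self._lock:
            logger.info("teardown: dropping %d model(s)", len(self._models))
            self._models.clear()
            self._refs.clear()
            self._last_used.clear()
```

In this sketch a model with outstanding references is never unloaded by `unload_idle()`; only `teardown()` ignores the count. The loader runs while the lock is held, which keeps the example simple but would serialize concurrent loads.
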
### 1.2 Integrate PP-StructureV3 with ModelManager
- [x] Remove permanent exemption from unloading (lines 255-267)
- [x] Wrap PP-StructureV3 in managed model wrapper
- [x] Implement lazy loading on first access
- [x] Add unload capability with cache clearing
- [x] Test model reload after unload

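A managed wrapper with lazy loading and unload could look roughly like the sketch below. The `ManagedModel` name, the `factory` argument, and the `clear_cache` hook are hypothetical; in the real service the hook is where a framework-level cache release would be called.

```python
import gc
from typing import Any, Callable, Optional


class ManagedModel:
    """Hypothetical wrapper: builds the model lazily and can drop it again."""

    def __init__(self, factory: Callable[[], Any],
                 clear_cache: Optional[Callable[[], None]] = None) -> None:
        self._factory = factory          # builds the real model, e.g. a pipeline
        self._clear_cache = clear_cache  # assumed hook for freeing device caches
        self._model: Optional[Any] = None
        self.load_count = 0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Build the model on first access, then reuse the same instance."""
        if self._model is None:
            self._model = self._factory()
            self.load_count += 1
        return self._model

    def unload(self) -> bool:
        """Drop the model and clear caches; return False if nothing was loaded."""
        if self._model is None:
            return False
        self._model = None
        gc.collect()  # encourage prompt release of the Python-side objects
        if self._clear_cache is not None:
            self._clear_cache()
        return True
```

Calling `get()` after `unload()` simply runs the factory again, which is the reload path the last checklist item tests.
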
## Section 2: Service Singleton Pattern (Priority: Critical)

### 2.1 Create OCRServicePool
- [x] Design pool interface with acquire/release methods
- [x] Implement per-device instance management
- [x] Add queue-based task distribution
- [x] Implement concurrency limits via semaphores
- [x] Add health check for pooled instances

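The acquire/release interface with per-device instances and semaphore limits might be sketched like this. It is an illustration only: the constructor arguments, the `PoolExhausted` exception, and the `service()` context manager are assumptions, not the actual `OCRServicePool` API in `service_pool.py`.

```python
import threading
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


class PoolExhausted(RuntimeError):
    """Raised when no slot frees up within the timeout."""


class OCRServicePool:
    """Hypothetical pool: one shared service per device, bounded concurrency."""

    def __init__(self, factory: Callable[[str], Any], max_concurrent: int = 2) -> None:
        self._factory = factory
        self._max_concurrent = max_concurrent
        self._lock = threading.Lock()
        self._services: Dict[str, Any] = {}
        self._slots: Dict[str, threading.BoundedSemaphore] = {}

    def acquire(self, device: str, timeout: float = 5.0) -> Any:
        """Return the device's service once a concurrency slot is free."""
        with self._lock:
            if device not in self._services:
                self._services[device] = self._factory(device)
                self._slots[device] = threading.BoundedSemaphore(self._max_concurrent)
            slot = self._slots[device]
        if not slot.acquire(timeout=timeout):
            raise PoolExhausted(f"no free slot on {device} after {timeout}s")
        return self._services[device]

    def release(self, device: str) -> None:
        """Free one slot; BoundedSemaphore raises if released too often."""
        self._slots[device].release()

    @contextmanager
    def service(self, device: str, timeout: float = 5.0) -> Iterator[Any]:
        """Pair acquire with a guaranteed release."""
        svc = self.acquire(device, timeout)
        try:
            yield svc
        finally:
            self.release(device)
```

A `BoundedSemaphore` is used here so that an unmatched `release()` fails loudly instead of silently raising the concurrency limit.
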
### 2.2 Refactor task router
- [x] Replace OCRService() instantiation with pool.acquire()
- [x] Add proper release in finally blocks
- [x] Handle pool exhaustion gracefully
- [x] Add metrics for pool utilization
- [x] Update error handling for pooled services

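The router-side pattern (acquire, release in `finally`, graceful handling of exhaustion, simple utilization counters) can be illustrated with a standard-library queue standing in for the pool. The function name, the response dictionaries, and the counter keys are hypothetical and do not reflect the actual code in `tasks.py`.

```python
import queue
from collections import Counter
from typing import Any, Callable, Dict


def run_ocr_task(pool: "queue.Queue[Callable[[str], Any]]", image_path: str,
                 stats: Counter, acquire_timeout_s: float = 0.05) -> Dict[str, Any]:
    """Borrow a service, run it, and always hand it back."""
    try:
        service = pool.get(timeout=acquire_timeout_s)  # stands in for pool.acquire()
    except queue.Empty:
        stats["rejected"] += 1
        return {"status": 503, "detail": "OCR pool exhausted, retry later"}
    stats["in_use"] += 1
    try:
        return {"status": 200, "result": service(image_path)}
    except Exception as exc:
        stats["failed"] += 1
        return {"status": 500, "detail": str(exc)}
    finally:
        # Runs on success and on failure, so the service is never leaked.
        stats["in_use"] -= 1
        pool.put(service)
```

Returning a 503-style result on exhaustion lets the caller retry later rather than blocking a request indefinitely.
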
## Section 3: Enhanced Memory Monitoring (Priority: High)

### 3.1 Create MemoryGuard class
- [x] Implement paddle.device.cuda memory queries
- [x] Add pynvml integration as fallback
- [x] Add torch memory query support
- [x] Create configurable threshold system
- [x] Implement memory prediction for operations

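The fallback chain and threshold logic above can be sketched without importing any GPU library by injecting the backends as callables. Everything here is an assumption made for illustration: the `Probe` type, the `MemoryStats` fields, the default thresholds, and the method names are not taken from the real `MemoryGuard`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A probe returns (used_bytes, total_bytes), or None if its backend is unusable.
Probe = Callable[[], Optional[Tuple[int, int]]]


@dataclass(frozen=True)
class MemoryStats:
    used: int
    total: int
    source: str

    @property
    def ratio(self) -> float:
        return self.used / self.total if self.total else 0.0


class MemoryGuard:
    """Hypothetical guard: first working probe wins, thresholds classify usage."""

    def __init__(self, probes: Sequence[Tuple[str, Probe]],
                 warning: float = 0.80, critical: float = 0.95) -> None:
        if not 0.0 < warning < critical <= 1.0:
            raise ValueError("need 0 < warning < critical <= 1")
        self._probes = list(probes)
        self._warning = warning
        self._critical = critical

    def query(self) -> Optional[MemoryStats]:
        """Try each backend in order, e.g. paddle, then pynvml, then torch."""
        for name, probe in self._probes:
            try:
                result = probe()
            except Exception:
                result = None  # a broken backend should not break the caller
            if result is not None:
                used, total = result
                return MemoryStats(used, total, name)
        return None

    def level(self) -> str:
        """Classify current usage as ok, warning, critical, or unknown."""
        stats = self.query()
        if stats is None:
            return "unknown"
        if stats.ratio >= self._critical:
            return "critical"
        if stats.ratio >= self._warning:
            return "warning"
        return "ok"

    def can_allocate(self, required_bytes: int) -> bool:
        """Predict whether an operation fits below the critical threshold."""
        stats = self.query()
        if stats is None or stats.total == 0:
            return True  # no data: this sketch chooses not to block work
        return (stats.used + required_bytes) / stats.total < self._critical
```

Each real backend would be wrapped in a small probe function that returns `None` when its library is missing or reports no device, so the guard itself stays free of optional imports.
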
### 3.2 Integrate memory checks
- [x] Replace existing check_gpu_memory implementation
- [x] Add pre-operation memory checks
- [x] Implement CPU fallback when memory low
- [x] Add memory usage logging
- [x] Create memory pressure alerts

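A pre-operation check with CPU fallback reduces to a small decision function. The sketch below is illustrative: `choose_device`, its parameters, and the 10% headroom default are assumptions, and the free-memory figure is expected to come from whatever query the service already uses.

```python
import logging
from typing import Optional

logger = logging.getLogger("memory_check_sketch")


def choose_device(free_gpu_bytes: Optional[int], required_bytes: int,
                  headroom: float = 0.10) -> str:
    """Return "gpu" when the estimate plus headroom fits, otherwise "cpu"."""
    if free_gpu_bytes is None:
        logger.warning("GPU memory unknown, falling back to CPU")
        return "cpu"
    needed = int(required_bytes * (1.0 + headroom))
    if needed <= free_gpu_bytes:
        return "gpu"
    logger.warning("need %d bytes but only %d free, falling back to CPU",
                   needed, free_gpu_bytes)
    return "cpu"
```

The headroom factor guards against an optimistic estimate; logging every fallback also covers the memory usage logging item.
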
## Section 4: Concurrency Control (Priority: High)

### 4.1 Implement prediction semaphores
- [x] Add semaphore for PP-StructureV3.predict
- [x] Configure max concurrent predictions
- [x] Add queue for waiting predictions
- [x] Implement timeout handling
- [x] Add metrics for queue depth

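A prediction semaphore with timeout handling and a queue-depth metric could be sketched as below. The class shares its name with the `PredictionSemaphore` listed later in this document, but the `slot()` context manager, the `queue_depth` property, and the `PredictionTimeout` exception are assumptions for illustration.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class PredictionTimeout(TimeoutError):
    """Raised when a prediction slot is not available in time."""


class PredictionSemaphore:
    """Hypothetical limiter for concurrent predict() calls."""

    def __init__(self, max_concurrent: int = 2) -> None:
        self._sem = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self._waiting = 0

    @property
    def queue_depth(self) -> int:
        """Number of callers currently waiting for a slot."""
        with self._lock:
            return self._waiting

    @contextmanager
    def slot(self, timeout: float = 30.0) -> Iterator[None]:
        """Hold one prediction slot for the duration of the with-block."""
        with self._lock:
            self._waiting += 1
        try:
            acquired = self._sem.acquire(timeout=timeout)
        finally:
            with self._lock:
                self._waiting -= 1
        if not acquired:
            raise PredictionTimeout(f"no prediction slot within {timeout}s")
        try:
            yield
        finally:
            self._sem.release()
```

A caller would wrap the model call as `with sem.slot(timeout=...): model.predict(...)`, so the slot is released even when prediction raises.
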
### 4.2 Add selective processing
- [x] Create config for disabling chart/formula/table
- [x] Implement batch processing for large documents
- [x] Add progressive loading for multi-page docs
- [x] Create priority queue for operations
- [x] Test memory savings with selective processing

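The three building blocks in this list (feature switches, page batching, and a priority queue) are small enough to sketch together. The field names, `iter_batches`, and the queue's `push`/`pop` methods are hypothetical; the real `BatchProcessor` and `PriorityOperationQueue` also handle memory checks, timeouts, and cancellation, which are omitted here.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Any, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class SelectiveConfig:
    """Hypothetical switches for the optional recognition modules."""
    enable_chart: bool = True
    enable_formula: bool = True
    enable_table: bool = True

    def enabled(self) -> List[str]:
        flags = {"chart": self.enable_chart, "formula": self.enable_formula,
                 "table": self.enable_table}
        return [name for name, on in flags.items() if on]


def iter_batches(pages: Sequence[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield pages in fixed-size batches so only one batch is in flight."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield list(pages[start:start + batch_size])


class PriorityOperationQueue:
    """Lower number means higher priority; ties keep insertion order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._order = itertools.count()

    def push(self, priority: int, item: Any) -> None:
        # The counter breaks ties, so items themselves are never compared.
        heapq.heappush(self._heap, (priority, next(self._order), item))

    def pop(self) -> Any:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Disabling a module only saves memory if the corresponding model is never loaded, which is what the final checklist item measures.
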
## Section 5: Active Memory Management (Priority: Medium)

### 5.1 Create memory monitor thread
- [x] Implement background monitoring loop
- [x] Add periodic memory metrics collection
- [x] Create threshold-based triggers
- [x] Implement automatic cache clearing
- [x] Add LRU-based model unloading

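A background monitor with a threshold trigger, plus an LRU structure for choosing which model to unload, might look like the following sketch. `MemoryMonitor`, `LRUModels`, the `read_ratio` and `on_pressure` callables, and the default threshold are all assumptions.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable, Optional


class LRUModels:
    """Tracks models by recency so the least recently used goes first."""

    def __init__(self) -> None:
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def touch(self, name: str, model: Any) -> None:
        self._models[name] = model
        self._models.move_to_end(name)  # most recently used sits at the end

    def evict_oldest(self) -> Optional[str]:
        if not self._models:
            return None
        name, _ = self._models.popitem(last=False)
        return name


class MemoryMonitor:
    """Hypothetical monitor: polls a usage ratio and fires on pressure."""

    def __init__(self, read_ratio: Callable[[], float],
                 on_pressure: Callable[[float], None],
                 threshold: float = 0.85, interval_s: float = 5.0) -> None:
        self._read_ratio = read_ratio
        self._on_pressure = on_pressure
        self._threshold = threshold
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    @property
    def running(self) -> bool:
        return self._thread is not None and self._thread.is_alive()

    def check_once(self) -> bool:
        """One monitoring step; returns True if the trigger fired."""
        ratio = self._read_ratio()
        if ratio >= self._threshold:
            self._on_pressure(ratio)
            return True
        return False

    def _run(self) -> None:
        # Event.wait doubles as an interruptible sleep between checks.
        while not self._stop.wait(self._interval_s):
            self.check_once()

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout=self._interval_s + 1.0)
```

One way to connect the two pieces is to pass an `on_pressure` callback that clears caches first and then calls `evict_oldest()` until usage drops below the threshold.
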
### 5.2 Add recovery mechanisms
- [x] Implement emergency memory release
- [x] Add worker process restart capability (RecoveryManager)
- [x] Create memory dump for debugging
- [x] Add cooldown period after recovery
- [x] Test recovery under various scenarios

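The cooldown and attempt-limit behaviour can be sketched in a few lines. This is not the real `RecoveryManager`: the `release_fn` callback, the injectable `clock`, and the `attempt_recovery`/`reset` names are assumptions chosen to keep the example testable.

```python
import time
from typing import Callable, Optional


class RecoveryManager:
    """Hypothetical recovery gate: rate-limited emergency memory release."""

    def __init__(self, release_fn: Callable[[], None], cooldown_s: float = 30.0,
                 max_attempts: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._release_fn = release_fn
        self._cooldown_s = cooldown_s
        self._max_attempts = max_attempts
        self._clock = clock
        self._attempts = 0
        self._last_attempt: Optional[float] = None

    def attempt_recovery(self) -> bool:
        """Run the release callback unless limited; return whether it ran."""
        now = self._clock()
        if self._attempts >= self._max_attempts:
            return False  # give up; a restart or operator action is needed
        if self._last_attempt is not None and now - self._last_attempt < self._cooldown_s:
            return False  # still cooling down from the previous attempt
        self._attempts += 1
        self._last_attempt = now
        self._release_fn()
        return True

    def reset(self) -> None:
        """Clear the counters, e.g. after memory has been healthy for a while."""
        self._attempts = 0
        self._last_attempt = None
```

The cooldown prevents a tight loop of cache clearing under sustained pressure, and the attempt limit marks the point where escalating (for example to a worker restart) makes more sense than retrying.
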
## Section 6: Cleanup Hooks (Priority: Medium)

### 6.1 Implement shutdown handlers
- [x] Add FastAPI shutdown event handler
- [x] Create signal handlers (SIGTERM, SIGINT)
- [x] Implement graceful model unloading
- [x] Add connection draining
- [x] Test shutdown sequence

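One way to make the shutdown sequence run exactly once, whether it is triggered by a signal or by the application's shutdown event, is a small coordinator like the sketch below. The `ShutdownCoordinator` class and its methods are hypothetical and are not taken from `main.py`.

```python
import signal
import threading
from typing import Callable, List


class ShutdownCoordinator:
    """Hypothetical helper: runs cleanup callbacks once, in registration order."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[], None]] = []
        self._done = threading.Event()
        self.errors: List[Exception] = []

    def register(self, callback: Callable[[], None]) -> None:
        self._callbacks.append(callback)

    def shutdown(self) -> bool:
        """Run every callback once; return False on repeated calls."""
        if self._done.is_set():
            return False
        self._done.set()
        for callback in self._callbacks:
            try:
                callback()
            except Exception as exc:
                # Keep going so one failing step cannot skip the others.
                self.errors.append(exc)
        return True

    def handle_signal(self, signum: int, frame: object) -> None:
        """Signature matches what signal.signal expects for a handler."""
        self.shutdown()

    def install(self) -> None:
        """Register for SIGTERM and SIGINT; must run in the main thread."""
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, self.handle_signal)
```

Registering connection draining before model unloading means in-flight requests finish before their models disappear. When the ASGI server already handles these signals itself, calling `shutdown()` from the application's shutdown event instead of `install()` may be the safer choice.
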
### 6.2 Add task cleanup
- [x] Wrap background tasks with cleanup
- [x] Add success/failure callbacks
- [x] Implement resource release on completion
- [x] Add cleanup verification logging
- [x] Test cleanup in error scenarios

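The wrapper pattern for background tasks is sketched below, assuming a synchronous task for simplicity. The `run_with_cleanup` name and its callback parameters are illustrative only.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("task_cleanup_sketch")


def run_with_cleanup(task: Callable[[], Any], cleanup: Callable[[], None],
                     on_success: Optional[Callable[[Any], None]] = None,
                     on_failure: Optional[Callable[[Exception], None]] = None) -> Any:
    """Run a task, notify the matching callback, and always release resources."""
    try:
        result = task()
    except Exception as exc:
        if on_failure is not None:
            on_failure(exc)
        raise  # the caller still sees the original error
    else:
        if on_success is not None:
            on_success(result)
        return result
    finally:
        cleanup()
        logger.info("cleanup finished for task %r", task)
```

Because cleanup lives in `finally`, it runs after either callback and on both the success and the error path, which is what the error-scenario tests need to verify.
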
## Section 7: Configuration & Settings (Priority: Low)

### 7.1 Add memory settings to config
- [x] Define memory threshold parameters
- [x] Add model timeout settings
- [x] Configure pool sizes
- [x] Add feature flags for new behavior
- [x] Document all settings

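As an illustration of how such settings could be grouped and validated, here is a standard-library sketch. The field names, defaults, and environment variable names are invented for this example and are not the settings actually added to `config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(value: str) -> bool:
    """Parse common truthy spellings of a boolean environment value."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class MemorySettings:
    """Hypothetical memory settings with basic consistency checks."""
    warning_threshold: float = 0.80
    critical_threshold: float = 0.95
    model_idle_timeout_s: int = 300
    pool_size_per_device: int = 1
    enable_model_manager: bool = True  # feature flag for the new behavior

    def __post_init__(self) -> None:
        if not 0.0 < self.warning_threshold < self.critical_threshold <= 1.0:
            raise ValueError("thresholds must satisfy 0 < warning < critical <= 1")
        if self.model_idle_timeout_s <= 0 or self.pool_size_per_device < 1:
            raise ValueError("timeout and pool size must be positive")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "MemorySettings":
        """Read overrides from the environment, keeping defaults otherwise."""
        env = os.environ if env is None else env
        d = cls()
        return cls(
            warning_threshold=float(env.get("MEMORY_WARNING_THRESHOLD", d.warning_threshold)),
            critical_threshold=float(env.get("MEMORY_CRITICAL_THRESHOLD", d.critical_threshold)),
            model_idle_timeout_s=int(env.get("MODEL_IDLE_TIMEOUT_S", d.model_idle_timeout_s)),
            pool_size_per_device=int(env.get("POOL_SIZE_PER_DEVICE", d.pool_size_per_device)),
            enable_model_manager=_flag(env.get("ENABLE_MODEL_MANAGER", str(d.enable_model_manager))),
        )
```

Validating at construction time surfaces a misconfigured threshold at startup instead of during the first memory check.
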
### 7.2 Create monitoring dashboard
- [x] Add memory metrics endpoint
- [x] Create pool status endpoint
- [x] Add model lifecycle stats
- [x] Implement health check endpoint
- [x] Add Prometheus metrics export
## Section 8: Testing & Documentation (Priority: High)

### 8.1 Create comprehensive tests
- [x] Unit tests for ModelManager
- [x] Integration tests for OCRServicePool
- [x] Memory leak detection tests
- [x] Stress tests with concurrent requests
- [x] Performance benchmarks

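A memory leak detection test can be built on the standard-library `tracemalloc` module, as in the sketch below. The `measure_growth` helper is hypothetical, and it only observes Python-level allocations; GPU memory held by a framework would need a separate check.

```python
import gc
import tracemalloc
from typing import Callable


def measure_growth(fn: Callable[[], object], iterations: int = 50,
                   warmup: int = 5) -> int:
    """Return the net growth in traced Python memory, in bytes."""
    for _ in range(warmup):
        fn()  # let caches and lazy initialisation settle first
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

A test would then assert that repeated calls to the operation under test stay below a byte budget, with the budget chosen generously enough to tolerate allocator noise.
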
### 8.2 Documentation
- [ ] Document memory management architecture
- [ ] Create tuning guide
- [ ] Add troubleshooting section
- [ ] Document monitoring setup
- [ ] Create migration guide

---

**Total Tasks**: 80
**Completed**: 75
**Remaining**: 5 (Section 8.2 Documentation only)
**Progress**: ~94%

**Critical Path Status**: Sections 1 through 8.1 are complete. The foundation, memory monitoring, prediction semaphores, batch processing, recovery, signal handlers, configuration, Prometheus metrics, and comprehensive tests are in place.

## Implementation Summary

### Files Created
- `backend/app/services/memory_manager.py` - ModelManager, MemoryGuard, MemoryConfig, PredictionSemaphore, BatchProcessor, ProgressiveLoader, PriorityOperationQueue, RecoveryManager
- `backend/app/services/service_pool.py` - OCRServicePool, PoolConfig
- `backend/tests/services/test_memory_manager.py` - Unit tests for memory management (57 tests)
- `backend/tests/services/test_service_pool.py` - Unit tests for service pool (18 tests)
- `backend/tests/services/test_ocr_memory_integration.py` - Integration tests for memory check patterns (10 tests)

### Files Modified
- `backend/app/core/config.py` - Added memory management configuration settings
- `backend/app/services/ocr_service.py` - Removed PP-StructureV3 exemption, added unload capability, integrated MemoryGuard for pre-operation checks and CPU fallback, added PredictionSemaphore for concurrent prediction control
- `backend/app/services/pp_structure_enhanced.py` - Added PredictionSemaphore control for predict calls
- `backend/app/routers/tasks.py` - Refactored to use service pool
- `backend/app/main.py` - Added startup/shutdown handlers, signal handlers (SIGTERM/SIGINT), connection draining, recovery manager shutdown

### New Classes and Features Added (Sections 4.2-8)
- `BatchProcessor` - Memory-aware batch processing for large documents with priority support
- `ProgressiveLoader` - Progressive page loading with lookahead and automatic cleanup
- `PriorityOperationQueue` - Priority queue with timeout and cancellation support
- `RecoveryManager` - Memory recovery with cooldown period and attempt limits
- `MemoryDumper` - Memory dump creation for debugging with history and comparison
- `PrometheusMetrics` - Prometheus-format metrics export for monitoring
- Signal handlers for graceful shutdown (SIGTERM, SIGINT)
- Connection draining for clean shutdown

### New Test Classes Added (Section 8.1)
- `TestModelReloadAfterUnload` - Tests for model reload after unload
- `TestSelectiveProcessingMemorySavings` - Tests for memory savings with selective processing
- `TestRecoveryScenarios` - Tests for recovery under various scenarios
- `TestShutdownSequence` - Tests for shutdown sequence
- `TestCleanupInErrorScenarios` - Tests for cleanup in error scenarios
- `TestMemoryLeakDetection` - Tests for memory leak detection
- `TestStressConcurrentRequests` - Stress tests with concurrent requests
- `TestPerformanceBenchmarks` - Performance benchmark tests
- `TestMemoryDumper` - Tests for MemoryDumper class
- `TestPrometheusMetrics` - Tests for PrometheusMetrics class