- Create comprehensive proposal addressing OOM crashes and memory leaks
- Define 6 core areas: model lifecycle, service pooling, monitoring
- Add 58 implementation tasks across 8 sections
- Design ModelManager with reference counting and idle timeout
- Plan OCRServicePool for singleton service pattern
- Specify MemoryGuard for proactive memory monitoring
- Include concurrency controls and cleanup hooks
- Add spec deltas for ocr-processing and task-management
- Create detailed design document with architecture diagrams
- Define performance targets: 75% memory reduction, 4x concurrency

Critical improvements:

- Remove PP-StructureV3 permanent exemption from unloading
- Replace per-task OCRService instantiation with pooling
- Add real GPU memory monitoring (currently always returns True)
- Implement semaphore-based concurrency limits
- Add proper resource cleanup on task completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
# Tasks for Enhanced Memory Management

## Section 1: Model Lifecycle Management (Priority: Critical)

### 1.1 Create ModelManager class
- [ ] Design ModelManager interface with load/unload/get methods
- [ ] Implement reference counting for model instances
- [ ] Add idle timeout tracking with configurable thresholds
- [ ] Create teardown() method for explicit cleanup
- [ ] Add logging for model lifecycle events
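
As a starting point for the interface design, the reference counting and idle timeout tasks above could be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the method names `get`, `release`, and `unload_idle`, the `loader` callback, the injectable `clock`, and the 300-second default are all assumptions. Lifecycle logging is omitted, and the loader runs under the lock, which serializes model loads.

```python
import threading
import time
from typing import Any, Callable, Dict, List


class ModelManager:
    """Reference-counted model registry with idle-timeout unloading (sketch)."""

    def __init__(self, idle_timeout_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        # name -> {"model": object, "refs": int, "last_used": float}
        self._entries: Dict[str, dict] = {}

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        """Return the model, loading it on first use, and take a reference."""
        with self._lock:
            entry = self._entries.get(name)
            if entry is None:
                entry = {"model": loader(), "refs": 0, "last_used": self._clock()}
                self._entries[name] = entry
            entry["refs"] += 1
            entry["last_used"] = self._clock()
            return entry["model"]

    def release(self, name: str) -> None:
        """Drop one reference; the model stays loaded until it idles out."""
        with self._lock:
            entry = self._entries[name]
            entry["refs"] = max(0, entry["refs"] - 1)
            entry["last_used"] = self._clock()

    def unload_idle(self) -> List[str]:
        """Unload unreferenced models that exceeded the idle timeout."""
        now = self._clock()
        with self._lock:
            idle = [name for name, e in self._entries.items()
                    if e["refs"] == 0
                    and now - e["last_used"] >= self._idle_timeout_s]
            for name in idle:
                del self._entries[name]
            return idle

    def teardown(self) -> None:
        """Explicitly drop every model regardless of reference count."""
        with self._lock:
            self._entries.clear()
```

A model with outstanding references is never unloaded by `unload_idle()`, so callers must pair every `get()` with a `release()`.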

### 1.2 Integrate PP-StructureV3 with ModelManager
- [ ] Remove permanent exemption from unloading (lines 255-267)
- [ ] Wrap PP-StructureV3 in managed model wrapper
- [ ] Implement lazy loading on first access
- [ ] Add unload capability with cache clearing
- [ ] Test model reload after unload
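
One possible shape for the managed wrapper is sketched below, assuming the real PP-StructureV3 constructor is passed in as a `factory` callable and the framework's cache-clearing call is passed in as `clear_cache`. Both parameter names, and the class name `ManagedModel`, are hypothetical. In a Paddle deployment `clear_cache` might be `paddle.device.cuda.empty_cache`, but that wiring is an assumption and is left to the caller.

```python
import gc
import threading
from typing import Any, Callable, Optional


class ManagedModel:
    """Lazily loaded model that can be unloaded and reloaded (sketch)."""

    def __init__(self, factory: Callable[[], Any],
                 clear_cache: Optional[Callable[[], None]] = None):
        self._factory = factory          # builds the real model on demand
        self._clear_cache = clear_cache  # framework cache hook, if any
        self._lock = threading.Lock()
        self._model: Optional[Any] = None

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Load on first access, then return the cached instance."""
        with self._lock:
            if self._model is None:
                self._model = self._factory()
            return self._model

    def unload(self) -> bool:
        """Drop the model and clear caches; returns False if nothing was loaded."""
        with self._lock:
            if self._model is None:
                return False
            self._model = None
        gc.collect()  # release Python-side references before clearing caches
        if self._clear_cache is not None:
            self._clear_cache()
        return True
```

The wrapper only drops its own reference. Memory is actually freed only when no caller still holds the object returned by `get()`.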

## Section 2: Service Singleton Pattern (Priority: Critical)

### 2.1 Create OCRServicePool
- [ ] Design pool interface with acquire/release methods
- [ ] Implement per-device instance management
- [ ] Add queue-based task distribution
- [ ] Implement concurrency limits via semaphores
- [ ] Add health check for pooled instances
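
The acquire/release interface with per-device instances and semaphore limits might look like the sketch below. It assumes one shared service per device string and a hypothetical `factory(device)` callable standing in for the real `OCRService` constructor. The `PoolExhausted` exception, the `lease()` helper, and the default limits are illustrative. Queue-based distribution and health checks are not covered.

```python
import threading
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


class PoolExhausted(RuntimeError):
    """Raised when no slot frees up before the timeout."""


class OCRServicePool:
    """One shared service per device, guarded by a per-device semaphore (sketch)."""

    def __init__(self, factory: Callable[[str], Any],
                 max_concurrent_per_device: int = 2):
        self._factory = factory
        self._max = max_concurrent_per_device
        self._lock = threading.Lock()
        self._services: Dict[str, Any] = {}
        self._slots: Dict[str, threading.BoundedSemaphore] = {}

    def acquire(self, device: str, timeout: float = 30.0) -> Any:
        with self._lock:
            if device not in self._services:
                # Create the singleton for this device on first use.
                self._services[device] = self._factory(device)
                self._slots[device] = threading.BoundedSemaphore(self._max)
            slot = self._slots[device]
            service = self._services[device]
        if not slot.acquire(timeout=timeout):
            raise PoolExhausted(f"no free slot on {device} after {timeout}s")
        return service

    def release(self, device: str) -> None:
        self._slots[device].release()

    @contextmanager
    def lease(self, device: str, timeout: float = 30.0) -> Iterator[Any]:
        """Acquire a slot and guarantee its release, even on error."""
        service = self.acquire(device, timeout)
        try:
            yield service
        finally:
            self.release(device)
```

A router could then write `with pool.lease("gpu:0") as service:` instead of constructing a service per task, which keeps the release inside a `finally` block by construction.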

### 2.2 Refactor task router
- [ ] Replace OCRService() instantiation with pool.acquire()
- [ ] Add proper release in finally blocks
- [ ] Handle pool exhaustion gracefully
- [ ] Add metrics for pool utilization
- [ ] Update error handling for pooled services

## Section 3: Enhanced Memory Monitoring (Priority: High)

### 3.1 Create MemoryGuard class
- [ ] Implement paddle.device.cuda memory queries
- [ ] Add pynvml integration as fallback
- [ ] Add torch memory query support
- [ ] Create configurable threshold system
- [ ] Implement memory prediction for operations
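
A rough sketch of the fallback chain and threshold check is given below. It assumes each backend is a callable returning `(used_bytes, total_bytes)` or `None` when unavailable. Only the pynvml and torch backends are shown; a Paddle backend would slot in as another callable ahead of them and is deliberately left out here. The class layout, the threshold defaults, and the status strings are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

MemInfo = Tuple[int, int]  # (used_bytes, total_bytes)


def query_pynvml(device_index: int = 0) -> Optional[MemInfo]:
    """Query NVML; returns None when pynvml or a GPU is unavailable."""
    try:
        import pynvml
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
            info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            return int(info.used), int(info.total)
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        return None


def query_torch() -> Optional[MemInfo]:
    """Query torch; returns None when torch or CUDA is unavailable."""
    try:
        import torch
        if not torch.cuda.is_available():
            return None
        free, total = torch.cuda.mem_get_info()
        return int(total - free), int(total)
    except Exception:
        return None


@dataclass
class MemoryGuard:
    backends: Sequence[Callable[[], Optional[MemInfo]]]
    warning_ratio: float = 0.80   # illustrative default
    critical_ratio: float = 0.95  # illustrative default

    def usage(self) -> Optional[MemInfo]:
        """Return the first backend reading that succeeds."""
        for backend in self.backends:
            info = backend()
            if info is not None:
                return info
        return None

    def check(self, required_bytes: int = 0) -> str:
        """Classify projected usage as ok, warning, critical, or unknown."""
        info = self.usage()
        if info is None:
            return "unknown"  # report it instead of silently passing
        used, total = info
        ratio = (used + required_bytes) / total
        if ratio >= self.critical_ratio:
            return "critical"
        if ratio >= self.warning_ratio:
            return "warning"
        return "ok"
```

Returning `"unknown"` rather than a blanket success keeps the caller aware that no measurement was taken, which is the gap in a check that always returns True.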

### 3.2 Integrate memory checks
- [ ] Replace existing check_gpu_memory implementation
- [ ] Add pre-operation memory checks
- [ ] Implement CPU fallback when memory low
- [ ] Add memory usage logging
- [ ] Create memory pressure alerts

## Section 4: Concurrency Control (Priority: High)

### 4.1 Implement prediction semaphores
- [ ] Add semaphore for PP-StructureV3.predict
- [ ] Configure max concurrent predictions
- [ ] Add queue for waiting predictions
- [ ] Implement timeout handling
- [ ] Add metrics for queue depth
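
The semaphore, timeout, and queue-depth tasks can be combined in a small gate around the predict call, sketched below. The names `PredictionGate`, `PredictionTimeout`, and `queue_depth` are hypothetical, and the `predict` argument stands in for the real PP-StructureV3 predict method. Waiting callers simply block on the semaphore here, so no explicit queue is implemented.

```python
import threading
from typing import Any, Callable


class PredictionTimeout(TimeoutError):
    """Raised when a prediction slot is not available in time."""


class PredictionGate:
    """Limits concurrent predictions and tracks how many callers wait (sketch)."""

    def __init__(self, max_concurrent: int = 2):
        self._sem = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._waiting = 0

    @property
    def queue_depth(self) -> int:
        """Number of callers currently waiting for a slot."""
        with self._lock:
            return self._waiting

    def run(self, predict: Callable[..., Any], *args: Any,
            timeout: float = 60.0, **kwargs: Any) -> Any:
        with self._lock:
            self._waiting += 1
        try:
            acquired = self._sem.acquire(timeout=timeout)
        finally:
            with self._lock:
                self._waiting -= 1
        if not acquired:
            raise PredictionTimeout(f"no prediction slot within {timeout}s")
        try:
            return predict(*args, **kwargs)
        finally:
            self._sem.release()  # always free the slot, even on error
```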

### 4.2 Add selective processing
- [ ] Create config for disabling chart/formula/table
- [ ] Implement batch processing for large documents
- [ ] Add progressive loading for multi-page docs
- [ ] Create priority queue for operations
- [ ] Test memory savings with selective processing
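
For the selective processing config and the batching task, a minimal sketch could look like this. The option names and the default batch size are assumptions, not existing settings. The generator only splits pages into fixed-size batches so that a large document is never held in a single predict call.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ProcessingOptions:
    """Hypothetical switches for the optional recognition stages."""
    enable_chart: bool = True
    enable_formula: bool = True
    enable_table: bool = True
    pages_per_batch: int = 4  # illustrative default


def iter_batches(pages: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield list(pages[start:start + batch_size])
```

Because `iter_batches` is a generator, each batch can be processed and released before the next one is built.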

## Section 5: Active Memory Management (Priority: Medium)

### 5.1 Create memory monitor thread
- [ ] Implement background monitoring loop
- [ ] Add periodic memory metrics collection
- [ ] Create threshold-based triggers
- [ ] Implement automatic cache clearing
- [ ] Add LRU-based model unloading
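
The background loop and threshold trigger might be structured as in the sketch below. It assumes a `read_ratio` callable that returns current usage as a fraction between 0 and 1, and an `on_pressure` callback that performs the cache clearing or model unloading. Both names, the threshold, and the interval are hypothetical. LRU selection is left to the callback.

```python
import threading
from typing import Callable


class MemoryMonitor(threading.Thread):
    """Daemon thread that polls memory usage and fires a pressure callback (sketch)."""

    def __init__(self, read_ratio: Callable[[], float],
                 on_pressure: Callable[[], None],
                 threshold: float = 0.85, interval_s: float = 5.0):
        super().__init__(daemon=True, name="memory-monitor")
        self._read_ratio = read_ratio
        self._on_pressure = on_pressure
        self._threshold = threshold
        self._interval_s = interval_s
        self._stop_event = threading.Event()
        self.samples = 0  # number of readings taken so far

    def poll_once(self) -> bool:
        """Take one reading; returns True if the pressure callback fired."""
        self.samples += 1
        if self._read_ratio() >= self._threshold:
            self._on_pressure()
            return True
        return False

    def run(self) -> None:
        # Event.wait doubles as an interruptible sleep between polls.
        while not self._stop_event.wait(self._interval_s):
            self.poll_once()

    def stop(self) -> None:
        self._stop_event.set()
```

Keeping `poll_once()` separate from the loop lets the trigger logic be unit tested without starting a thread.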

### 5.2 Add recovery mechanisms
- [ ] Implement emergency memory release
- [ ] Add worker process restart capability
- [ ] Create memory dump for debugging
- [ ] Add cooldown period after recovery
- [ ] Test recovery under various scenarios

## Section 6: Cleanup Hooks (Priority: Medium)

### 6.1 Implement shutdown handlers
- [ ] Add FastAPI shutdown event handler
- [ ] Create signal handlers (SIGTERM, SIGINT)
- [ ] Implement graceful model unloading
- [ ] Add connection draining
- [ ] Test shutdown sequence
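
One way to keep the shutdown sequence testable is to collect cleanup callbacks in a small coordinator and call it from whichever hook fires first. The sketch below uses only the standard library; `ShutdownCoordinator` and its methods are hypothetical names. Wiring `run()` into the FastAPI shutdown hook is assumed to happen elsewhere. An ASGI server may install its own signal handlers, so `install_signal_handlers()` is mainly relevant for standalone workers.

```python
import signal
import threading
from typing import Callable, List, Sequence


class ShutdownCoordinator:
    """Runs registered cleanup callbacks once, in reverse order (sketch)."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[], None]] = []
        self._lock = threading.Lock()
        self._done = False

    def register(self, callback: Callable[[], None]) -> None:
        self._callbacks.append(callback)

    def run(self) -> List[Exception]:
        """Run all callbacks; a failing callback does not block the rest."""
        with self._lock:
            if self._done:
                return []
            self._done = True
        errors: List[Exception] = []
        for callback in reversed(self._callbacks):
            try:
                callback()
            except Exception as exc:
                errors.append(exc)
        return errors

    def install_signal_handlers(
            self,
            signals: Sequence[int] = (signal.SIGTERM, signal.SIGINT)) -> None:
        """Must be called from the main thread (a signal.signal requirement)."""
        def handler(signum, frame):
            self.run()
            raise SystemExit(128 + signum)
        for sig in signals:
            signal.signal(sig, handler)
```

Callbacks run last-registered-first, so registering model unloading before connection draining makes draining happen first at shutdown.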

### 6.2 Add task cleanup
- [ ] Wrap background tasks with cleanup
- [ ] Add success/failure callbacks
- [ ] Implement resource release on completion
- [ ] Add cleanup verification logging
- [ ] Test cleanup in error scenarios
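
The wrapping, callback, and verification-logging tasks could be expressed as a decorator along these lines. `with_cleanup`, its parameters, and the logger name are hypothetical. The sketch covers synchronous tasks only; an async variant would need `await` in the wrapper.

```python
import functools
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("task_cleanup")  # hypothetical logger name


def with_cleanup(cleanup: Callable[[], None],
                 on_success: Optional[Callable[[Any], None]] = None,
                 on_failure: Optional[Callable[[Exception], None]] = None):
    """Decorator that always runs cleanup after the wrapped task (sketch)."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                if on_failure is not None:
                    on_failure(exc)
                raise
            else:
                if on_success is not None:
                    on_success(result)
                return result
            finally:
                # Runs on both paths; a failing cleanup is logged, not raised.
                try:
                    cleanup()
                    logger.info("cleanup finished for %s", func.__name__)
                except Exception:
                    logger.exception("cleanup failed for %s", func.__name__)
        return wrapper
    return decorator
```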

## Section 7: Configuration & Settings (Priority: Low)

### 7.1 Add memory settings to config
- [ ] Define memory threshold parameters
- [ ] Add model timeout settings
- [ ] Configure pool sizes
- [ ] Add feature flags for new behavior
- [ ] Document all settings
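
As an illustration of what the settings group might contain, the sketch below defines a validated dataclass that can be read from environment variables. Every field name, default value, and the `OCR_MEM_` prefix are placeholders chosen for the example, not existing configuration keys.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


def _to_bool(raw: str) -> bool:
    return raw.strip().lower() in ("1", "true", "yes", "on")


@dataclass(frozen=True)
class MemorySettings:
    warning_threshold: float = 0.80   # fraction of GPU memory
    critical_threshold: float = 0.95  # fraction of GPU memory
    model_idle_timeout_s: int = 300
    pool_size_per_device: int = 2
    enable_pooling: bool = False      # feature flag for the new behavior

    def __post_init__(self) -> None:
        if not 0 < self.warning_threshold < self.critical_threshold <= 1:
            raise ValueError("thresholds must satisfy 0 < warning < critical <= 1")
        if self.pool_size_per_device < 1:
            raise ValueError("pool_size_per_device must be at least 1")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None,
                 prefix: str = "OCR_MEM_") -> "MemorySettings":
        """Build settings from a mapping, falling back to the defaults."""
        source = os.environ if env is None else env

        def get(name: str, cast: Callable[[str], Any], default: Any) -> Any:
            raw = source.get(prefix + name)
            return default if raw is None else cast(raw)

        return cls(
            warning_threshold=get("WARNING_THRESHOLD", float, cls.warning_threshold),
            critical_threshold=get("CRITICAL_THRESHOLD", float, cls.critical_threshold),
            model_idle_timeout_s=get("MODEL_IDLE_TIMEOUT_S", int, cls.model_idle_timeout_s),
            pool_size_per_device=get("POOL_SIZE_PER_DEVICE", int, cls.pool_size_per_device),
            enable_pooling=get("ENABLE_POOLING", _to_bool, cls.enable_pooling),
        )
```

Validating in `__post_init__` means a bad value fails at startup rather than during the first memory check.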

### 7.2 Create monitoring dashboard
- [ ] Add memory metrics endpoint
- [ ] Create pool status endpoint
- [ ] Add model lifecycle stats
- [ ] Implement health check endpoint
- [ ] Add Prometheus metrics export
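
For the Prometheus export, a metrics endpoint ultimately has to return text in the Prometheus exposition format. The sketch below renders gauge samples by hand to show the shape of that output. The metric names in the usage note are invented, label values are not escaped, and samples of the same metric are assumed to be passed consecutively. A real deployment would more likely use a Prometheus client library.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (metric name, help text, labels, value)
Sample = Tuple[str, str, Dict[str, str], float]


def render_prometheus(samples: Iterable[Sample]) -> str:
    """Render gauge samples in the Prometheus text exposition format (sketch)."""
    lines: List[str] = []
    seen: Set[str] = set()
    for name, help_text, labels, value in samples:
        if name not in seen:
            # HELP and TYPE are emitted once per metric name.
            seen.add(name)
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} gauge")
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"
```

For example, `render_prometheus([("ocr_pool_in_use", "Leased services", {"device": "0"}, 2)])` produces a `# HELP` line, a `# TYPE` line, and the sample line `ocr_pool_in_use{device="0"} 2`.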

## Section 8: Testing & Documentation (Priority: High)

### 8.1 Create comprehensive tests
- [ ] Unit tests for ModelManager
- [ ] Integration tests for OCRServicePool
- [ ] Memory leak detection tests
- [ ] Stress tests with concurrent requests
- [ ] Performance benchmarks
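
A leak detection test for the Python side could be built on `tracemalloc`, as in the sketch below. The helper name `measure_growth` and its defaults are assumptions. `tracemalloc` only sees allocations made through Python's allocator, so GPU memory and native library buffers would need a separate check, for example by comparing device memory readings before and after the loop.

```python
import gc
import tracemalloc
from typing import Callable


def measure_growth(operation: Callable[[], None],
                   iterations: int = 50, warmup: int = 5) -> int:
    """Return net traced Python heap growth in bytes over repeated calls (sketch)."""
    for _ in range(warmup):
        operation()  # let caches and lazy imports settle first
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            operation()
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

A test would then assert that the growth of a full task cycle stays below a budget chosen from a known-good baseline.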

### 8.2 Documentation
- [ ] Document memory management architecture
- [ ] Create tuning guide
- [ ] Add troubleshooting section
- [ ] Document monitoring setup
- [ ] Create migration guide

---

**Total Tasks**: 80 (16 subsections of 5 tasks each)
**Estimated Effort**: 3-4 weeks
**Critical Path**: Sections 1-2 must be completed first because the later sections build on them