OCR/openspec/changes/enhance-memory-management/tasks.md
egg ba8ddf2b68 feat: create OpenSpec proposal for enhanced memory management
- Create comprehensive proposal addressing OOM crashes and memory leaks
- Define 6 core areas: model lifecycle, service pooling, monitoring
- Add 58 implementation tasks across 8 sections
- Design ModelManager with reference counting and idle timeout
- Plan OCRServicePool for singleton service pattern
- Specify MemoryGuard for proactive memory monitoring
- Include concurrency controls and cleanup hooks
- Add spec deltas for ocr-processing and task-management
- Create detailed design document with architecture diagrams
- Define performance targets: 75% memory reduction, 4x concurrency

Critical improvements:
- Remove PP-StructureV3 permanent exemption from unloading
- Replace per-task OCRService instantiation with pooling
- Add real GPU memory monitoring (currently always returns True)
- Implement semaphore-based concurrency limits
- Add proper resource cleanup on task completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 15:21:32 +08:00

# Tasks for Enhanced Memory Management
## Section 1: Model Lifecycle Management (Priority: Critical)
### 1.1 Create ModelManager class
- [ ] Design ModelManager interface with load/unload/get methods
- [ ] Implement reference counting for model instances
- [ ] Add idle timeout tracking with configurable thresholds
- [ ] Create teardown() method for explicit cleanup
- [ ] Add logging for model lifecycle events
### 1.2 Integrate PP-StructureV3 with ModelManager
- [ ] Remove permanent exemption from unloading (lines 255-267)
- [ ] Wrap PP-StructureV3 in managed model wrapper
- [ ] Implement lazy loading on first access
- [ ] Add unload capability with cache clearing
- [ ] Test model reload after unload
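
The lazy-load, unload, and reload cycle in 1.2 could be prototyped with a small wrapper like the one below. `ManagedModel`, `loader`, and `clear_cache` are hypothetical names; the real PP-StructureV3 construction and the framework cache-clearing call would be passed in by the caller.

```python
import gc
import threading
from typing import Any, Callable, Optional


class ManagedModel:
    """Hypothetical wrapper: builds the model on first access and can unload it."""

    def __init__(self, loader: Callable[[], Any],
                 clear_cache: Optional[Callable[[], None]] = None) -> None:
        self._loader = loader
        # Assumption: clear_cache is where a framework call such as
        # paddle.device.cuda.empty_cache() would be plugged in.
        self._clear_cache = clear_cache
        self._lock = threading.Lock()
        self._model: Any = None
        self._loaded = False

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> Any:
        """Return the model, loading it lazily on first access."""
        with self._lock:
            if not self._loaded:
                self._model = self._loader()
                self._loaded = True
            return self._model

    def unload(self) -> bool:
        """Drop the model and clear caches; returns False if nothing was loaded."""
        with self._lock:
            if not self._loaded:
                return False
            self._model = None
            self._loaded = False
        gc.collect()  # collect reference cycles left behind by the dropped model
        if self._clear_cache is not None:
            self._clear_cache()
        return True
```

Calling `get()` after `unload()` simply runs the loader again, which is the behavior the "test model reload after unload" task would exercise.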
## Section 2: Service Singleton Pattern (Priority: Critical)
### 2.1 Create OCRServicePool
- [ ] Design pool interface with acquire/release methods
- [ ] Implement per-device instance management
- [ ] Add queue-based task distribution
- [ ] Implement concurrency limits via semaphores
- [ ] Add health check for pooled instances
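
A minimal sketch of the pool in 2.1, assuming one shared service instance per device and a semaphore that bounds concurrent users of that instance. The `factory` callable, the device strings, and the `service()` context manager are hypothetical; queue-based distribution and health checks are left out.

```python
import threading
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional


class OCRServicePool:
    """Hypothetical sketch: one service per device, guarded by a semaphore."""

    def __init__(self, factory: Callable[[str], Any], max_concurrent: int = 2) -> None:
        self._factory = factory
        self._max_concurrent = max_concurrent
        self._lock = threading.Lock()
        self._services: Dict[str, Any] = {}
        self._slots: Dict[str, threading.BoundedSemaphore] = {}

    def acquire(self, device: str, timeout: Optional[float] = None) -> Any:
        """Wait for a free slot on the device, then return its shared service."""
        with self._lock:
            if device not in self._slots:
                self._slots[device] = threading.BoundedSemaphore(self._max_concurrent)
            slots = self._slots[device]
        if not slots.acquire(timeout=timeout):
            raise TimeoutError(f"no free OCR slot on {device}")
        try:
            with self._lock:
                if device not in self._services:
                    self._services[device] = self._factory(device)
                return self._services[device]
        except BaseException:
            slots.release()  # do not leak the slot if service creation fails
            raise

    def release(self, device: str) -> None:
        self._slots[device].release()

    @contextmanager
    def service(self, device: str, timeout: Optional[float] = None) -> Iterator[Any]:
        """Acquire and always release, even if the caller raises."""
        svc = self.acquire(device, timeout)
        try:
            yield svc
        finally:
            self.release(device)
```

With `max_concurrent` above 1 the shared service must itself be safe for concurrent use; setting it to 1 serializes access per device.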
### 2.2 Refactor task router
- [ ] Replace OCRService() instantiation with pool.acquire()
- [ ] Add proper release in finally blocks
- [ ] Handle pool exhaustion gracefully
- [ ] Add metrics for pool utilization
- [ ] Update error handling for pooled services
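
The router-side pattern in 2.2 (acquire, release in `finally`, degrade gracefully on exhaustion) can be illustrated with a plain `queue.Queue` standing in for the pool. The function name `run_ocr_task`, the callable services, and the response dictionaries are assumptions made for this sketch.

```python
import queue
from typing import Any, Callable, Dict


def run_ocr_task(pool: "queue.Queue[Callable[[Any], Any]]", document: Any,
                 wait_seconds: float = 1.0) -> Dict[str, Any]:
    """Borrow a service from the pool, run it, and always hand it back."""
    try:
        service = pool.get(timeout=wait_seconds)
    except queue.Empty:
        # Pool exhausted: report it instead of building a new service per task.
        return {"status": "busy", "detail": "OCR pool exhausted, retry later"}
    try:
        return {"status": "ok", "result": service(document)}
    except Exception as exc:
        return {"status": "error", "detail": str(exc)}
    finally:
        pool.put(service)  # runs on success and on failure
```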
## Section 3: Enhanced Memory Monitoring (Priority: High)
### 3.1 Create MemoryGuard class
- [ ] Implement paddle.device.cuda memory queries
- [ ] Add pynvml integration as fallback
- [ ] Add torch memory query support
- [ ] Create configurable threshold system
- [ ] Implement memory prediction for operations
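
The fallback chain in 3.1 might be structured as a list of probes tried in order, as sketched below. The probe callables are injected rather than implemented here; in practice one could wrap Paddle's CUDA memory queries and another pynvml's `nvmlDeviceGetMemoryInfo`. The class layout, the `critical_ratio` default, and the choice to refuse allocation when no backend answers are assumptions.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple

# A probe returns (used_bytes, total_bytes) or raises if its backend is unavailable.
Probe = Callable[[], Tuple[int, int]]


class MemoryStatus(NamedTuple):
    used: int
    total: int
    source: str


class MemoryGuard:
    """Hypothetical sketch: query memory via the first backend that works."""

    def __init__(self, probes: Sequence[Tuple[str, Probe]],
                 critical_ratio: float = 0.95) -> None:
        self._probes = list(probes)
        self._critical_ratio = critical_ratio

    def query(self) -> Optional[MemoryStatus]:
        for name, probe in self._probes:
            try:
                used, total = probe()
            except Exception:
                continue  # backend missing or failed: fall through to the next one
            if total > 0:
                return MemoryStatus(used, total, name)
        return None

    def can_allocate(self, required_bytes: int) -> bool:
        """True only if the predicted usage stays at or below the critical ratio."""
        status = self.query()
        if status is None:
            return False  # assumption: fail closed instead of always returning True
        predicted = (status.used + required_bytes) / status.total
        return predicted <= self._critical_ratio
```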
### 3.2 Integrate memory checks
- [ ] Replace existing check_gpu_memory implementation
- [ ] Add pre-operation memory checks
- [ ] Implement CPU fallback when memory low
- [ ] Add memory usage logging
- [ ] Create memory pressure alerts
## Section 4: Concurrency Control (Priority: High)
### 4.1 Implement prediction semaphores
- [ ] Add semaphore for PP-StructureV3.predict
- [ ] Configure max concurrent predictions
- [ ] Add queue for waiting predictions
- [ ] Implement timeout handling
- [ ] Add metrics for queue depth
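
The semaphore, timeout, and queue-depth tasks in 4.1 fit together roughly as follows. `PredictionGate` and its defaults are hypothetical; `predict` stands for whatever callable wraps `PP-StructureV3.predict`.

```python
import threading
from typing import Any, Callable


class PredictionGate:
    """Hypothetical sketch: limit concurrent predictions, with a wait timeout."""

    def __init__(self, max_concurrent: int = 2, timeout: float = 30.0) -> None:
        self._semaphore = threading.BoundedSemaphore(max_concurrent)
        self._timeout = timeout
        self._lock = threading.Lock()
        self.waiting = 0  # current queue depth, usable as a metric

    def run(self, predict: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        with self._lock:
            self.waiting += 1
        try:
            acquired = self._semaphore.acquire(timeout=self._timeout)
        finally:
            with self._lock:
                self.waiting -= 1
        if not acquired:
            raise TimeoutError("timed out waiting for a prediction slot")
        try:
            return predict(*args, **kwargs)
        finally:
            self._semaphore.release()
```

Callers that hit the timeout get a `TimeoutError` they can translate into a retry or a "busy" response instead of piling more work onto the GPU.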
### 4.2 Add selective processing
- [ ] Create config for disabling chart/formula/table
- [ ] Implement batch processing for large documents
- [ ] Add progressive loading for multi-page docs
- [ ] Create priority queue for operations
- [ ] Test memory savings with selective processing
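
For 4.2, the selective-processing config and page batching could start from something like this sketch. The field names and the default batch size of 4 are assumptions, not existing settings.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ProcessingOptions:
    """Hypothetical switches for the optional, memory-hungry recognizers."""
    enable_table: bool = True
    enable_formula: bool = True
    enable_chart: bool = True
    page_batch_size: int = 4

    def enabled_features(self) -> List[str]:
        flags = {
            "table": self.enable_table,
            "formula": self.enable_formula,
            "chart": self.enable_chart,
        }
        return [name for name, enabled in flags.items() if enabled]


def iter_page_batches(pages: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices so only one batch of pages is processed at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield pages[start:start + batch_size]
```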
## Section 5: Active Memory Management (Priority: Medium)
### 5.1 Create memory monitor thread
- [ ] Implement background monitoring loop
- [ ] Add periodic memory metrics collection
- [ ] Create threshold-based triggers
- [ ] Implement automatic cache clearing
- [ ] Add LRU-based model unloading
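
The background loop in 5.1 can be sketched as a daemon thread that polls a usage ratio and fires a callback above a threshold. `read_ratio` and `on_pressure` are hypothetical hooks: the first would come from the memory queries in Section 3, the second is where cache clearing or LRU unloading would be triggered.

```python
import logging
import threading
from typing import Callable, Optional

logger = logging.getLogger("memory_monitor")


class MemoryMonitor:
    """Hypothetical sketch: periodic threshold check on a background thread."""

    def __init__(self, read_ratio: Callable[[], float],
                 on_pressure: Callable[[float], None],
                 threshold: float = 0.85, interval: float = 5.0) -> None:
        self._read_ratio = read_ratio
        self._on_pressure = on_pressure
        self._threshold = threshold
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def check_once(self) -> bool:
        """Run one check; returns True if the pressure callback fired."""
        ratio = self._read_ratio()
        if ratio >= self._threshold:
            self._on_pressure(ratio)
            return True
        return False

    def _loop(self) -> None:
        while not self._stop_event.is_set():
            try:
                self.check_once()
            except Exception:
                logger.exception("memory check failed")
            # wait() doubles as an interruptible sleep
            self._stop_event.wait(self._interval)

    def start(self) -> None:
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._loop,
                                        name="memory-monitor", daemon=True)
        self._thread.start()

    def stop(self, timeout: Optional[float] = None) -> None:
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join(timeout)
```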
### 5.2 Add recovery mechanisms
- [ ] Implement emergency memory release
- [ ] Add worker process restart capability
- [ ] Create memory dump for debugging
- [ ] Add cooldown period after recovery
- [ ] Test recovery under various scenarios
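
The cooldown task in 5.2 exists to keep emergency release from firing repeatedly while memory is still settling. A minimal sketch, assuming a hypothetical `release_memory` callback and a 60-second default:

```python
import time
from typing import Callable, Optional


class RecoveryController:
    """Hypothetical sketch: run emergency release at most once per cooldown."""

    def __init__(self, release_memory: Callable[[], None],
                 cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._release_memory = release_memory
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so tests can control time
        self._last_recovery: Optional[float] = None
        self.recoveries = 0

    def try_recover(self) -> bool:
        """Release memory unless a recovery ran within the cooldown window."""
        now = self._clock()
        if (self._last_recovery is not None
                and now - self._last_recovery < self._cooldown):
            return False
        self._last_recovery = now
        self._release_memory()
        self.recoveries += 1
        return True
```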
## Section 6: Cleanup Hooks (Priority: Medium)
### 6.1 Implement shutdown handlers
- [ ] Add FastAPI shutdown event handler
- [ ] Create signal handlers (SIGTERM, SIGINT)
- [ ] Implement graceful model unloading
- [ ] Add connection draining
- [ ] Test shutdown sequence
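
The shutdown sequence in 6.1 could be coordinated by a small registry of hooks that runs once, in reverse registration order. `ShutdownCoordinator` is a hypothetical name. Because servers such as uvicorn install their own signal handlers, a FastAPI app may prefer to call `shutdown()` from its shutdown event instead of using `install_signal_handlers()`.

```python
import logging
import signal
import threading
from typing import Callable, List

logger = logging.getLogger("shutdown")


class ShutdownCoordinator:
    """Hypothetical sketch: run cleanup hooks exactly once, last registered first."""

    def __init__(self) -> None:
        self._hooks: List[Callable[[], None]] = []
        # RLock: a signal handler may re-enter shutdown() on the same thread.
        self._lock = threading.RLock()
        self._done = False

    def register(self, hook: Callable[[], None]) -> None:
        self._hooks.append(hook)

    def shutdown(self, signum=None, frame=None) -> None:
        """Usable directly or as a signal handler (signum and frame are ignored)."""
        with self._lock:
            if self._done:
                return
            self._done = True
        for hook in reversed(self._hooks):
            try:
                hook()
            except Exception:
                # One failing hook must not prevent the remaining cleanup.
                logger.exception("shutdown hook failed")

    def install_signal_handlers(self) -> None:
        """Must be called from the main thread."""
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, self.shutdown)
```

Registering connection draining before model unloading means models are unloaded first and connections drained last; reorder the registrations if the opposite sequence is wanted.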
### 6.2 Add task cleanup
- [ ] Wrap background tasks with cleanup
- [ ] Add success/failure callbacks
- [ ] Implement resource release on completion
- [ ] Add cleanup verification logging
- [ ] Test cleanup in error scenarios
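
The task cleanup in 6.2 maps naturally onto a decorator built around `try`/`except`/`else`/`finally`. The decorator name `with_cleanup` and its callback parameters are assumptions for illustration.

```python
import functools
from typing import Any, Callable, Optional


def with_cleanup(cleanup: Callable[[], None],
                 on_success: Optional[Callable[[Any], None]] = None,
                 on_failure: Optional[Callable[[BaseException], None]] = None):
    """Wrap a task so cleanup always runs, after the matching callback."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                if on_failure is not None:
                    on_failure(exc)
                raise  # the original error still reaches the caller
            else:
                if on_success is not None:
                    on_success(result)
                return result
            finally:
                cleanup()  # runs on both paths, after the callback

        return wrapper

    return decorator
```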
## Section 7: Configuration & Settings (Priority: Low)
### 7.1 Add memory settings to config
- [ ] Define memory threshold parameters
- [ ] Add model timeout settings
- [ ] Configure pool sizes
- [ ] Add feature flags for new behavior
- [ ] Document all settings
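
The settings in 7.1 could be grouped into one validated object, for example as below. Every field name, default value, and the `OCR_MEM_` environment prefix is hypothetical; the sketch only shows the shape such a config might take.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


@dataclass(frozen=True)
class MemorySettings:
    """Hypothetical settings object; all names and defaults are assumptions."""
    warning_threshold: float = 0.80
    critical_threshold: float = 0.95
    model_idle_timeout_seconds: float = 300.0
    pool_size_per_device: int = 1
    enable_model_manager: bool = False  # feature flag for the new behavior

    def __post_init__(self) -> None:
        if not 0.0 < self.warning_threshold < self.critical_threshold <= 1.0:
            raise ValueError("thresholds must satisfy 0 < warning < critical <= 1")
        if self.pool_size_per_device < 1:
            raise ValueError("pool_size_per_device must be at least 1")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None,
                 prefix: str = "OCR_MEM_") -> "MemorySettings":
        source = os.environ if env is None else env
        defaults = cls()

        def read(name: str, cast: Callable[[str], Any], default: Any) -> Any:
            raw = source.get(prefix + name)
            return default if raw is None else cast(raw)

        def as_bool(raw: str) -> bool:
            return raw.strip().lower() in {"1", "true", "yes"}

        return cls(
            warning_threshold=read("WARNING_THRESHOLD", float, defaults.warning_threshold),
            critical_threshold=read("CRITICAL_THRESHOLD", float, defaults.critical_threshold),
            model_idle_timeout_seconds=read("MODEL_IDLE_TIMEOUT_SECONDS", float,
                                            defaults.model_idle_timeout_seconds),
            pool_size_per_device=read("POOL_SIZE_PER_DEVICE", int,
                                      defaults.pool_size_per_device),
            enable_model_manager=read("ENABLE_MODEL_MANAGER", as_bool,
                                      defaults.enable_model_manager),
        )
```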
### 7.2 Create monitoring dashboard
- [ ] Add memory metrics endpoint
- [ ] Create pool status endpoint
- [ ] Add model lifecycle stats
- [ ] Implement health check endpoint
- [ ] Add Prometheus metrics export
## Section 8: Testing & Documentation (Priority: High)
### 8.1 Create comprehensive tests
- [ ] Unit tests for ModelManager
- [ ] Integration tests for OCRServicePool
- [ ] Memory leak detection tests
- [ ] Stress tests with concurrent requests
- [ ] Performance benchmarks
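
One cheap form of the leak detection test in 8.1 checks that an unloaded model is actually garbage-collected, using a weak reference. The classes below are stand-ins invented for the sketch. This only catches lingering Python references, so GPU memory would still need a separate measurement.

```python
import gc
import unittest
import weakref


class _FakeModel:
    """Stand-in for a heavy model object."""


class _Holder:
    """Stand-in for whatever component owns the model."""

    def __init__(self) -> None:
        self.model = None

    def load(self) -> None:
        self.model = _FakeModel()

    def unload(self) -> None:
        self.model = None


class UnloadLeakTest(unittest.TestCase):
    def test_unload_drops_last_reference(self) -> None:
        holder = _Holder()
        holder.load()
        ref = weakref.ref(holder.model)  # does not keep the model alive
        holder.unload()
        gc.collect()
        # If anything else still referenced the model, ref() would return it.
        self.assertIsNone(ref())
```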
### 8.2 Documentation
- [ ] Document memory management architecture
- [ ] Create tuning guide
- [ ] Add troubleshooting section
- [ ] Document monitoring setup
- [ ] Create migration guide
---
**Total Tasks**: 80
**Estimated Effort**: 3-4 weeks
**Critical Path**: Sections 1-2 must be completed first, as they form the foundation for the later sections.