- Create comprehensive proposal addressing OOM crashes and memory leaks
- Define 6 core areas: model lifecycle, service pooling, monitoring
- Add 58 implementation tasks across 8 sections
- Design ModelManager with reference counting and idle timeout
- Plan OCRServicePool for singleton service pattern
- Specify MemoryGuard for proactive memory monitoring
- Include concurrency controls and cleanup hooks
- Add spec deltas for ocr-processing and task-management
- Create detailed design document with architecture diagrams
- Define performance targets: 75% memory reduction, 4x concurrency

Critical improvements:
- Remove PP-StructureV3 permanent exemption from unloading
- Replace per-task OCRService instantiation with pooling
- Add real GPU memory monitoring (currently always returns True)
- Implement semaphore-based concurrency limits
- Add proper resource cleanup on task completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Tasks for Enhanced Memory Management
Section 1: Model Lifecycle Management (Priority: Critical)
1.1 Create ModelManager class
- Design ModelManager interface with load/unload/get methods
- Implement reference counting for model instances
- Add idle timeout tracking with configurable thresholds
- Create teardown() method for explicit cleanup
- Add logging for model lifecycle events
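The ModelManager tasks above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the method names (`register`, `acquire`, `release`, `unload_idle`), the injectable `clock`, and the default timeout are all assumptions made for the example.

```python
import logging
import threading
import time
from typing import Any, Callable, Dict, List

logger = logging.getLogger("model_manager")


class ModelManager:
    """Hypothetical sketch: reference-counted models with idle unloading."""

    def __init__(self, idle_timeout_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._models: Dict[str, Any] = {}
        self._refs: Dict[str, int] = {}
        self._last_used: Dict[str, float] = {}

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        with self._lock:
            self._loaders[name] = loader

    def acquire(self, name: str) -> Any:
        """Load the model on first use and increment its reference count."""
        with self._lock:
            if name not in self._models:
                # Loading under the lock keeps the sketch simple, but it
                # blocks other callers while a model is loading.
                self._models[name] = self._loaders[name]()
                logger.info("loaded model %s", name)
            self._refs[name] = self._refs.get(name, 0) + 1
            self._last_used[name] = self._clock()
            return self._models[name]

    def release(self, name: str) -> None:
        """Decrement the reference count; the model stays loaded until idle."""
        with self._lock:
            if self._refs.get(name, 0) <= 0:
                raise RuntimeError(f"release without acquire: {name}")
            self._refs[name] -= 1
            self._last_used[name] = self._clock()

    def unload_idle(self) -> List[str]:
        """Unload models that are unreferenced and idle past the timeout."""
        unloaded = []
        with self._lock:
            now = self._clock()
            for name in list(self._models):
                idle_for = now - self._last_used[name]
                if self._refs.get(name, 0) == 0 and idle_for >= self._idle_timeout_s:
                    del self._models[name]
                    unloaded.append(name)
                    logger.info("unloaded idle model %s", name)
        return unloaded

    def teardown(self) -> None:
        """Explicit cleanup: drop every model regardless of reference count."""
        with self._lock:
            self._models.clear()
            self._refs.clear()
            self._last_used.clear()
```

A model that is still referenced is never unloaded by `unload_idle`, which is the property the reference count is meant to guarantee.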
1.2 Integrate PP-StructureV3 with ModelManager
- Remove permanent exemption from unloading (lines 255-267)
- Wrap PP-StructureV3 in managed model wrapper
- Implement lazy loading on first access
- Add unload capability with cache clearing
- Test model reload after unload
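A managed wrapper with lazy loading and unloading might look like the sketch below. The class name `ManagedModel` and the `clear_cache` callback are hypothetical; in a Paddle deployment the callback could be something like `paddle.device.cuda.empty_cache`, if that is available in the installed version. The sketch is not thread-safe on its own.

```python
import gc
from typing import Any, Callable, Optional


class ManagedModel:
    """Hypothetical wrapper: load on first access, unload on request."""

    def __init__(self, loader: Callable[[], Any],
                 clear_cache: Optional[Callable[[], None]] = None) -> None:
        self._loader = loader
        self._clear_cache = clear_cache  # assumed framework cache-clearing hook
        self._model: Optional[Any] = None

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        # Lazy loading: nothing is allocated until the first call.
        if self._model is None:
            self._model = self._loader()
        return self._model

    def unload(self) -> None:
        # Drop our reference, collect garbage, then let the framework
        # release any cached device memory it still holds.
        self._model = None
        gc.collect()
        if self._clear_cache is not None:
            self._clear_cache()
```

Calling `get()` after `unload()` runs the loader again, which is the reload path the last task in this list asks to test.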
Section 2: Service Singleton Pattern (Priority: Critical)
2.1 Create OCRServicePool
- Design pool interface with acquire/release methods
- Implement per-device instance management
- Add queue-based task distribution
- Implement concurrency limits via semaphores
- Add health check for pooled instances
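One possible shape for the pool is sketched below, assuming one shared service instance per device and a bounded semaphore as the concurrency limit. The `factory` argument, the `lease` context manager, and the default limit are assumptions for illustration; queue-based distribution and health checks are left out.

```python
import threading
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional


class OCRServicePool:
    """Hypothetical sketch: one service per device, limited concurrency."""

    def __init__(self, factory: Callable[[str], Any],
                 max_concurrent_per_device: int = 2) -> None:
        self._factory = factory
        self._max = max_concurrent_per_device
        self._lock = threading.Lock()
        self._services: Dict[str, Any] = {}
        self._semaphores: Dict[str, threading.BoundedSemaphore] = {}

    def acquire(self, device: str, timeout: Optional[float] = None) -> Any:
        with self._lock:
            if device not in self._services:
                # Created once per device instead of once per task.
                self._services[device] = self._factory(device)
                self._semaphores[device] = threading.BoundedSemaphore(self._max)
            service = self._services[device]
            semaphore = self._semaphores[device]
        # Wait for a free slot outside the lock so other devices are not blocked.
        if not semaphore.acquire(timeout=timeout):
            raise TimeoutError(f"OCR service pool exhausted for {device}")
        return service

    def release(self, device: str) -> None:
        self._semaphores[device].release()

    @contextmanager
    def lease(self, device: str, timeout: Optional[float] = None) -> Iterator[Any]:
        service = self.acquire(device, timeout)
        try:
            yield service
        finally:
            # Always return the slot, even if the task raised.
            self.release(device)
```

Using `BoundedSemaphore` rather than `Semaphore` means an accidental double release raises `ValueError` instead of silently raising the limit.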
2.2 Refactor task router
- Replace OCRService() instantiation with pool.acquire()
- Add proper release in finally blocks
- Handle pool exhaustion gracefully
- Add metrics for pool utilization
- Update error handling for pooled services
Section 3: Enhanced Memory Monitoring (Priority: High)
3.1 Create MemoryGuard class
- Implement paddle.device.cuda memory queries
- Add pynvml integration as fallback
- Add torch memory query support
- Create configurable threshold system
- Implement memory prediction for operations
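The guard could be structured as a chain of query backends with configurable thresholds, as in the sketch below. The backends are plain callables here; in practice they might wrap calls such as `paddle.device.cuda.memory_allocated`, `pynvml.nvmlDeviceGetMemoryInfo`, or `torch.cuda.mem_get_info`, but that wiring is not shown. The threshold values and the choice to refuse allocation when no backend answers are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A backend returns (used_bytes, total_bytes), or None if it cannot answer.
MemoryQuery = Callable[[], Optional[Tuple[int, int]]]


@dataclass(frozen=True)
class MemoryStatus:
    used: int
    total: int
    level: str  # "ok", "warning", or "critical"

    @property
    def free(self) -> int:
        return self.total - self.used


class MemoryGuard:
    """Hypothetical sketch: threshold checks over fallback query backends."""

    def __init__(self, backends: Sequence[MemoryQuery],
                 warning_ratio: float = 0.80,
                 critical_ratio: float = 0.95) -> None:
        self._backends = list(backends)
        self._warning = warning_ratio
        self._critical = critical_ratio

    def status(self) -> Optional[MemoryStatus]:
        for query in self._backends:
            try:
                result = query()
            except Exception:
                continue  # backend unavailable: fall through to the next one
            if result is None:
                continue
            used, total = result
            ratio = used / total if total > 0 else 1.0
            if ratio >= self._critical:
                level = "critical"
            elif ratio >= self._warning:
                level = "warning"
            else:
                level = "ok"
            return MemoryStatus(used=used, total=total, level=level)
        return None

    def can_allocate(self, required_bytes: int) -> bool:
        """Predict whether an operation fits below the critical threshold."""
        status = self.status()
        if status is None or status.total <= 0:
            # Assumption: with no measurement, refuse rather than always say yes.
            return False
        return (status.used + required_bytes) / status.total < self._critical
```

For example, with 850 of 1000 bytes in use the status level is `warning`, a further 50 bytes is allowed, and a further 100 bytes is refused because it would reach the critical ratio.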
3.2 Integrate memory checks
- Replace existing check_gpu_memory implementation
- Add pre-operation memory checks
- Implement CPU fallback when memory low
- Add memory usage logging
- Create memory pressure alerts
Section 4: Concurrency Control (Priority: High)
4.1 Implement prediction semaphores
- Add semaphore for PP-StructureV3.predict
- Configure max concurrent predictions
- Add queue for waiting predictions
- Implement timeout handling
- Add metrics for queue depth
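A gate around the predict call, with a wait timeout and a queue-depth counter, could be sketched as below. `PredictionGate` and its defaults are hypothetical names for this example, and the wrapped call is left to the caller.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class PredictionGate:
    """Hypothetical sketch: cap concurrent predictions, time out waiters."""

    def __init__(self, max_concurrent: int = 2, timeout_s: float = 30.0) -> None:
        self._semaphore = threading.Semaphore(max_concurrent)
        self._timeout_s = timeout_s
        self._lock = threading.Lock()
        self._waiting = 0

    @property
    def queue_depth(self) -> int:
        """Number of callers currently waiting for a slot."""
        with self._lock:
            return self._waiting

    @contextmanager
    def slot(self) -> Iterator[None]:
        with self._lock:
            self._waiting += 1
        try:
            acquired = self._semaphore.acquire(timeout=self._timeout_s)
        finally:
            with self._lock:
                self._waiting -= 1
        if not acquired:
            raise TimeoutError("timed out waiting for a prediction slot")
        try:
            yield
        finally:
            self._semaphore.release()
```

A caller would wrap the model call in `with gate.slot():` so the slot is returned even when prediction fails.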
4.2 Add selective processing
- Create config for disabling chart/formula/table
- Implement batch processing for large documents
- Add progressive loading for multi-page docs
- Create priority queue for operations
- Test memory savings with selective processing
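Selective processing and page batching might be configured along these lines. The field names and the default batch size are assumptions; how the flags map onto the pipeline's own options is not shown.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class SelectiveProcessingConfig:
    """Hypothetical flags for turning off memory-heavy recognizers."""
    enable_chart: bool = True
    enable_formula: bool = True
    enable_table: bool = True
    page_batch_size: int = 4


def iter_page_batches(pages: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield pages in fixed-size batches so only one batch is processed at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield list(pages[start:start + batch_size])
```

With five pages and a batch size of two, the batches are `[1, 2]`, `[3, 4]`, and `[5]`.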
Section 5: Active Memory Management (Priority: Medium)
5.1 Create memory monitor thread
- Implement background monitoring loop
- Add periodic memory metrics collection
- Create threshold-based triggers
- Implement automatic cache clearing
- Add LRU-based model unloading
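The background loop with a threshold trigger can be sketched as follows. `read_usage_ratio` and `on_pressure` are hypothetical callables supplied by the caller; the pressure callback is where cache clearing or least-recently-used model unloading would go.

```python
import threading
from typing import Callable, Optional


class MemoryMonitor:
    """Hypothetical sketch: poll memory usage and react above a threshold."""

    def __init__(self, read_usage_ratio: Callable[[], float],
                 on_pressure: Callable[[float], None],
                 threshold: float = 0.85, interval_s: float = 5.0) -> None:
        self._read = read_usage_ratio
        self._on_pressure = on_pressure
        self._threshold = threshold
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def check_once(self) -> bool:
        """Run a single check; return True if the pressure callback fired."""
        ratio = self._read()
        if ratio >= self._threshold:
            self._on_pressure(ratio)
            return True
        return False

    def _run(self) -> None:
        # Event.wait doubles as the sleep, so stop() wakes the loop at once.
        while not self._stop.wait(self._interval_s):
            try:
                self.check_once()
            except Exception:
                pass  # a real monitor would log this and keep running

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True,
                                        name="memory-monitor")
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

Keeping `check_once` separate from the loop makes the threshold logic testable without starting a thread.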
5.2 Add recovery mechanisms
- Implement emergency memory release
- Add worker process restart capability
- Create memory dump for debugging
- Add cooldown period after recovery
- Test recovery under various scenarios
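The cooldown after recovery could be enforced with a small controller like the one below, so that repeated pressure signals do not trigger back-to-back emergency releases. The class name, the `release` callback, and the 60-second default are assumptions.

```python
import time
from typing import Callable, Optional


class RecoveryController:
    """Hypothetical sketch: run emergency release at most once per cooldown."""

    def __init__(self, release: Callable[[], None], cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._release = release
        self._cooldown_s = cooldown_s
        self._clock = clock  # injectable so tests can control time
        self._last_recovery: Optional[float] = None

    def maybe_recover(self) -> bool:
        """Trigger recovery unless one already ran within the cooldown window."""
        now = self._clock()
        if (self._last_recovery is not None
                and now - self._last_recovery < self._cooldown_s):
            return False
        self._release()
        self._last_recovery = now
        return True
```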
Section 6: Cleanup Hooks (Priority: Medium)
6.1 Implement shutdown handlers
- Add FastAPI shutdown event handler
- Create signal handlers (SIGTERM, SIGINT)
- Implement graceful model unloading
- Add connection draining
- Test shutdown sequence
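The shutdown sequence might be coordinated as in the sketch below, which uses only the standard library. `ShutdownCoordinator` and `install_signal_handlers` are hypothetical names; in a FastAPI application the same `run()` could instead be called from the app's shutdown or lifespan hook, and the ASGI server may already install its own signal handlers.

```python
import signal
import threading
from typing import Callable, List, Tuple


class ShutdownCoordinator:
    """Hypothetical sketch: run cleanup hooks once, in reverse order."""

    def __init__(self) -> None:
        self._hooks: List[Tuple[str, Callable[[], None]]] = []
        # Re-entrant so a signal arriving during run() cannot deadlock.
        self._lock = threading.RLock()
        self._done = False

    def register(self, name: str, hook: Callable[[], None]) -> None:
        with self._lock:
            self._hooks.append((name, hook))

    def run(self) -> List[str]:
        """Run all hooks, last registered first; return those that succeeded."""
        with self._lock:
            if self._done:
                return []
            self._done = True
            hooks = list(reversed(self._hooks))
        completed = []
        for name, hook in hooks:
            try:
                hook()
                completed.append(name)
            except Exception:
                continue  # one failing hook must not block the others
        return completed


def install_signal_handlers(coordinator: ShutdownCoordinator) -> None:
    """Route SIGTERM and SIGINT through the coordinator (main thread only)."""
    def _handle(signum, frame):
        coordinator.run()
        raise SystemExit(0)

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, _handle)
```

Running hooks in reverse registration order means connections registered after the models are drained before the models are unloaded.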
6.2 Add task cleanup
- Wrap background tasks with cleanup
- Add success/failure callbacks
- Implement resource release on completion
- Add cleanup verification logging
- Test cleanup in error scenarios
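Wrapping a background task with cleanup and callbacks could be done with a decorator such as the one below. The name `with_cleanup` and the callback signatures are assumptions for this sketch.

```python
import functools
from typing import Any, Callable, Optional


def with_cleanup(cleanup: Callable[[], None],
                 on_success: Optional[Callable[[Any], None]] = None,
                 on_failure: Optional[Callable[[BaseException], None]] = None):
    """Hypothetical decorator: always run cleanup after the wrapped task."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                if on_failure is not None:
                    on_failure(exc)
                raise  # the caller still sees the original error
            else:
                if on_success is not None:
                    on_success(result)
                return result
            finally:
                # Runs on both paths, after the success or failure callback.
                cleanup()
        return wrapper
    return decorator
```

The `finally` clause is what guarantees resource release in the error scenarios the last task asks to test.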
Section 7: Configuration & Settings (Priority: Low)
7.1 Add memory settings to config
- Define memory threshold parameters
- Add model timeout settings
- Configure pool sizes
- Add feature flags for new behavior
- Document all settings
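The settings could be grouped in a single object with environment overrides, as sketched here. Every field name, default value, and environment variable name below is hypothetical and would need to match the project's existing config conventions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: Optional[str], default: bool) -> bool:
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class MemorySettings:
    """Hypothetical memory-management settings with illustrative defaults."""
    warning_ratio: float = 0.80         # memory threshold: start logging pressure
    critical_ratio: float = 0.95        # memory threshold: refuse new GPU work
    model_idle_timeout_s: float = 300.0
    pool_size_per_device: int = 1
    enable_model_manager: bool = False  # feature flag for the new behavior

    def __post_init__(self) -> None:
        if not 0.0 < self.warning_ratio < self.critical_ratio <= 1.0:
            raise ValueError("require 0 < warning_ratio < critical_ratio <= 1")
        if self.pool_size_per_device < 1:
            raise ValueError("pool_size_per_device must be at least 1")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "MemorySettings":
        env = os.environ if env is None else env
        d = cls()
        return cls(
            warning_ratio=float(env.get("OCR_MEM_WARNING_RATIO", d.warning_ratio)),
            critical_ratio=float(env.get("OCR_MEM_CRITICAL_RATIO", d.critical_ratio)),
            model_idle_timeout_s=float(
                env.get("OCR_MODEL_IDLE_TIMEOUT_S", d.model_idle_timeout_s)),
            pool_size_per_device=int(
                env.get("OCR_POOL_SIZE_PER_DEVICE", d.pool_size_per_device)),
            enable_model_manager=_as_bool(
                env.get("OCR_ENABLE_MODEL_MANAGER"), d.enable_model_manager),
        )
```

Validating in `__post_init__` rejects inconsistent thresholds at startup instead of at the first memory check.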
7.2 Create monitoring dashboard
- Add memory metrics endpoint
- Create pool status endpoint
- Add model lifecycle stats
- Implement health check endpoint
- Add Prometheus metrics export
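For the Prometheus export, gauge values can be rendered in the text exposition format with a small helper like this one. The metric names and the `ocr_` prefix are made up for the example; a real deployment would more likely use an existing client library.

```python
from typing import Mapping


def render_gauges(gauges: Mapping[str, float], prefix: str = "ocr_") -> str:
    """Render gauges as Prometheus text exposition lines, sorted by name."""
    lines = []
    for name in sorted(gauges):
        metric = f"{prefix}{name}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {float(gauges[name])}")
    return "\n".join(lines) + "\n"
```

A metrics endpoint would return this string with a plain-text content type.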
Section 8: Testing & Documentation (Priority: High)
8.1 Create comprehensive tests
- Unit tests for ModelManager
- Integration tests for OCRServicePool
- Memory leak detection tests
- Stress tests with concurrent requests
- Performance benchmarks
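A memory leak detection test could start from a helper like the one below, which measures Python heap growth across repeated calls using `tracemalloc`. The helper name and iteration counts are assumptions, and `tracemalloc` only sees allocations made through Python's allocator, so GPU memory would need a separate check.

```python
import gc
import tracemalloc
from typing import Callable


def measure_growth(func: Callable[[], object], warmup: int = 3,
                   iterations: int = 20) -> int:
    """Return the change in traced Python memory (bytes) over repeated calls."""
    for _ in range(warmup):
        func()  # let caches and lazy initialization settle first
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            func()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

A test would then assert that the growth for a full task cycle stays under a budget chosen for the workload.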
8.2 Documentation
- Document memory management architecture
- Create tuning guide
- Add troubleshooting section
- Document monitoring setup
- Create migration guide
Total Tasks: 58
Estimated Effort: 3-4 weeks
Critical Path: Sections 1-2 must be completed first, as they form the foundation