OCR/openspec/changes/enhance-memory-management/tasks.md
egg ba8ddf2b68 feat: create OpenSpec proposal for enhanced memory management
- Create comprehensive proposal addressing OOM crashes and memory leaks
- Define 6 core areas: model lifecycle, service pooling, monitoring
- Add 58 implementation tasks across 8 sections
- Design ModelManager with reference counting and idle timeout
- Plan OCRServicePool for singleton service pattern
- Specify MemoryGuard for proactive memory monitoring
- Include concurrency controls and cleanup hooks
- Add spec deltas for ocr-processing and task-management
- Create detailed design document with architecture diagrams
- Define performance targets: 75% memory reduction, 4x concurrency

Critical improvements:
- Remove PP-StructureV3 permanent exemption from unloading
- Replace per-task OCRService instantiation with pooling
- Add real GPU memory monitoring (currently always returns True)
- Implement semaphore-based concurrency limits
- Add proper resource cleanup on task completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 15:21:32 +08:00


Tasks for Enhanced Memory Management

Section 1: Model Lifecycle Management (Priority: Critical)

1.1 Create ModelManager class

  • Design ModelManager interface with load/unload/get methods
  • Implement reference counting for model instances
  • Add idle timeout tracking with configurable thresholds
  • Create teardown() method for explicit cleanup
  • Add logging for model lifecycle events

1.2 Integrate PP-StructureV3 with ModelManager

  • Remove permanent exemption from unloading (lines 255-267)
  • Wrap PP-StructureV3 in managed model wrapper
  • Implement lazy loading on first access
  • Add unload capability with cache clearing
  • Test model reload after unload
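
A managed wrapper with lazy loading and an unload hook, as listed in 1.2, might look like the sketch below. `ManagedModel`, `factory`, and `clear_cache` are hypothetical names; in a Paddle deployment the `clear_cache` callable would presumably release cached GPU memory, but that wiring is an assumption and is left to the caller.

```python
from typing import Any, Callable, Optional


class ManagedModel:
    """Hypothetical wrapper: builds the model on first access, can unload it."""

    def __init__(self, factory: Callable[[], Any],
                 clear_cache: Optional[Callable[[], None]] = None) -> None:
        self._factory = factory          # builds the real model (assumed non-None result)
        self._clear_cache = clear_cache  # optional hook run after unloading
        self._instance: Any = None

    @property
    def loaded(self) -> bool:
        return self._instance is not None

    def instance(self) -> Any:
        """Return the model, creating it lazily if it is not loaded."""
        if self._instance is None:
            self._instance = self._factory()
        return self._instance

    def unload(self) -> bool:
        """Drop the model and clear caches; returns False if nothing was loaded."""
        if self._instance is None:
            return False
        self._instance = None
        if self._clear_cache is not None:
            self._clear_cache()
        return True
```

Calling `instance()` after `unload()` simply rebuilds the model, which is the reload path the last task in 1.2 asks to test.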

Section 2: Service Singleton Pattern (Priority: Critical)

2.1 Create OCRServicePool

  • Design pool interface with acquire/release methods
  • Implement per-device instance management
  • Add queue-based task distribution
  • Implement concurrency limits via semaphores
  • Add health check for pooled instances
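
One possible shape for the pool in 2.1 is sketched below. It is an assumption-laden illustration: `OCRServicePool`'s constructor arguments, `PoolExhausted`, and `borrow` are invented for the example, and a blocking queue of pre-built instances stands in for an explicit semaphore because it bounds concurrency in the same way.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterable, Iterator, Optional


class PoolExhausted(RuntimeError):
    """Raised when no pooled service becomes free within the timeout."""


class OCRServicePool:
    """Hypothetical sketch: a fixed set of service instances per device."""

    def __init__(self, factory: Callable[[str], Any],
                 devices: Iterable[str], per_device: int = 1) -> None:
        self._idle: "queue.Queue[Any]" = queue.Queue()
        for device in devices:
            for _ in range(per_device):
                self._idle.put(factory(device))

    def acquire(self, timeout: Optional[float] = None) -> Any:
        """Wait up to `timeout` seconds for an idle instance (None waits forever)."""
        try:
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted("no OCR service available") from None

    def release(self, service: Any) -> None:
        self._idle.put(service)

    @contextmanager
    def borrow(self, timeout: Optional[float] = None) -> Iterator[Any]:
        """Acquire an instance and guarantee its release, even on error."""
        service = self.acquire(timeout)
        try:
            yield service
        finally:
            self.release(service)
```

A task handler could then use `with pool.borrow(timeout=...) as service:` in place of constructing a new service per task, and translate `PoolExhausted` into a retryable error.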

2.2 Refactor task router

  • Replace OCRService() instantiation with pool.acquire()
  • Add proper release in finally blocks
  • Handle pool exhaustion gracefully
  • Add metrics for pool utilization
  • Update error handling for pooled services

Section 3: Enhanced Memory Monitoring (Priority: High)

3.1 Create MemoryGuard class

  • Implement paddle.device.cuda memory queries
  • Add pynvml integration as fallback
  • Add torch memory query support
  • Create configurable threshold system
  • Implement memory prediction for operations
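
The fallback chain in 3.1 can be illustrated with pluggable probes, each returning `(used_bytes, total_bytes)` or `None` when its backend is unavailable. The sketch below is an assumption, not the planned design: `MemoryGuard`'s signature, the 0.95 threshold, and `allow_when_unknown` are hypothetical. Only pynvml and torch probes are shown; a Paddle-based probe would slot into the same list.

```python
from typing import Callable, Iterable, Optional, Tuple

Reading = Tuple[int, int]  # (used_bytes, total_bytes)
Probe = Callable[[], Optional[Reading]]


def pynvml_probe(index: int = 0) -> Probe:
    """Probe backed by NVML; yields None if pynvml or the GPU is missing."""
    def probe() -> Optional[Reading]:
        try:
            import pynvml
            pynvml.nvmlInit()
            try:
                handle = pynvml.nvmlDeviceGetHandleByIndex(index)
                info = pynvml.nvmlDeviceGetMemoryInfo(handle)
                return int(info.used), int(info.total)
            finally:
                pynvml.nvmlShutdown()
        except Exception:
            return None
    return probe


def torch_probe(index: int = 0) -> Probe:
    """Probe backed by torch.cuda.mem_get_info, which returns (free, total)."""
    def probe() -> Optional[Reading]:
        try:
            import torch
            if not torch.cuda.is_available():
                return None
            free, total = torch.cuda.mem_get_info(index)
            return int(total - free), int(total)
        except Exception:
            return None
    return probe


class MemoryGuard:
    """Hypothetical sketch: first successful probe wins; thresholds are fractions."""

    def __init__(self, probes: Iterable[Probe], critical: float = 0.95,
                 allow_when_unknown: bool = False) -> None:
        self._probes = list(probes)
        self._critical = critical
        self._allow_when_unknown = allow_when_unknown

    def read(self) -> Optional[Reading]:
        for probe in self._probes:
            reading = probe()
            if reading is not None and reading[1] > 0:
                return reading
        return None

    def can_allocate(self, required_bytes: int = 0) -> bool:
        """True if usage plus the predicted allocation stays under `critical`."""
        reading = self.read()
        if reading is None:
            return self._allow_when_unknown
        used, total = reading
        return (used + required_bytes) / total < self._critical
```

Defaulting to `False` when no probe answers is a deliberate choice in this sketch, so that an unreadable GPU does not silently pass the check.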

3.2 Integrate memory checks

  • Replace existing check_gpu_memory implementation (currently always returns True)
  • Add pre-operation memory checks
  • Implement CPU fallback when memory low
  • Add memory usage logging
  • Create memory pressure alerts

Section 4: Concurrency Control (Priority: High)

4.1 Implement prediction semaphores

  • Add semaphore for PP-StructureV3.predict
  • Configure max concurrent predictions
  • Add queue for waiting predictions
  • Implement timeout handling
  • Add metrics for queue depth
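
The semaphore, timeout, and queue-depth items in 4.1 could be combined in a small gate object, sketched here with the standard library. `PredictionGate`, its defaults, and the `slot()` context manager are hypothetical names chosen for illustration.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class PredictionGate:
    """Hypothetical sketch: caps concurrent predictions and times out waiters."""

    def __init__(self, max_concurrent: int = 2, timeout: float = 30.0) -> None:
        self._sem = threading.BoundedSemaphore(max_concurrent)
        self._timeout = timeout
        self._waiting = 0
        self._lock = threading.Lock()

    @property
    def queue_depth(self) -> int:
        """Number of callers currently waiting for a slot."""
        with self._lock:
            return self._waiting

    @contextmanager
    def slot(self) -> Iterator[None]:
        with self._lock:
            self._waiting += 1
        try:
            acquired = self._sem.acquire(timeout=self._timeout)
        finally:
            with self._lock:
                self._waiting -= 1
        if not acquired:
            raise TimeoutError("timed out waiting for a prediction slot")
        try:
            yield
        finally:
            self._sem.release()
```

A caller would wrap the prediction call in `with gate.slot():` so the slot is returned even if prediction raises.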

4.2 Add selective processing

  • Create config for disabling chart, formula, and table recognition
  • Implement batch processing for large documents
  • Add progressive loading for multi-page docs
  • Create priority queue for operations
  • Test memory savings with selective processing
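
Batch processing for large documents, as mentioned in 4.2, can be as simple as feeding pages to the model in fixed-size groups so only one group is held in memory at a time. The helper name `iter_page_batches` is an assumption for this sketch.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_page_batches(pages: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` pages, consuming lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(pages)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Because the input is consumed lazily, `pages` can itself be a generator that renders one page at a time.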

Section 5: Active Memory Management (Priority: Medium)

5.1 Create memory monitor thread

  • Implement background monitoring loop
  • Add periodic memory metrics collection
  • Create threshold-based triggers
  • Implement automatic cache clearing
  • Add LRU-based model unloading
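
The background loop and threshold trigger in 5.1 might be structured as below. This is a sketch under assumptions: `MemoryMonitor`, the `read_usage` callable (returning a usage fraction or `None`), and the `on_pressure` callback are hypothetical, and what the callback does (cache clearing, LRU unloading) is left to the caller.

```python
import threading
from typing import Callable, Optional


class MemoryMonitor(threading.Thread):
    """Hypothetical sketch: polls memory usage and fires a callback on pressure."""

    def __init__(self, read_usage: Callable[[], Optional[float]],
                 on_pressure: Callable[[float], None],
                 threshold: float = 0.85, interval: float = 5.0) -> None:
        super().__init__(daemon=True)
        self._read_usage = read_usage
        self._on_pressure = on_pressure
        self._threshold = threshold
        self._interval = interval
        self._stop_event = threading.Event()

    def poll_once(self) -> bool:
        """Take one reading; returns True if the pressure callback fired."""
        usage = self._read_usage()
        if usage is not None and usage >= self._threshold:
            self._on_pressure(usage)
            return True
        return False

    def run(self) -> None:
        while not self._stop_event.is_set():
            self.poll_once()
            # wait() doubles as an interruptible sleep between polls
            self._stop_event.wait(self._interval)

    def stop(self) -> None:
        self._stop_event.set()
```

Using an `Event` for both the stop flag and the sleep lets `stop()` end the thread promptly instead of waiting out a full interval.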

5.2 Add recovery mechanisms

  • Implement emergency memory release
  • Add worker process restart capability
  • Create memory dump for debugging
  • Add cooldown period after recovery
  • Test recovery under various scenarios

Section 6: Cleanup Hooks (Priority: Medium)

6.1 Implement shutdown handlers

  • Add FastAPI shutdown event handler
  • Create signal handlers (SIGTERM, SIGINT)
  • Implement graceful model unloading
  • Add connection draining
  • Test shutdown sequence
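
A framework-neutral way to approach 6.1 is a small hook registry that runs cleanup steps once, in reverse registration order. The sketch below uses only the standard library; `ShutdownHooks` and its methods are hypothetical, and the same `run()` could presumably be called from a FastAPI shutdown handler. Note that `signal.signal` only works in the main thread, and an ASGI server may install its own handlers.

```python
import signal
import threading
from typing import Callable, List, Sequence, Tuple


class ShutdownHooks:
    """Hypothetical sketch: idempotent, reverse-order cleanup on shutdown."""

    def __init__(self) -> None:
        self._hooks: List[Tuple[str, Callable[[], None]]] = []
        self._done = False
        self._lock = threading.Lock()

    def register(self, name: str, hook: Callable[[], None]) -> None:
        self._hooks.append((name, hook))

    def run(self) -> List[Tuple[str, Exception]]:
        """Run every hook once; collect failures instead of aborting early."""
        with self._lock:
            if self._done:
                return []
            self._done = True
        errors: List[Tuple[str, Exception]] = []
        for name, hook in reversed(self._hooks):
            try:
                hook()
            except Exception as exc:
                errors.append((name, exc))
        return errors

    def install_signal_handlers(
        self, signals: Sequence[int] = (signal.SIGTERM, signal.SIGINT)
    ) -> None:
        """Must be called from the main thread."""
        for sig in signals:
            signal.signal(sig, lambda signum, frame: self.run())
```

Reverse order means resources registered last (for example, models that depend on a pool) are released first.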

6.2 Add task cleanup

  • Wrap background tasks with cleanup
  • Add success/failure callbacks
  • Implement resource release on completion
  • Add cleanup verification logging
  • Test cleanup in error scenarios
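
The task-cleanup items in 6.2 map naturally onto a decorator built around `try`/`except`/`else`/`finally`. The sketch below is illustrative only; `with_cleanup` and its callback parameters are hypothetical names.

```python
import functools
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("task_cleanup")


def with_cleanup(cleanup: Callable[[], None],
                 on_success: Optional[Callable[[Any], None]] = None,
                 on_failure: Optional[Callable[[Exception], None]] = None):
    """Wrap a task so `cleanup` always runs, whether the task succeeds or fails."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                if on_failure is not None:
                    on_failure(exc)
                raise  # the original error still reaches the caller
            else:
                if on_success is not None:
                    on_success(result)
                return result
            finally:
                cleanup()
                logger.info("cleanup finished for %s", func.__name__)
        return wrapper
    return decorator
```

The callback runs before `cleanup`, so it can still inspect resources that the cleanup step is about to release.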

Section 7: Configuration & Settings (Priority: Low)

7.1 Add memory settings to config

  • Define memory threshold parameters
  • Add model timeout settings
  • Configure pool sizes
  • Add feature flags for new behavior
  • Document all settings
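
The settings in 7.1 could be grouped in one validated object. Every field name and default below is a placeholder invented for illustration; the actual parameter names and values are not specified in this task list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySettings:
    """Hypothetical settings group; all names and defaults are placeholders."""

    warning_threshold: float = 0.80        # fraction of GPU memory
    critical_threshold: float = 0.95       # fraction of GPU memory
    model_idle_timeout_seconds: float = 300.0
    pool_size_per_device: int = 1
    max_concurrent_predictions: int = 2
    enable_model_lifecycle: bool = True    # feature flag for the new behavior

    def __post_init__(self) -> None:
        if not 0.0 < self.warning_threshold < self.critical_threshold <= 1.0:
            raise ValueError("need 0 < warning < critical <= 1")
        if self.model_idle_timeout_seconds <= 0:
            raise ValueError("model_idle_timeout_seconds must be positive")
        if self.pool_size_per_device < 1 or self.max_concurrent_predictions < 1:
            raise ValueError("pool size and concurrency must be at least 1")
```

Validating at construction time surfaces a bad threshold at startup rather than during the first memory check.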

7.2 Create monitoring dashboard

  • Add memory metrics endpoint
  • Create pool status endpoint
  • Add model lifecycle stats
  • Implement health check endpoint
  • Add Prometheus metrics export

Section 8: Testing & Documentation (Priority: High)

8.1 Create comprehensive tests

  • Unit tests for ModelManager
  • Integration tests for OCRServicePool
  • Memory leak detection tests
  • Stress tests with concurrent requests
  • Performance benchmarks
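
For the memory leak detection tests in 8.1, one starting point is to measure how much traced memory grows across repeated calls. The helper below is a sketch with an invented name (`measure_growth`); it relies on `tracemalloc`, which only sees Python-level allocations, so GPU memory would need a separate check.

```python
import gc
import tracemalloc
from typing import Callable


def measure_growth(operation: Callable[[], object],
                   iterations: int = 50, warmup: int = 5) -> int:
    """Return the change in traced Python memory (bytes) over `iterations` calls."""
    for _ in range(warmup):
        operation()  # let caches and lazy initialisation settle first
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            operation()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

A test would then assert that the growth for a full task cycle stays under a budget chosen for the workload.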

8.2 Documentation

  • Document memory management architecture
  • Create tuning guide
  • Add troubleshooting section
  • Document monitoring setup
  • Create migration guide

Total Tasks: 58
Estimated Effort: 3-4 weeks
Critical Path: Sections 1-2 must be completed first, as they form the foundation.