OCR/openspec/changes/archive/2025-11-26-enhance-memory-management/tasks.md
egg a227311b2d chore: archive enhance-memory-management proposal (75/80 tasks)
Archive incomplete proposal for later continuation.
OCR processing has known quality issues to be addressed in future work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 16:10:45 +08:00


Tasks for Enhanced Memory Management

Section 1: Model Lifecycle Management (Priority: Critical)

1.1 Create ModelManager class

  • Design ModelManager interface with load/unload/get methods
  • Implement reference counting for model instances
  • Add idle timeout tracking with configurable thresholds
  • Create teardown() method for explicit cleanup
  • Add logging for model lifecycle events
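
The interface in 1.1 could look roughly like the sketch below. It is not the archived implementation in memory_manager.py. The method names register, acquire, release, and unload_idle, and the injectable clock parameter, are assumptions made for illustration. Loading happens under the lock for simplicity, which would block other callers during a slow load.

```python
import logging
import threading
import time
from typing import Any, Callable, Dict, List

logger = logging.getLogger("model_manager")


class ModelManager:
    """Illustrative sketch: reference-counted models with idle-timeout unloading."""

    def __init__(self, idle_timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_timeout = idle_timeout
        self._clock = clock  # injectable so tests can control time (assumption)
        self._lock = threading.Lock()
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._models: Dict[str, Any] = {}
        self._refs: Dict[str, int] = {}
        self._last_used: Dict[str, float] = {}

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        with self._lock:
            self._loaders[name] = loader

    def acquire(self, name: str) -> Any:
        """Load the model on first use and take one reference to it."""
        with self._lock:
            if name not in self._models:
                logger.info("loading model %s", name)
                self._models[name] = self._loaders[name]()
            self._refs[name] = self._refs.get(name, 0) + 1
            self._last_used[name] = self._clock()
            return self._models[name]

    def release(self, name: str) -> None:
        """Give back one reference and record the time of last use."""
        with self._lock:
            if self._refs.get(name, 0) <= 0:
                raise RuntimeError(f"release() without acquire() for {name!r}")
            self._refs[name] -= 1
            self._last_used[name] = self._clock()

    def unload_idle(self) -> List[str]:
        """Drop models with no references that have been idle past the timeout."""
        now = self._clock()
        unloaded = []
        with self._lock:
            for name in list(self._models):
                idle_for = now - self._last_used[name]
                if self._refs.get(name, 0) == 0 and idle_for >= self._idle_timeout:
                    del self._models[name]
                    logger.info("unloaded idle model %s", name)
                    unloaded.append(name)
        return unloaded

    def teardown(self) -> None:
        """Explicit cleanup: forget every model regardless of reference count."""
        with self._lock:
            self._models.clear()
            self._refs.clear()
            self._last_used.clear()
```

A model that is still referenced is never unloaded by unload_idle, however long it has been idle. Only teardown ignores the reference count.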

1.2 Integrate PP-StructureV3 with ModelManager

  • Remove permanent exemption from unloading (lines 255-267)
  • Wrap PP-StructureV3 in managed model wrapper
  • Implement lazy loading on first access
  • Add unload capability with cache clearing
  • Test model reload after unload
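
Lazy loading plus unload with cache clearing can be sketched as a small wrapper. The class name ManagedModel and the clear_cache callback are hypothetical. In a Paddle deployment the callback might be paddle.device.cuda.empty_cache, but that wiring is an assumption and is left out so the sketch runs without a GPU.

```python
import gc
from typing import Any, Callable, Optional


class ManagedModel:
    """Illustrative wrapper: builds the model on first access, can drop it later."""

    def __init__(self, factory: Callable[[], Any],
                 clear_cache: Optional[Callable[[], None]] = None) -> None:
        self._factory = factory
        self._clear_cache = clear_cache  # hypothetical hook for a framework cache flush
        self._instance: Optional[Any] = None

    @property
    def is_loaded(self) -> bool:
        return self._instance is not None

    def get(self) -> Any:
        """Return the model, constructing it only if it is not loaded yet."""
        if self._instance is None:
            self._instance = self._factory()
        return self._instance

    def unload(self) -> bool:
        """Drop the model and flush caches. Returns False if nothing was loaded."""
        if self._instance is None:
            return False
        self._instance = None
        gc.collect()  # collect Python-side garbage before flushing caches
        if self._clear_cache is not None:
            self._clear_cache()
        return True
```

Calling get() after unload() builds a fresh instance, which is the reload path the last task in 1.2 asks to test.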

Section 2: Service Singleton Pattern (Priority: Critical)

2.1 Create OCRServicePool

  • Design pool interface with acquire/release methods
  • Implement per-device instance management
  • Add queue-based task distribution
  • Implement concurrency limits via semaphores
  • Add health check for pooled instances
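
The acquire/release contract can be sketched with a bounded queue of pre-built instances. ServicePool, PoolExhausted, and the borrowed() context manager are hypothetical names, and the sketch builds instances eagerly. Per-device management and health checks are not covered.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

S = TypeVar("S")


class PoolExhausted(RuntimeError):
    """Raised when no instance becomes free within the timeout."""


class ServicePool(Generic[S]):
    """Illustrative fixed-size pool; the queue itself enforces the concurrency limit."""

    def __init__(self, factory: Callable[[], S], size: int = 2) -> None:
        self._idle: "queue.Queue[S]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self, timeout: Optional[float] = None) -> S:
        """Block until an instance is free; timeout=None waits indefinitely."""
        try:
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted(f"no service free after {timeout}s") from None

    def release(self, service: S) -> None:
        self._idle.put_nowait(service)

    @property
    def available(self) -> int:
        return self._idle.qsize()

    @contextmanager
    def borrowed(self, timeout: Optional[float] = None) -> Iterator[S]:
        """Acquire an instance and guarantee its release, even on error."""
        service = self.acquire(timeout)
        try:
            yield service
        finally:
            self.release(service)
```

A caller that uses borrowed() gets the release-in-finally behavior for free; a caller that catches PoolExhausted can turn exhaustion into a retryable response instead of a crash.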

2.2 Refactor task router

  • Replace OCRService() instantiation with pool.acquire()
  • Add proper release in finally blocks
  • Handle pool exhaustion gracefully
  • Add metrics for pool utilization
  • Update error handling for pooled services

Section 3: Enhanced Memory Monitoring (Priority: High)

3.1 Create MemoryGuard class

  • Implement paddle.device.cuda memory queries
  • Add pynvml integration as fallback
  • Add torch memory query support
  • Create configurable threshold system
  • Implement memory prediction for operations
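
A fallback chain of memory probes with configurable thresholds might be structured as below. This is a sketch, not the archived MemoryGuard. Each probe is a plain callable returning (used_bytes, total_bytes), so real backends such as Paddle, pynvml, or torch queries would be wrapped by the caller. The default thresholds and the policy of allowing an operation when no probe works are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Probe = Callable[[], Tuple[int, int]]  # returns (used_bytes, total_bytes)


@dataclass(frozen=True)
class MemoryStatus:
    used: int
    total: int

    @property
    def ratio(self) -> float:
        return self.used / self.total if self.total else 0.0


class MemoryGuard:
    """Illustrative guard: tries probes in order, then applies thresholds."""

    def __init__(self, probes: Iterable[Probe],
                 warning: float = 0.80, critical: float = 0.95) -> None:
        self._probes = list(probes)
        self.warning = warning
        self.critical = critical

    def query(self) -> Optional[MemoryStatus]:
        """Return the first successful probe result, or None if all fail."""
        for probe in self._probes:
            try:
                used, total = probe()
            except Exception:
                continue  # fall through to the next backend
            return MemoryStatus(used, total)
        return None

    def level(self) -> str:
        status = self.query()
        if status is None:
            return "unknown"
        if status.ratio >= self.critical:
            return "critical"
        if status.ratio >= self.warning:
            return "warning"
        return "ok"

    def can_allocate(self, required_bytes: int) -> bool:
        """Predict whether usage stays below the critical threshold."""
        status = self.query()
        if status is None or status.total == 0:
            return True  # assumption: do not block work when memory is unmeasurable
        return (status.used + required_bytes) / status.total < self.critical
```

Keeping the probes as injected callables means the threshold logic can be unit-tested without a GPU.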

3.2 Integrate memory checks

  • Replace existing check_gpu_memory implementation
  • Add pre-operation memory checks
  • Implement CPU fallback when memory low
  • Add memory usage logging
  • Create memory pressure alerts
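
The pre-operation check with CPU fallback reduces to one decision made before each operation. The function names choose_device and run_with_fallback, and the "gpu"/"cpu" device strings, are placeholders for illustration. Treating an unmeasurable GPU as a reason to use the CPU is an assumed policy.

```python
import logging
from typing import Callable, Optional, TypeVar

R = TypeVar("R")
logger = logging.getLogger("memory")


def choose_device(free_gpu_bytes: Optional[int], required_bytes: int) -> str:
    """Return "gpu" only when the estimated requirement fits in free memory."""
    if free_gpu_bytes is None:
        logger.warning("GPU memory unknown; falling back to CPU")
        return "cpu"
    if free_gpu_bytes < required_bytes:
        logger.warning("need %d bytes, %d free; falling back to CPU",
                       required_bytes, free_gpu_bytes)
        return "cpu"
    return "gpu"


def run_with_fallback(operation: Callable[[str], R],
                      probe_free: Callable[[], Optional[int]],
                      required_bytes: int) -> R:
    """Probe memory, pick a device, log the choice, and run the operation on it."""
    device = choose_device(probe_free(), required_bytes)
    logger.info("running on %s", device)
    return operation(device)
```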

Section 4: Concurrency Control (Priority: High)

4.1 Implement prediction semaphores

  • Add semaphore for PP-StructureV3.predict
  • Configure max concurrent predictions
  • Add queue for waiting predictions
  • Implement timeout handling
  • Add metrics for queue depth
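
Limiting concurrent predict calls, with a timeout and a queue-depth counter, could be sketched as follows. The class reuses the name PredictionSemaphore from this proposal but is not the archived code. The slot() context manager, the queue_depth property, and the default limits are assumptions.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class PredictionSemaphore:
    """Illustrative limiter: at most max_concurrent predictions run at once."""

    def __init__(self, max_concurrent: int = 2, timeout: float = 30.0) -> None:
        self._sem = threading.BoundedSemaphore(max_concurrent)
        self._timeout = timeout
        self._lock = threading.Lock()
        self._waiting = 0

    @property
    def queue_depth(self) -> int:
        """Number of callers currently waiting for a slot."""
        with self._lock:
            return self._waiting

    @contextmanager
    def slot(self) -> Iterator[None]:
        with self._lock:
            self._waiting += 1
        try:
            acquired = self._sem.acquire(timeout=self._timeout)
        finally:
            with self._lock:
                self._waiting -= 1
        if not acquired:
            raise TimeoutError(f"no prediction slot within {self._timeout}s")
        try:
            yield
        finally:
            self._sem.release()  # always free the slot, even if predict() raised
```

A predict call would then run inside `with sem.slot():`, so a failed prediction cannot leak a slot.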

4.2 Add selective processing

  • Create config for disabling chart/formula/table
  • Implement batch processing for large documents
  • Add progressive loading for multi-page docs
  • Create priority queue for operations
  • Test memory savings with selective processing
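
Selective processing and batching can be illustrated with a small config object and a page batcher. The field names enable_chart, enable_formula, enable_table, and batch_size are hypothetical and may not match the settings actually added to config.py.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class SelectiveConfig:
    """Illustrative switches for the optional recognition stages."""
    enable_chart: bool = True
    enable_formula: bool = True
    enable_table: bool = True
    batch_size: int = 4

    def enabled_features(self) -> List[str]:
        flags = {
            "chart": self.enable_chart,
            "formula": self.enable_formula,
            "table": self.enable_table,
        }
        return [name for name, on in flags.items() if on]


def iter_batches(pages: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices so only one batch of pages is processed at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield pages[start:start + batch_size]
```

Disabling a stage only saves memory if the pipeline skips loading the corresponding model, which is what the final task in 4.2 is meant to verify.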

Section 5: Active Memory Management (Priority: Medium)

5.1 Create memory monitor thread

  • Implement background monitoring loop
  • Add periodic memory metrics collection
  • Create threshold-based triggers
  • Implement automatic cache clearing
  • Add LRU-based model unloading
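
A background monitoring loop with a threshold trigger could be sketched as below. MemoryMonitor, read_ratio, and on_pressure are hypothetical names. The callback is where cache clearing or model unloading would be wired in, and LRU selection is not shown.

```python
import logging
import threading
from typing import Callable

logger = logging.getLogger("memory_monitor")


class MemoryMonitor:
    """Illustrative daemon thread that polls memory and fires a callback."""

    def __init__(self, read_ratio: Callable[[], float],
                 on_pressure: Callable[[float], None],
                 threshold: float = 0.9, interval: float = 5.0) -> None:
        self._read_ratio = read_ratio
        self._on_pressure = on_pressure
        self._threshold = threshold
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._run, name="memory-monitor", daemon=True)

    @property
    def is_running(self) -> bool:
        return self._thread.is_alive()

    def check_once(self) -> bool:
        """Run one check; returns True if the pressure callback fired."""
        ratio = self._read_ratio()
        if ratio >= self._threshold:
            self._on_pressure(ratio)
            return True
        return False

    def _run(self) -> None:
        # Event.wait doubles as an interruptible sleep between checks.
        while not self._stop.wait(self._interval):
            try:
                self.check_once()
            except Exception:
                logger.exception("memory check failed")

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
```

Waiting on the Event rather than calling time.sleep lets stop() end the loop immediately instead of after a full interval.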

5.2 Add recovery mechanisms

  • Implement emergency memory release
  • Add worker process restart capability (RecoveryManager)
  • Create memory dump for debugging
  • Add cooldown period after recovery
  • Test recovery under various scenarios
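
The cooldown and attempt-limit logic can be sketched on its own, separate from whatever the release action does. The class borrows the RecoveryManager name from this proposal but is only an illustration. The try_recover and reset method names, the release callback, and the default values are assumptions.

```python
import time
from typing import Callable, Optional


class RecoveryManager:
    """Illustrative recovery gate with a cooldown period and an attempt limit."""

    def __init__(self, release: Callable[[], None], cooldown: float = 30.0,
                 max_attempts: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._release = release  # e.g. emergency cache clearing (hypothetical)
        self._cooldown = cooldown
        self._max_attempts = max_attempts
        self._clock = clock
        self._attempts = 0
        self._last_attempt: Optional[float] = None

    def try_recover(self) -> bool:
        """Run the release action unless limited. Returns True if it ran."""
        now = self._clock()
        if self._attempts >= self._max_attempts:
            return False  # give up until reset() is called
        if self._last_attempt is not None and now - self._last_attempt < self._cooldown:
            return False  # still cooling down from the previous attempt
        self._attempts += 1
        self._last_attempt = now
        self._release()
        return True

    def reset(self) -> None:
        """Clear the attempt counter, for example after memory returns to normal."""
        self._attempts = 0
```

The cooldown prevents a recovery storm when memory stays high, and the attempt limit marks the point where restarting the worker process is a more reasonable next step.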

Section 6: Cleanup Hooks (Priority: Medium)

6.1 Implement shutdown handlers

  • Add FastAPI shutdown event handler
  • Create signal handlers (SIGTERM, SIGINT)
  • Implement graceful model unloading
  • Add connection draining
  • Test shutdown sequence
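
The shutdown sequence could be coordinated by one object that both the framework shutdown hook and the signal handlers call. ShutdownCoordinator, make_signal_handler, and install_signal_handlers are hypothetical names. How these handlers interact with the ASGI server's own signal handling is not addressed here.

```python
import logging
import signal
import threading
from typing import Callable, List

logger = logging.getLogger("shutdown")


class ShutdownCoordinator:
    """Illustrative one-shot cleanup runner; callbacks run in reverse order."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[], None]] = []
        self._lock = threading.Lock()
        self._done = False

    def register(self, callback: Callable[[], None]) -> None:
        self._callbacks.append(callback)

    def run(self) -> bool:
        """Run all callbacks once. Returns False if cleanup already happened."""
        with self._lock:
            if self._done:
                return False
            self._done = True
        for callback in reversed(self._callbacks):
            try:
                callback()
            except Exception:
                # One failing step must not prevent the remaining cleanup.
                logger.exception("cleanup callback failed")
        return True


def make_signal_handler(coordinator: ShutdownCoordinator):
    def handler(signum, frame):
        coordinator.run()
        raise SystemExit(128 + signum)  # conventional exit code for a signal
    return handler


def install_signal_handlers(coordinator: ShutdownCoordinator) -> None:
    """Must be called from the main thread, as required by signal.signal."""
    handler = make_signal_handler(coordinator)
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, handler)
```

Because run() is idempotent, it is safe for a signal handler and a framework shutdown event to both call it. Registering model unloading first and connection draining last makes draining happen before models are released.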

6.2 Add task cleanup

  • Wrap background tasks with cleanup
  • Add success/failure callbacks
  • Implement resource release on completion
  • Add cleanup verification logging
  • Test cleanup in error scenarios
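
Wrapping a background task so that cleanup always runs can be done with a decorator. The name with_cleanup and its three callback parameters are hypothetical, and the sketch covers synchronous callables only.

```python
import functools
from typing import Any, Callable, Optional


def with_cleanup(cleanup: Callable[[], None],
                 on_success: Optional[Callable[[Any], None]] = None,
                 on_failure: Optional[Callable[[BaseException], None]] = None):
    """Illustrative decorator: run callbacks, then cleanup, whatever the outcome."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                if on_failure is not None:
                    on_failure(exc)
                raise  # the caller still sees the original error
            else:
                if on_success is not None:
                    on_success(result)
                return result
            finally:
                cleanup()  # runs after either callback, on both paths
        return wrapper

    return decorator
```

The finally clause is what guarantees resource release on completion; the callbacks give the cleanup verification log a place to record which path was taken.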

Section 7: Configuration & Settings (Priority: Low)

7.1 Add memory settings to config

  • Define memory threshold parameters
  • Add model timeout settings
  • Configure pool sizes
  • Add feature flags for new behavior
  • Document all settings

7.2 Create monitoring dashboard

  • Add memory metrics endpoint
  • Create pool status endpoint
  • Add model lifecycle stats
  • Implement health check endpoint
  • Add Prometheus metrics export

Section 8: Testing & Documentation (Priority: High)

8.1 Create comprehensive tests

  • Unit tests for ModelManager
  • Integration tests for OCRServicePool
  • Memory leak detection tests
  • Stress tests with concurrent requests
  • Performance benchmarks

8.2 Documentation

  • Document memory management architecture
  • Create tuning guide
  • Add troubleshooting section
  • Document monitoring setup
  • Create migration guide

Total Tasks: 80 Completed: 75 Remaining: 5 (Section 8.2 Documentation only) Progress: ~94%

Critical Path Status: Sections 1 through 8.1 are complete. The foundation, memory monitoring, prediction semaphores, batch processing, recovery, signal handlers, configuration, Prometheus metrics, and comprehensive tests are in place.

Implementation Summary

Files Created

  • backend/app/services/memory_manager.py - ModelManager, MemoryGuard, MemoryConfig, PredictionSemaphore, BatchProcessor, ProgressiveLoader, PriorityOperationQueue, RecoveryManager
  • backend/app/services/service_pool.py - OCRServicePool, PoolConfig
  • backend/tests/services/test_memory_manager.py - Unit tests for memory management (57 tests)
  • backend/tests/services/test_service_pool.py - Unit tests for service pool (18 tests)
  • backend/tests/services/test_ocr_memory_integration.py - Integration tests for memory check patterns (10 tests)

Files Modified

  • backend/app/core/config.py - Added memory management configuration settings
  • backend/app/services/ocr_service.py - Removed PP-StructureV3 exemption, added unload capability, integrated MemoryGuard for pre-operation checks and CPU fallback, added PredictionSemaphore for concurrent prediction control
  • backend/app/services/pp_structure_enhanced.py - Added PredictionSemaphore control for predict calls
  • backend/app/routers/tasks.py - Refactored to use service pool
  • backend/app/main.py - Added startup/shutdown handlers, signal handlers (SIGTERM/SIGINT), connection draining, recovery manager shutdown

New Classes and Components Added (Sections 4.2-8)

  • BatchProcessor - Memory-aware batch processing for large documents with priority support
  • ProgressiveLoader - Progressive page loading with lookahead and automatic cleanup
  • PriorityOperationQueue - Priority queue with timeout and cancellation support
  • RecoveryManager - Memory recovery with cooldown period and attempt limits
  • MemoryDumper - Memory dump creation for debugging with history and comparison
  • PrometheusMetrics - Prometheus-format metrics export for monitoring
  • Signal handlers for graceful shutdown (SIGTERM, SIGINT)
  • Connection draining for clean shutdown

New Test Classes Added (Section 8.1)

  • TestModelReloadAfterUnload - Tests for model reload after unload
  • TestSelectiveProcessingMemorySavings - Tests for memory savings with selective processing
  • TestRecoveryScenarios - Tests for recovery under various scenarios
  • TestShutdownSequence - Tests for shutdown sequence
  • TestCleanupInErrorScenarios - Tests for cleanup in error scenarios
  • TestMemoryLeakDetection - Tests for memory leak detection
  • TestStressConcurrentRequests - Stress tests with concurrent requests
  • TestPerformanceBenchmarks - Performance benchmark tests
  • TestMemoryDumper - Tests for MemoryDumper class
  • TestPrometheusMetrics - Tests for PrometheusMetrics class