Backend:
- Add hybrid image extraction for Direct track (inline image blocks)
- Add render_inline_image_regions() fallback when OCR doesn't find images
- Add check_document_for_missing_images() for detecting missing images
- Add memory management system (MemoryGuard, ModelManager, ServicePool)
- Update pdf_generator_service to handle HYBRID processing track
- Add ElementType.LOGO for logo extraction

Frontend:
- Fix PDF viewer re-rendering issues with memoization
- Add TaskNotFound component and useTaskValidation hook
- Disable StrictMode due to react-pdf incompatibility
- Fix task detail and results page loading states

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
7.2 KiB
Tasks for Enhanced Memory Management
Section 1: Model Lifecycle Management (Priority: Critical)
1.1 Create ModelManager class
- Design ModelManager interface with load/unload/get methods
- Implement reference counting for model instances
- Add idle timeout tracking with configurable thresholds
- Create teardown() method for explicit cleanup
- Add logging for model lifecycle events
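The reference counting and idle-timeout behaviour listed in 1.1 could be sketched roughly as follows. This is a minimal illustration and not the project's actual `ModelManager`: the method names (`register`, `get`, `release`, `unload_idle`), the injectable `clock`, and the default timeout are all assumptions made for the example.

```python
import logging
import threading
import time
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


class ModelManager:
    """Hypothetical sketch: reference-counted model registry with idle unloading."""

    def __init__(self, idle_timeout_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._models: Dict[str, Any] = {}
        self._refs: Dict[str, int] = {}
        self._last_used: Dict[str, float] = {}

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        with self._lock:
            self._loaders[name] = loader

    def get(self, name: str) -> Any:
        """Return the model, loading it lazily, and take one reference."""
        with self._lock:
            if name not in self._models:
                logger.info("loading model %s", name)
                self._models[name] = self._loaders[name]()
            self._refs[name] = self._refs.get(name, 0) + 1
            self._last_used[name] = self._clock()
            return self._models[name]

    def release(self, name: str) -> None:
        """Drop one reference; the model stays loaded until it goes idle."""
        with self._lock:
            if self._refs.get(name, 0) > 0:
                self._refs[name] -= 1
            self._last_used[name] = self._clock()

    def unload_idle(self) -> List[str]:
        """Unload models with no references that exceeded the idle timeout."""
        now = self._clock()
        unloaded = []
        with self._lock:
            for name in list(self._models):
                idle_for = now - self._last_used[name]
                if self._refs.get(name, 0) == 0 and idle_for >= self._idle_timeout_s:
                    logger.info("unloading idle model %s", name)
                    del self._models[name]
                    unloaded.append(name)
        return unloaded

    def is_loaded(self, name: str) -> bool:
        with self._lock:
            return name in self._models

    def teardown(self) -> None:
        """Explicit cleanup: drop every model regardless of references."""
        with self._lock:
            self._models.clear()
            self._refs.clear()
            self._last_used.clear()
```

A model that is still referenced is never unloaded by `unload_idle()`, which is what makes unloading safe to trigger from a background thread.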
1.2 Integrate PP-StructureV3 with ModelManager
- Remove PP-StructureV3's permanent exemption from unloading (ocr_service.py, lines 255-267)
- Wrap PP-StructureV3 in managed model wrapper
- Implement lazy loading on first access
- Add unload capability with cache clearing
- Test model reload after unload
Section 2: Service Singleton Pattern (Priority: Critical)
2.1 Create OCRServicePool
- Design pool interface with acquire/release methods
- Implement per-device instance management
- Add queue-based task distribution
- Implement concurrency limits via semaphores
- Add health check for pooled instances
2.2 Refactor task router
- Replace OCRService() instantiation with pool.acquire()
- Add proper release in finally blocks
- Handle pool exhaustion gracefully
- Add metrics for pool utilization
- Update error handling for pooled services
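Taken together, the pool in 2.1 and the router changes in 2.2 might look something like the sketch below. It assumes one shared service instance per device guarded by a semaphore; the `factory` callable, the `PoolExhausted` exception, and the `lease()` helper are hypothetical names and may not match the real `OCRServicePool`.

```python
import threading
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


class PoolExhausted(RuntimeError):
    """Raised when no slot frees up before the timeout (hypothetical)."""


class OCRServicePool:
    """Sketch: one service instance per device, with a concurrency limit."""

    def __init__(self, factory: Callable[[str], Any],
                 max_concurrent_per_device: int = 2) -> None:
        self._factory = factory
        self._max = max_concurrent_per_device
        self._lock = threading.Lock()
        self._services: Dict[str, Any] = {}
        self._slots: Dict[str, threading.BoundedSemaphore] = {}

    def acquire(self, device: str = "cpu", timeout: float = 5.0) -> Any:
        with self._lock:
            if device not in self._services:
                # Create the per-device instance on first use.
                self._services[device] = self._factory(device)
                self._slots[device] = threading.BoundedSemaphore(self._max)
            slot = self._slots[device]
            service = self._services[device]
        if not slot.acquire(timeout=timeout):
            raise PoolExhausted(f"no free OCR slot on {device} after {timeout}s")
        return service

    def release(self, device: str = "cpu") -> None:
        self._slots[device].release()

    @contextmanager
    def lease(self, device: str = "cpu", timeout: float = 5.0) -> Iterator[Any]:
        """Acquire a service and guarantee release, even if the body raises."""
        service = self.acquire(device, timeout)
        try:
            yield service
        finally:
            self.release(device)
```

In a router, `with pool.lease("cpu") as service: ...` plays the role of the "release in finally blocks" item, and catching `PoolExhausted` is one way to turn pool exhaustion into a retryable error rather than a crash.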
Section 3: Enhanced Memory Monitoring (Priority: High)
3.1 Create MemoryGuard class
- Implement paddle.device.cuda memory queries
- Add pynvml integration as fallback
- Add torch memory query support
- Create configurable threshold system
- Implement memory prediction for operations
3.2 Integrate memory checks
- Replace existing check_gpu_memory implementation
- Add pre-operation memory checks
- Implement CPU fallback when memory low
- Add memory usage logging
- Create memory pressure alerts
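One possible shape for the fallback chain and threshold checks in Section 3 is shown below. The two backend functions use `torch.cuda.mem_get_info()` and the `pynvml` memory-info calls; a Paddle-based query would be another backend in the same list and is left out here. The class layout, threshold defaults, and `choose_device()` helper are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A backend returns (used_bytes, total_bytes), or None when it cannot answer.
MemoryQuery = Callable[[], Optional[Tuple[int, int]]]


def query_torch() -> Optional[Tuple[int, int]]:
    try:
        import torch
        if not torch.cuda.is_available():
            return None
        free, total = torch.cuda.mem_get_info()
        return total - free, total
    except Exception:  # missing package or driver problem
        return None


def query_pynvml() -> Optional[Tuple[int, int]]:
    try:
        import pynvml
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
            info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            return int(info.used), int(info.total)
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        return None


@dataclass
class MemoryStatus:
    used: int
    total: int
    level: str  # "ok", "warning", "critical", or "unknown"


class MemoryGuard:
    """Sketch: ask each backend in order and classify the first answer."""

    def __init__(self, backends: Sequence[MemoryQuery],
                 warning: float = 0.80, critical: float = 0.95) -> None:
        self._backends = list(backends)
        self._warning = warning
        self._critical = critical

    def check(self) -> MemoryStatus:
        for query in self._backends:
            result = query()
            if result is None or result[1] <= 0:
                continue  # fall through to the next backend
            used, total = result
            ratio = used / total
            level = ("critical" if ratio >= self._critical
                     else "warning" if ratio >= self._warning else "ok")
            return MemoryStatus(used, total, level)
        return MemoryStatus(0, 0, "unknown")

    def can_allocate(self, required_bytes: int) -> bool:
        """Predict whether an operation fits below the critical threshold."""
        status = self.check()
        if status.level == "unknown":
            return False  # no backend answered: do not assume GPU memory
        return (status.used + required_bytes) / status.total < self._critical


def choose_device(guard: MemoryGuard, required_bytes: int) -> str:
    """Pre-operation check with CPU fallback when GPU memory is low."""
    return "gpu" if guard.can_allocate(required_bytes) else "cpu"
```

Passing the backends in as plain callables keeps the guard testable on machines without a GPU, for example `MemoryGuard([query_torch, query_pynvml])` in production and fake lambdas in unit tests.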
Section 4: Concurrency Control (Priority: High)
4.1 Implement prediction semaphores
- Add semaphore for PP-StructureV3.predict
- Configure max concurrent predictions
- Add queue for waiting predictions
- Implement timeout handling
- Add metrics for queue depth
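The semaphore, timeout, and queue-depth items in 4.1 can be illustrated with the standard library alone. The sketch below is an assumption about how such a wrapper might be built; the `slot()` context manager, the `PredictionTimeout` exception, and the `waiting`/`active` counters are hypothetical and not taken from the project's `PredictionSemaphore`.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class PredictionTimeout(TimeoutError):
    """Raised when a prediction slot is not obtained in time (hypothetical)."""


class PredictionSemaphore:
    """Sketch: cap concurrent predict calls and expose simple queue metrics."""

    def __init__(self, max_concurrent: int = 2) -> None:
        self._sem = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self.waiting = 0  # callers currently blocked waiting for a slot
        self.active = 0   # callers currently holding a slot

    @contextmanager
    def slot(self, timeout: float = 30.0) -> Iterator[None]:
        with self._lock:
            self.waiting += 1
        try:
            acquired = self._sem.acquire(timeout=timeout)
        finally:
            with self._lock:
                self.waiting -= 1
        if not acquired:
            raise PredictionTimeout(f"no prediction slot within {timeout}s")
        with self._lock:
            self.active += 1
        try:
            yield
        finally:
            with self._lock:
                self.active -= 1
            self._sem.release()
```

A predict call would then be wrapped as `with semaphore.slot(timeout=60): result = model.predict(image)`, where `model` stands in for whatever object exposes the prediction method.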
4.2 Add selective processing
- Create config for disabling chart/formula/table
- Implement batch processing for large documents
- Add progressive loading for multi-page docs
- Create priority queue for operations
- Test memory savings with selective processing
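For the batch processing and priority queue items in 4.2, a bare-bones version might look like this. It is only a sketch: `iter_batches` and `SimplePriorityQueue` are hypothetical names, and the timeout, cancellation, and memory-aware sizing described for the project's own classes are deliberately omitted.

```python
import heapq
import itertools
from typing import Any, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def iter_batches(pages: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield pages in fixed-size batches so only one batch is in memory at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield list(pages[start:start + batch_size])


class SimplePriorityQueue:
    """Lower priority number is served first; ties keep insertion order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def put(self, item: Any, priority: int = 10) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def get(self) -> Any:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The counter in each heap entry means items never need to be comparable themselves, which matters when the queued operations are arbitrary task objects.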
Section 5: Active Memory Management (Priority: Medium)
5.1 Create memory monitor thread
- Implement background monitoring loop
- Add periodic memory metrics collection
- Create threshold-based triggers
- Implement automatic cache clearing
- Add LRU-based model unloading
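A background monitoring loop with a threshold trigger, as described in 5.1, could be structured along these lines. The `read_usage` and `on_pressure` callables are placeholders for whatever the real monitor measures and does (cache clearing, LRU unloading); the class and parameter names are assumptions.

```python
import threading
from typing import Callable


class MemoryMonitor:
    """Sketch: poll a usage ratio and fire a callback above a threshold."""

    def __init__(self, read_usage: Callable[[], float],
                 on_pressure: Callable[[float], None],
                 threshold: float = 0.90, interval_s: float = 5.0) -> None:
        self._read_usage = read_usage
        self._on_pressure = on_pressure
        self._threshold = threshold
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def check_once(self) -> bool:
        """Take one reading; return True if the pressure callback fired."""
        usage = self._read_usage()
        if usage >= self._threshold:
            self._on_pressure(usage)
            return True
        return False

    def _run(self) -> None:
        # Event.wait doubles as an interruptible sleep between readings.
        while not self._stop.wait(self._interval_s):
            self.check_once()

    def start(self) -> None:
        self._thread.start()

    def stop(self, join_timeout_s: float = 2.0) -> None:
        self._stop.set()
        self._thread.join(join_timeout_s)

    def is_running(self) -> bool:
        return self._thread.is_alive()
```

Using `Event.wait()` instead of `time.sleep()` lets `stop()` end the loop immediately rather than after the current interval.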
5.2 Add recovery mechanisms
- Implement emergency memory release
- Add worker process restart capability (RecoveryManager)
- Create memory dump for debugging
- Add cooldown period after recovery
- Test recovery under various scenarios
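The cooldown period and attempt limit in 5.2 reduce to a small amount of bookkeeping. The sketch below shows one way to do it; `CooldownRecovery`, its parameters, and the injected `release_fn` are hypothetical and stand in for the project's `RecoveryManager`, whose real interface is not shown here.

```python
import time
from typing import Callable, Optional


class CooldownRecovery:
    """Sketch: allow emergency memory release at most once per cooldown window."""

    def __init__(self, release_fn: Callable[[], None],
                 cooldown_s: float = 30.0, max_attempts: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._release_fn = release_fn
        self._cooldown_s = cooldown_s
        self._max_attempts = max_attempts
        self._clock = clock  # injectable so tests can control time
        self._last_attempt: Optional[float] = None
        self.attempts = 0

    def attempt_recovery(self) -> bool:
        """Run the release function if allowed; return whether it ran."""
        now = self._clock()
        if self.attempts >= self._max_attempts:
            return False  # give up; the caller may restart the worker instead
        if self._last_attempt is not None and now - self._last_attempt < self._cooldown_s:
            return False  # still cooling down from the previous attempt
        self.attempts += 1
        self._last_attempt = now
        self._release_fn()
        return True

    def reset(self) -> None:
        """Call after memory has stayed healthy to restore the attempt budget."""
        self.attempts = 0
        self._last_attempt = None
```

The cooldown prevents a recovery storm when memory stays high, since repeated cache clearing in a tight loop would itself hurt throughput.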
Section 6: Cleanup Hooks (Priority: Medium)
6.1 Implement shutdown handlers
- Add FastAPI shutdown event handler
- Create signal handlers (SIGTERM, SIGINT)
- Implement graceful model unloading
- Add connection draining
- Test shutdown sequence
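The shutdown sequence in 6.1 (drain connections, then unload models, exactly once) can be sketched independently of any web framework. `ShutdownCoordinator` and its step names are assumptions; under a server that installs its own signal handlers, `run()` could be called from the framework's shutdown hook instead of `install_signal_handlers()`.

```python
import signal
import threading
from typing import Callable, List, Tuple


class ShutdownCoordinator:
    """Sketch: run registered cleanup steps once, in order, tolerating failures."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Callable[[], None]]] = []
        self._lock = threading.Lock()
        self._started = False

    def add_step(self, name: str, fn: Callable[[], None]) -> None:
        self._steps.append((name, fn))

    def run(self) -> List[str]:
        """Execute every step; return the names of steps that raised."""
        with self._lock:
            if self._started:
                return []  # a second signal must not repeat the cleanup
            self._started = True
        failed = []
        for name, fn in self._steps:
            try:
                fn()
            except Exception:
                failed.append(name)  # keep going so later steps still run
        return failed

    def install_signal_handlers(self) -> None:
        # signal.signal may only be called from the main thread.
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, lambda signum, frame: self.run())
```

A typical registration order would be connection draining first and model unloading last, so in-flight requests finish before their models disappear.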
6.2 Add task cleanup
- Wrap background tasks with cleanup
- Add success/failure callbacks
- Implement resource release on completion
- Add cleanup verification logging
- Test cleanup in error scenarios
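Wrapping a background task so that cleanup always runs, with optional success and failure callbacks, is a natural fit for a decorator. The following is a minimal sketch of the 6.2 items; `with_cleanup` and its parameters are hypothetical names.

```python
import functools
from typing import Any, Callable, Optional


def with_cleanup(cleanup: Callable[[], None],
                 on_success: Optional[Callable[[Any], None]] = None,
                 on_failure: Optional[Callable[[BaseException], None]] = None):
    """Decorator sketch: callbacks fire first, then cleanup always runs."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                if on_failure is not None:
                    on_failure(exc)
                raise  # the original error still reaches the caller
            else:
                if on_success is not None:
                    on_success(result)
                return result
            finally:
                cleanup()  # resource release on both paths

        return wrapper

    return decorator
```

Because `cleanup()` sits in the `finally` clause, it runs after the callback on both paths, which makes a "cleanup verification" log line easy to place in one spot.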
Section 7: Configuration & Settings (Priority: Low)
7.1 Add memory settings to config
- Define memory threshold parameters
- Add model timeout settings
- Configure pool sizes
- Add feature flags for new behavior
- Document all settings
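As an illustration of 7.1, the memory thresholds, model timeout, pool size, and feature flag could be grouped into one validated settings object. Every field name, default value, and the `OCR_MEM_` environment prefix below is an assumption made for the example; the project's real settings live in `backend/app/core/config.py` and may differ.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def _to_bool(raw: str) -> bool:
    return raw.strip().lower() in ("1", "true", "yes", "on")


@dataclass(frozen=True)
class MemorySettings:
    """Hypothetical settings group for memory management."""

    warning_threshold: float = 0.80
    critical_threshold: float = 0.95
    model_idle_timeout_s: int = 300
    pool_max_per_device: int = 2
    enable_service_pool: bool = True  # feature flag for the new behaviour

    def __post_init__(self) -> None:
        if not 0 < self.warning_threshold < self.critical_threshold <= 1:
            raise ValueError("thresholds must satisfy 0 < warning < critical <= 1")
        if self.pool_max_per_device < 1:
            raise ValueError("pool_max_per_device must be at least 1")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None,
                 prefix: str = "OCR_MEM_") -> "MemorySettings":
        source = os.environ if env is None else env
        defaults = cls()

        def read(key: str, cast: Callable[[str], T], default: T) -> T:
            raw = source.get(prefix + key)
            return default if raw is None else cast(raw)

        return cls(
            warning_threshold=read("WARNING_THRESHOLD", float, defaults.warning_threshold),
            critical_threshold=read("CRITICAL_THRESHOLD", float, defaults.critical_threshold),
            model_idle_timeout_s=read("MODEL_IDLE_TIMEOUT_S", int, defaults.model_idle_timeout_s),
            pool_max_per_device=read("POOL_MAX_PER_DEVICE", int, defaults.pool_max_per_device),
            enable_service_pool=read("ENABLE_SERVICE_POOL", _to_bool, defaults.enable_service_pool),
        )
```

Validating in `__post_init__` means a misconfigured threshold fails at startup instead of surfacing later as an unexplained CPU fallback.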
7.2 Create monitoring dashboard
- Add memory metrics endpoint
- Create pool status endpoint
- Add model lifecycle stats
- Implement health check endpoint
- Add Prometheus metrics export
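For the Prometheus export in 7.2, the text exposition format is simple enough to render by hand for a few gauges. The helper below is a sketch under that assumption; the function name and the metric name in the usage note are invented for illustration, and label values are not escaped, so it only suits simple values such as device indices.

```python
from typing import List, Mapping, Tuple, Union

Number = Union[int, float]
Sample = Tuple[Mapping[str, str], Number]


def format_gauge(name: str, help_text: str, samples: List[Sample]) -> str:
    """Render one gauge metric in Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between scrapes.
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

A metrics endpoint could concatenate several such blocks, for example `format_gauge("ocr_gpu_memory_used_bytes", "GPU memory in use", [({"device": "0"}, used)])`, and return them with a plain-text content type.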
Section 8: Testing & Documentation (Priority: High)
8.1 Create comprehensive tests
- Unit tests for ModelManager
- Integration tests for OCRServicePool
- Memory leak detection tests
- Stress tests with concurrent requests
- Performance benchmarks
8.2 Documentation
- Document memory management architecture
- Create tuning guide
- Add troubleshooting section
- Document monitoring setup
- Create migration guide
Total Tasks: 58
Completed: 53
Remaining: 5 (Section 8.2 Documentation only)
Progress: ~91%
Critical Path Status: Sections 1-8.1 are completed (foundation, memory monitoring, prediction semaphores, batch processing, recovery, signal handlers, configuration, Prometheus metrics, and comprehensive tests in place)
Implementation Summary
Files Created
- backend/app/services/memory_manager.py - ModelManager, MemoryGuard, MemoryConfig, PredictionSemaphore, BatchProcessor, ProgressiveLoader, PriorityOperationQueue, RecoveryManager
- backend/app/services/service_pool.py - OCRServicePool, PoolConfig
- backend/tests/services/test_memory_manager.py - Unit tests for memory management (57 tests)
- backend/tests/services/test_service_pool.py - Unit tests for service pool (18 tests)
- backend/tests/services/test_ocr_memory_integration.py - Integration tests for memory check patterns (10 tests)
Files Modified
- backend/app/core/config.py - Added memory management configuration settings
- backend/app/services/ocr_service.py - Removed PP-StructureV3 exemption, added unload capability, integrated MemoryGuard for pre-operation checks and CPU fallback, added PredictionSemaphore for concurrent prediction control
- backend/app/services/pp_structure_enhanced.py - Added PredictionSemaphore control for predict calls
- backend/app/routers/tasks.py - Refactored to use service pool
- backend/app/main.py - Added startup/shutdown handlers, signal handlers (SIGTERM/SIGINT), connection draining, recovery manager shutdown
New Classes Added (Section 4.2-8)
- BatchProcessor - Memory-aware batch processing for large documents with priority support
- ProgressiveLoader - Progressive page loading with lookahead and automatic cleanup
- PriorityOperationQueue - Priority queue with timeout and cancellation support
- RecoveryManager - Memory recovery with cooldown period and attempt limits
- MemoryDumper - Memory dump creation for debugging with history and comparison
- PrometheusMetrics - Prometheus-format metrics export for monitoring
- Signal handlers for graceful shutdown (SIGTERM, SIGINT)
- Connection draining for clean shutdown
New Test Classes Added (Section 8.1)
- TestModelReloadAfterUnload - Tests for model reload after unload
- TestSelectiveProcessingMemorySavings - Tests for memory savings with selective processing
- TestRecoveryScenarios - Tests for recovery under various scenarios
- TestShutdownSequence - Tests for shutdown sequence
- TestCleanupInErrorScenarios - Tests for cleanup in error scenarios
- TestMemoryLeakDetection - Tests for memory leak detection
- TestStressConcurrentRequests - Stress tests with concurrent requests
- TestPerformanceBenchmarks - Performance benchmark tests
- TestMemoryDumper - Tests for MemoryDumper class
- TestPrometheusMetrics - Tests for PrometheusMetrics class