Change: Enhanced Memory Management for OCR Services
Why
The current OCR service architecture suffers from critical memory management issues that lead to GPU memory exhaustion, service instability, and degraded performance under load:
- Memory Leaks: PP-StructureV3 models are permanently exempted from unloading (lines 255-267), causing VRAM to remain occupied indefinitely.
- Instance Proliferation: Each task creates a new OCRService instance (tasks.py lines 44-65), leading to duplicate model loading and memory fragmentation.
- Inadequate Memory Monitoring: check_gpu_memory() always returns True in Paddle-only environments, providing no actual memory protection.
- Uncontrolled Concurrency: No limits on simultaneous PP-StructureV3 predictions, causing memory spikes.
- No Resource Cleanup: Tasks complete without releasing GPU memory, leading to accumulated memory usage.
These issues cause service crashes, require frequent restarts, and prevent scaling to handle multiple concurrent requests.
What Changes
1. Model Lifecycle Management
- NEW: ModelManager class to handle model loading/unloading with reference counting (sketched below)
- NEW: Idle timeout mechanism for PP-StructureV3 (same as language models)
- NEW: Explicit teardown() method for end-of-flow cleanup
- MODIFIED: OCRService to use managed model instances
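A minimal sketch of what ModelManager could look like. The class name comes from this proposal; the loader callable, method names, and default timeout are illustrative assumptions, not the final API.

```python
import threading
import time

import paddle


class ModelManager:
    """Sketch: reference-counted model holder with idle-timeout unloading."""

    def __init__(self, loader, idle_timeout: float = 300.0):
        self._loader = loader            # callable that loads and returns the model
        self._idle_timeout = idle_timeout
        self._model = None
        self._refs = 0
        self._lock = threading.Lock()
        self.last_used = time.monotonic()

    def acquire(self):
        """Load the model on first use and bump the reference count."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            self._refs += 1
            self.last_used = time.monotonic()
            return self._model

    def release(self):
        """Drop one reference; the model stays cached until the idle timeout expires."""
        with self._lock:
            self._refs = max(0, self._refs - 1)
            self.last_used = time.monotonic()

    def maybe_unload(self):
        """Called periodically: unload only when unreferenced and idle past the timeout."""
        with self._lock:
            idle = time.monotonic() - self.last_used
            if self._model is not None and self._refs == 0 and idle >= self._idle_timeout:
                self._drop()

    def teardown(self):
        """Explicit end-of-flow cleanup, regardless of reference count or idle time."""
        with self._lock:
            self._drop()

    def _drop(self):
        self._model = None
        if paddle.device.is_compiled_with_cuda():
            paddle.device.cuda.empty_cache()  # return cached blocks to the driver
```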
2. Service Singleton Pattern
- NEW: OCRServicePool to manage OCRService instances (one per GPU/device), sketched below
- NEW: Queue-based task distribution with concurrency limits
- MODIFIED: Task router to use pooled services instead of creating new instances
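One way the pool could look, assuming one OCRService per device and treating OCRService as an opaque class with an assumed constructor signature. For brevity a per-device semaphore stands in for the queue-based distribution; asyncio.to_thread requires Python 3.9+.

```python
import asyncio


class OCRServicePool:
    """Sketch: one OCRService per device, handed out under a concurrency limit."""

    def __init__(self, service_factory, devices=("gpu:0",), max_concurrent_per_device: int = 2):
        self._services = {dev: service_factory(dev) for dev in devices}
        self._limits = {dev: asyncio.Semaphore(max_concurrent_per_device) for dev in devices}

    async def run(self, device, func, *args, **kwargs):
        """Run `func(service, ...)` on the pooled service for `device`."""
        async with self._limits[device]:
            service = self._services[device]
            # OCR inference is blocking, so keep it off the event loop.
            return await asyncio.to_thread(func, service, *args, **kwargs)
```

A task handler would then do something like `result = await pool.run("gpu:0", lambda svc: svc.process_document(path))` instead of constructing a fresh OCRService per request (the `process_document` name is a placeholder).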
3. Enhanced Memory Monitoring
- NEW: MemoryGuard class using the paddle.device.cuda memory APIs (sketched below)
- NEW: Support for pynvml/torch as fallback memory query methods
- NEW: Memory threshold configuration (warning/critical levels)
- MODIFIED: Processing logic to degrade gracefully when memory is low
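A sketch of the MemoryGuard idea, assuming the paddle.device.cuda memory APIs are available and falling back to pynvml when they are not; the threshold names and defaults are illustrative.

```python
import paddle


class MemoryGuard:
    """Sketch: report GPU memory pressure via Paddle first, pynvml as fallback."""

    def __init__(self, warning_ratio: float = 0.75, critical_ratio: float = 0.90, device_id: int = 0):
        self.warning_ratio = warning_ratio
        self.critical_ratio = critical_ratio
        self.device_id = device_id

    def used_ratio(self) -> float:
        try:
            # Memory Paddle has reserved on the device, relative to its capacity.
            reserved = paddle.device.cuda.memory_reserved(self.device_id)
            total = paddle.device.cuda.get_device_properties(self.device_id).total_memory
            return reserved / total
        except Exception:
            # Fallback: ask the driver directly through NVML.
            import pynvml
            pynvml.nvmlInit()
            handle = pynvml.nvmlDeviceGetHandleByIndex(self.device_id)
            info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            return info.used / info.total

    def level(self) -> str:
        ratio = self.used_ratio()
        if ratio >= self.critical_ratio:
            return "critical"
        if ratio >= self.warning_ratio:
            return "warning"
        return "ok"
```

The processing path could then skip optional chart/formula/table analysis at "warning" and refuse or queue new PP-StructureV3 work at "critical", which is the graceful degradation listed above.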
4. Concurrency Control
- NEW: Semaphore-based limits for PP-StructureV3 predictions
- NEW: Configuration to disable/delay chart/formula/table analysis
- NEW: Batch processing mode for large documents
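The PP-StructureV3 limit could be as small as a module-level bounded semaphore. This sketch assumes a single slot and an illustrative timeout, and leaves the chart/formula/table toggles to the pipeline's own construction options.

```python
import threading
from contextlib import contextmanager

# Module-level cap on simultaneous PP-StructureV3 predictions (value is illustrative).
_STRUCTURE_SLOTS = threading.BoundedSemaphore(value=1)


@contextmanager
def structure_prediction_slot(timeout: float = 120.0):
    """Block until a prediction slot is free, or fail fast after `timeout` seconds."""
    if not _STRUCTURE_SLOTS.acquire(timeout=timeout):
        raise RuntimeError("PP-StructureV3 is saturated; retry later or queue the task")
    try:
        yield
    finally:
        _STRUCTURE_SLOTS.release()


def predict_structure(pipeline, image):
    """Run one structure prediction under the semaphore.

    Chart/formula/table analysis would be disabled or delayed via the pipeline's
    own construction options rather than here; batch mode would simply loop over
    page images inside a single acquired slot.
    """
    with structure_prediction_slot():
        return pipeline.predict(image)
```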
5. Active Memory Management
- NEW: Background memory monitor thread with metrics collection
- NEW: Automatic cache clearing when thresholds exceeded
- NEW: Model unloading based on LRU policy
- NEW: Worker process restart capability when memory cannot be recovered
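Building on the MemoryGuard and ModelManager sketches above, the background monitor could be a daemon thread along these lines; the interval, log wording, and restart hand-off are all illustrative.

```python
import logging
import threading
import time

import paddle

logger = logging.getLogger(__name__)


def start_memory_monitor(guard, model_managers, interval: float = 30.0):
    """Sketch: daemon thread that reacts to memory pressure between tasks."""

    def _loop():
        while True:
            level = guard.level()
            if level != "ok":
                logger.warning("GPU memory %s (used ratio %.2f)", level, guard.used_ratio())
                if paddle.device.is_compiled_with_cuda():
                    paddle.device.cuda.empty_cache()  # clear cached blocks first
            if level == "critical":
                # Unload idle models, least recently used first (LRU policy).
                for manager in sorted(model_managers, key=lambda m: m.last_used):
                    manager.maybe_unload()
                # If memory still cannot be recovered after this, the proposal's
                # last resort is restarting the worker process via its supervisor.
            time.sleep(interval)

    thread = threading.Thread(target=_loop, name="memory-monitor", daemon=True)
    thread.start()
    return thread
```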
6. Cleanup Hooks
- NEW: Global shutdown handlers for graceful cleanup
- NEW: Task completion callbacks to release resources
- MODIFIED: Background task wrapper to ensure cleanup on success/failure
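Cleanup could be as simple as a try/finally in the background-task wrapper plus a process-level shutdown hook. The function names below are placeholders; a FastAPI-style app would register the same teardown on its shutdown/lifespan event rather than atexit.

```python
import atexit

import paddle


def run_ocr_task(task_fn, *args, **kwargs):
    """Sketch: background-task wrapper that releases GPU memory on success or failure."""
    try:
        return task_fn(*args, **kwargs)
    finally:
        # Completion callback: give cached memory back regardless of outcome.
        if paddle.device.is_compiled_with_cuda():
            paddle.device.cuda.empty_cache()


def install_shutdown_hooks(model_managers):
    """Sketch: global shutdown handler that tears down every managed model."""

    def _shutdown():
        for manager in model_managers:
            manager.teardown()

    atexit.register(_shutdown)
```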
Impact
Affected specs:
- ocr-processing - Model management and processing flow
- task-management - Task execution and resource management
Affected code:
- backend/app/services/ocr_service.py - Major refactoring for memory management
- backend/app/routers/tasks.py - Use service pool instead of new instances
- backend/app/core/config.py - New memory management settings
- backend/app/services/memory_manager.py - NEW file
- backend/app/services/service_pool.py - NEW file
Breaking changes: None. All changes are internal optimizations.
Migration: Existing deployments benefit immediately with no configuration changes required. Optional tuning parameters are available for further optimization, for example the settings sketched below.
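The optional knobs might look like the following; the names, defaults, and environment variables are illustrative, and config.py's existing settings mechanism would be their real home.

```python
import os
from dataclasses import dataclass


@dataclass
class MemorySettings:
    """Sketch of the optional tuning parameters (names and defaults are illustrative)."""

    model_idle_timeout: float = float(os.getenv("OCR_MODEL_IDLE_TIMEOUT", "300"))
    max_concurrent_structure: int = int(os.getenv("OCR_MAX_CONCURRENT_STRUCTURE", "1"))
    gpu_memory_warning_ratio: float = float(os.getenv("OCR_GPU_WARNING_RATIO", "0.75"))
    gpu_memory_critical_ratio: float = float(os.getenv("OCR_GPU_CRITICAL_RATIO", "0.90"))
    memory_monitor_interval: float = float(os.getenv("OCR_MEMORY_MONITOR_INTERVAL", "30"))
```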
Testing Requirements
- Memory leak tests - Verify models are properly unloaded
- Concurrency tests - Validate semaphore limits work correctly
- Stress tests - Ensure system degrades gracefully under memory pressure
- Integration tests - Verify pooled services work correctly
- Performance benchmarks - Measure memory usage improvements
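As one concrete example for the concurrency tests, assuming the structure_prediction_slot() helper sketched earlier lands in the new backend/app/services/memory_manager.py module:

```python
import threading
import time

from backend.app.services.memory_manager import structure_prediction_slot  # assumed location


def test_structure_predictions_do_not_overlap():
    """With a single slot, concurrent fake predictions must run one at a time."""
    active = 0
    peak = 0
    lock = threading.Lock()

    def fake_prediction():
        nonlocal active, peak
        with structure_prediction_slot(timeout=5.0):
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.05)  # simulate inference work
            with lock:
                active -= 1

    threads = [threading.Thread(target=fake_prediction) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert peak == 1  # never more than one simultaneous PP-StructureV3 prediction
```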