- Create comprehensive proposal addressing OOM crashes and memory leaks
- Define 6 core areas: model lifecycle, service pooling, monitoring
- Add 58 implementation tasks across 8 sections
- Design ModelManager with reference counting and idle timeout
- Plan OCRServicePool for singleton service pattern
- Specify MemoryGuard for proactive memory monitoring
- Include concurrency controls and cleanup hooks
- Add spec deltas for ocr-processing and task-management
- Create detailed design document with architecture diagrams
- Define performance targets: 75% memory reduction, 4x concurrency

Critical improvements:
- Remove PP-StructureV3 permanent exemption from unloading
- Replace per-task OCRService instantiation with pooling
- Add real GPU memory monitoring (currently always returns True)
- Implement semaphore-based concurrency limits
- Add proper resource cleanup on task completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Spec Delta: ocr-processing
Changes to OCR Processing Specification
1. Model Lifecycle Management
Added: ModelManager Class
```python
from typing import Any, Dict, Optional

Model = Any  # placeholder for the framework's model type

class ModelManager:
    """Manages model lifecycle with reference counting and idle timeout."""
    def load_model(self, model_id: str, config: Dict[str, Any]) -> Model:
        """Load a model, or return the existing instance and increment its ref count."""
    def unload_model(self, model_id: str) -> None:
        """Decrement the ref count and unload the model when it reaches zero."""
    def get_model(self, model_id: str) -> Optional[Model]:
        """Return the model instance if loaded, else None."""
    def teardown(self) -> None:
        """Force-unload all models immediately."""
```
Modified: PPStructureV3 Integration
- Remove permanent exemption from unloading (lines 255-267)
- Wrap PP-StructureV3 in ModelManager
- Support lazy loading on first access
- Add unload capability with cache clearing
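Lazy loading and unloading with cache clearing can be sketched as a small holder class. It assumes a hypothetical `factory` that builds the PP-StructureV3 pipeline and an optional `clear_cache` hook. On Paddle, `paddle.device.cuda.empty_cache()` is a plausible candidate for that hook, but verify it against the installed version. The class is illustrative only and is not the project's wrapper.

```python
import gc
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Sketch: load on first access, unload on demand, then clear caches."""

    def __init__(self, factory: Callable[[], Any],
                 clear_cache: Optional[Callable[[], None]] = None) -> None:
        self._factory = factory          # hypothetical: builds the pipeline
        self._clear_cache = clear_cache  # hypothetical: releases cached GPU memory
        self._lock = threading.Lock()
        self._instance: Optional[Any] = None

    @property
    def loaded(self) -> bool:
        return self._instance is not None

    def get(self) -> Any:
        with self._lock:
            if self._instance is None:
                self._instance = self._factory()  # first access triggers the load
            return self._instance

    def unload(self) -> None:
        with self._lock:
            self._instance = None  # drop our reference to the model
        gc.collect()               # collect any reference cycles promptly
        if self._clear_cache is not None:
            self._clear_cache()    # hand cached allocator blocks back
```

A ModelManager could hold one `LazyModel` per model ID, so the reference count decides *when* to unload and the holder decides *how*.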
2. Service Architecture
Added: OCRServicePool
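```python
class OCRServicePool:
    """Pool of OCRService instances (up to pool.max_services_per_device per device)."""
    async def acquire(self, device: str = "GPU:0") -> "OCRService":
        """Get a service from the pool, waiting under semaphore control."""
    async def release(self, service: "OCRService") -> None:
        """Return a service to the pool."""
    async def shutdown(self) -> None:
        """Release all pooled services; called from the shutdown hook."""
```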
Modified: OCRService Instantiation
- Replace direct instantiation with pool.acquire()
- Add finally blocks for pool.release()
- Handle pool exhaustion gracefully
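One way to implement the pool is with `asyncio.Queue`. This sketch assumes a hypothetical `factory(device)` that builds one service per slot. Each per-device queue also acts as the semaphore. `acquire` waits up to `queue_timeout` seconds and then raises a hypothetical `PoolExhaustedError`, which is one interpretation of "handle pool exhaustion gracefully". Construct the pool inside the running event loop, because on Python 3.8/3.9 `asyncio.Queue` binds to the current loop when it is created.

```python
import asyncio
from typing import Callable, Dict, Generic, List, TypeVar

S = TypeVar("S")


class PoolExhaustedError(RuntimeError):
    """Hypothetical error raised when no service frees up in time."""


class ServicePool(Generic[S]):
    """Sketch of OCRServicePool: a fixed set of reusable services per device."""

    def __init__(self, factory: Callable[[str], S], devices: List[str],
                 max_per_device: int = 2, queue_timeout: float = 60.0) -> None:
        self._timeout = queue_timeout      # mirrors pool.queue_timeout
        self._idle: Dict[str, asyncio.Queue] = {}
        self._owner: Dict[int, str] = {}   # id(service) -> device it came from
        for device in devices:
            queue: asyncio.Queue = asyncio.Queue()
            for _ in range(max_per_device):  # mirrors pool.max_services_per_device
                service = factory(device)
                self._owner[id(service)] = device
                queue.put_nowait(service)
            self._idle[device] = queue

    async def acquire(self, device: str = "GPU:0") -> S:
        """Wait for an idle service; the queue length acts as the semaphore."""
        try:
            return await asyncio.wait_for(self._idle[device].get(), self._timeout)
        except asyncio.TimeoutError:
            raise PoolExhaustedError(f"no free OCR service on {device}") from None

    async def release(self, service: S) -> None:
        """Put the service back on its own device's idle queue."""
        self._idle[self._owner[id(service)]].put_nowait(service)
```

Callers pair `acquire` with `release` in a `finally` block, as in the Task Completion flow, so that a failed task never leaks a pool slot.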
3. Memory Management
Added: MemoryGuard Class
```python
from typing import Any, Dict

class MemoryGuard:
    """Monitor and control memory usage."""
    def check_memory(self, required_mb: int = 0) -> bool:
        """Return True if sufficient memory is available for required_mb more."""
    def get_memory_stats(self) -> Dict[str, Any]:
        """Return current memory usage statistics."""
    def predict_memory(self, operation: str, params: Dict[str, Any]) -> int:
        """Predict the memory requirement (MB) of an operation."""
```
Modified: Processing Flow
- Add memory checks before operations
- Implement CPU fallback when GPU memory is low
- Add progressive loading for multi-page documents
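Unlike the current check, which always returns True, a real check has to compare actual usage against the configured thresholds. The sketch below shows that threshold logic and the CPU fallback. It assumes a hypothetical `read_gpu` callable that returns used and total MB. In practice that callable could be backed by NVML (`pynvml.nvmlDeviceGetMemoryInfo`) or by the framework's allocator statistics.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GpuMemory:
    used_mb: int
    total_mb: int


class MemoryGuardSketch:
    """Sketch: threshold checks over a pluggable GPU memory reading."""

    def __init__(self, read_gpu: Callable[[], GpuMemory],
                 warning: float = 0.8, critical: float = 0.95) -> None:
        self._read_gpu = read_gpu  # hypothetical provider, e.g. NVML-backed
        self.warning = warning     # memory.gpu_threshold_warning
        self.critical = critical   # memory.gpu_threshold_critical

    def check_memory(self, required_mb: int = 0) -> bool:
        """True if the request fits without reaching the critical threshold."""
        mem = self._read_gpu()
        projected = (mem.used_mb + required_mb) / mem.total_mb
        return projected < self.critical

    def level(self) -> str:
        """Classify current usage as 'ok', 'warning', or 'critical'."""
        mem = self._read_gpu()
        ratio = mem.used_mb / mem.total_mb
        if ratio >= self.critical:
            return "critical"
        if ratio >= self.warning:
            return "warning"
        return "ok"

    def choose_device(self, required_mb: int, gpu: str = "GPU:0") -> str:
        """CPU fallback: use the GPU only when the request fits."""
        return gpu if self.check_memory(required_mb) else "cpu"
```

With 7000 of 8000 MB in use, usage sits at the warning level. A 500 MB request still fits below 95%, but a 1000 MB request falls back to the CPU.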
4. Concurrency Control
Added: Prediction Semaphores
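```python
import asyncio

# Maximum concurrent PP-StructureV3 predictions
MAX_CONCURRENT_PREDICTIONS = 2
prediction_semaphore = asyncio.Semaphore(MAX_CONCURRENT_PREDICTIONS)

# Method of the PP-StructureV3 wrapper class; _predict is its async prediction call
async def predict_with_limit(self, image, custom_params=None):
    async with prediction_semaphore:  # waits while MAX_CONCURRENT_PREDICTIONS are running
        return await self._predict(image, custom_params)
```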
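The snippet above awaits `self._predict(...)`, which presumes an async prediction call. PP-StructureV3's pipeline `predict` is, as far as we know, blocking. One option is to run it in a worker thread with `asyncio.to_thread` (Python 3.9+) while the semaphore still caps concurrency. This sketch assumes a hypothetical blocking `predict_fn`. It tracks the peak number of in-flight calls only to make the cap observable. On Python 3.9, create it inside the running event loop, because `asyncio.Semaphore` binds to the current loop there.

```python
import asyncio


class LimitedPredictor:
    """Sketch: cap concurrent predictions and keep blocking work off the event loop."""

    def __init__(self, predict_fn, max_concurrent: int = 2) -> None:
        self._predict_fn = predict_fn  # hypothetical blocking call, e.g. a pipeline predict
        self._semaphore = asyncio.Semaphore(max_concurrent)  # concurrency.max_predictions
        self.in_flight = 0
        self.peak = 0

    async def predict_with_limit(self, image, custom_params=None):
        async with self._semaphore:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                # The blocking call runs in a worker thread; the loop stays responsive.
                return await asyncio.to_thread(self._predict_fn, image, custom_params)
            finally:
                self.in_flight -= 1
```

Each semaphore slot holds a full model forward pass. Keeping `max_predictions` small therefore bounds peak activation memory, even when many tasks are queued.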
Added: Selective Processing
```python
from dataclasses import dataclass

@dataclass
class ProcessingConfig:
    enable_charts: bool = True
    enable_formulas: bool = True
    enable_tables: bool = True
    batch_size: int = 10  # Pages per batch
```
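Progressive loading for multi-page documents can be driven by `batch_size`. This sketch assumes pages arrive from a lazy iterable, such as a hypothetical page-rendering generator. With that input, only one batch of pages is materialized at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def page_batches(pages: Iterable[T], batch_size: int = 10) -> Iterator[List[T]]:
    """Yield lists of at most batch_size pages, pulling pages lazily from the source."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    source = iter(pages)
    while True:
        batch = list(islice(source, batch_size))  # materialize only this batch
        if not batch:
            return
        yield batch
```

A caller would loop over `page_batches(render_pages(path), config.batch_size)`, where `render_pages` is hypothetical, and drop each batch's images before requesting the next batch.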
5. Resource Cleanup
Added: Cleanup Hooks
@app.on_event("shutdown")
async def shutdown_handler():
"""Graceful shutdown with model unloading"""
await model_manager.teardown()
await service_pool.shutdown()
Modified: Task Completion
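```python
async def process_task(task_id: str):
    service = None
    try:
        service = await service_pool.acquire()
        # ... processing ...
    finally:
        # Always return the service and free per-task resources, even on error
        if service is not None:
            await service_pool.release(service)
        await cleanup_task_resources(task_id)
```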
Configuration Changes
Added Settings
```yaml
memory:
  gpu_threshold_warning: 0.8    # 80% usage
  gpu_threshold_critical: 0.95  # 95% usage
  model_idle_timeout: 300       # seconds (5 minutes)
  enable_memory_monitor: true
  monitor_interval: 10          # seconds
pool:
  max_services_per_device: 2
  queue_timeout: 60             # seconds
concurrency:
  max_predictions: 2
  max_batch_size: 10
```
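One way to make "no config changes needed" hold in code is to mirror these settings in typed defaults, so that an absent or partial config falls back to the values above. The following is a sketch with hypothetical class names. It takes the mapping produced by any YAML parser (for example PyYAML's `yaml.safe_load`). Unknown keys raise `TypeError`, which catches typos early.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MemorySettings:
    gpu_threshold_warning: float = 0.8
    gpu_threshold_critical: float = 0.95
    model_idle_timeout: int = 300  # seconds
    enable_memory_monitor: bool = True
    monitor_interval: int = 10     # seconds

    def __post_init__(self) -> None:
        if not 0 < self.gpu_threshold_warning < self.gpu_threshold_critical <= 1:
            raise ValueError("need 0 < warning < critical <= 1")


@dataclass
class PoolSettings:
    max_services_per_device: int = 2
    queue_timeout: int = 60  # seconds


@dataclass
class ConcurrencySettings:
    max_predictions: int = 2
    max_batch_size: int = 10


@dataclass
class Settings:
    memory: MemorySettings = field(default_factory=MemorySettings)
    pool: PoolSettings = field(default_factory=PoolSettings)
    concurrency: ConcurrencySettings = field(default_factory=ConcurrencySettings)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Settings":
        """Build from a parsed config mapping; missing sections keep defaults."""
        return cls(
            memory=MemorySettings(**raw.get("memory", {})),
            pool=PoolSettings(**raw.get("pool", {})),
            concurrency=ConcurrencySettings(**raw.get("concurrency", {})),
        )
```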
Breaking Changes
None. All changes are backward-compatible optimizations.
Migration Path
- Deploy new code with default settings (no config changes needed)
- Monitor memory metrics via new endpoints
- Tune parameters based on workload
- Enable selective processing if needed