- Create comprehensive proposal addressing OOM crashes and memory leaks
- Define 6 core areas: model lifecycle, service pooling, monitoring
- Add 58 implementation tasks across 8 sections
- Design ModelManager with reference counting and idle timeout
- Plan OCRServicePool for singleton service pattern
- Specify MemoryGuard for proactive memory monitoring
- Include concurrency controls and cleanup hooks
- Add spec deltas for ocr-processing and task-management
- Create detailed design document with architecture diagrams
- Define performance targets: 75% memory reduction, 4x concurrency

Critical improvements:

- Remove PP-StructureV3 permanent exemption from unloading
- Replace per-task OCRService instantiation with pooling
- Add real GPU memory monitoring (currently always returns True)
- Implement semaphore-based concurrency limits
- Add proper resource cleanup on task completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Spec Delta: ocr-processing

## Changes to OCR Processing Specification

### 1. Model Lifecycle Management

#### Added: ModelManager Class

```python
from typing import Dict, Optional

class ModelManager:
    """Manages model lifecycle with reference counting and idle timeout"""

    def load_model(self, model_id: str, config: Dict) -> "Model":
        """Load a model, or return the existing instance with its ref count incremented"""
        ...

    def unload_model(self, model_id: str) -> None:
        """Decrement the ref count and unload when it reaches zero"""
        ...

    def get_model(self, model_id: str) -> Optional["Model"]:
        """Get the model instance if loaded"""
        ...

    def teardown(self) -> None:
        """Force-unload all models immediately"""
        ...
```
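
For illustration, a minimal sketch of how the reference counting and idle timeout could fit together. `RefCountingModelManager`, the injected `loader` callable, and the `sweep_idle` helper are illustrative names, not part of the spec:

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class _Entry:
    model: Any
    ref_count: int = 0
    last_used: float = field(default_factory=time.monotonic)

class RefCountingModelManager:
    """Reference-counted registry; a periodic sweep unloads idle, unreferenced models."""

    def __init__(self, loader: Callable[[str, dict], Any], idle_timeout: float = 300.0):
        self._loader = loader          # builds the actual model (e.g. a PaddleOCR pipeline)
        self._idle_timeout = idle_timeout
        self._entries: Dict[str, _Entry] = {}
        self._lock = threading.Lock()

    def load_model(self, model_id: str, config: dict) -> Any:
        with self._lock:
            entry = self._entries.get(model_id)
            if entry is None:
                entry = _Entry(model=self._loader(model_id, config))
                self._entries[model_id] = entry
            entry.ref_count += 1
            entry.last_used = time.monotonic()
            return entry.model

    def unload_model(self, model_id: str) -> None:
        with self._lock:
            entry = self._entries.get(model_id)
            if entry is not None and entry.ref_count > 0:
                entry.ref_count -= 1
                entry.last_used = time.monotonic()

    def sweep_idle(self) -> None:
        """Call periodically; drops models that sat unreferenced past the idle timeout."""
        now = time.monotonic()
        with self._lock:
            for model_id, entry in list(self._entries.items()):
                if entry.ref_count == 0 and now - entry.last_used > self._idle_timeout:
                    del self._entries[model_id]  # last reference dropped; memory reclaimed
```
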
#### Modified: PPStructureV3 Integration

- Remove the permanent exemption from unloading (lines 255-267)
- Wrap PP-StructureV3 in ModelManager (see the lazy-loading sketch below)
- Support lazy loading on first access
- Add unload capability with cache clearing
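
A minimal sketch of the lazy-loading wrapper, assuming a `manager` built as above. `ManagedPPStructureV3` and its `predict`/`close` methods are illustrative names; the underlying pipeline is whatever ModelManager builds for the `pp-structurev3` id:

```python
class ManagedPPStructureV3:
    """Lazy wrapper: the pipeline is built on first use and released through ModelManager."""

    MODEL_ID = "pp-structurev3"

    def __init__(self, manager, config: dict):
        self._manager = manager   # the ModelManager instance
        self._config = config
        self._pipeline = None     # nothing is loaded until the first predict()

    def predict(self, image):
        if self._pipeline is None:
            # load_model returns the shared instance and increments its ref count
            self._pipeline = self._manager.load_model(self.MODEL_ID, self._config)
        return self._pipeline.predict(image)

    def close(self) -> None:
        if self._pipeline is not None:
            self._manager.unload_model(self.MODEL_ID)  # unloaded, caches cleared, at zero refs
            self._pipeline = None
```
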
### 2. Service Architecture

#### Added: OCRServicePool

```python
class OCRServicePool:
    """Pool of OCRService instances (one per device)"""

    async def acquire(self, device: str = "GPU:0") -> "OCRService":
        """Get a service from the pool with semaphore control"""
        ...

    async def release(self, service: "OCRService") -> None:
        """Return a service to the pool"""
        ...
```
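
One way to realize this is an `asyncio.Queue` per device: the bounded queue doubles as the semaphore, since `get()` blocks once every service is checked out. The `OCRService(device=...)` constructor and the `service.device` attribute are assumptions for illustration:

```python
import asyncio
from typing import Dict

class OCRServicePool:
    """One bounded queue of pre-built services per device; acquire blocks when exhausted."""

    def __init__(self, max_per_device: int = 2, queue_timeout: float = 60.0):
        self._queues: Dict[str, asyncio.Queue] = {}
        self._max = max_per_device
        self._timeout = queue_timeout

    def _queue_for(self, device: str) -> asyncio.Queue:
        if device not in self._queues:
            q: asyncio.Queue = asyncio.Queue(maxsize=self._max)
            for _ in range(self._max):
                q.put_nowait(OCRService(device=device))  # assumed constructor
            self._queues[device] = q
        return self._queues[device]

    async def acquire(self, device: str = "GPU:0") -> "OCRService":
        # Blocks until a service is free; times out instead of queueing work forever.
        return await asyncio.wait_for(self._queue_for(device).get(), timeout=self._timeout)

    async def release(self, service: "OCRService") -> None:
        await self._queues[service.device].put(service)
```
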
#### Modified: OCRService Instantiation

- Replace direct instantiation with pool.acquire()
- Add finally blocks for pool.release()
- Handle pool exhaustion gracefully (see the sketch below)
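
A sketch of graceful exhaustion handling, assuming `acquire` raises `asyncio.TimeoutError` on queue timeout as in the pool sketch above; `run_ocr` and `service.process` are illustrative names:

```python
import asyncio

from fastapi import HTTPException

async def run_ocr(task_id: str, image):
    try:
        service = await pool.acquire("GPU:0")
    except asyncio.TimeoutError:
        # Pool exhausted: fail fast with a retryable status instead of piling up work.
        raise HTTPException(status_code=503, detail="OCR capacity exhausted, retry later")
    try:
        return await service.process(image)  # assumed method
    finally:
        await pool.release(service)
```
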
### 3. Memory Management

#### Added: MemoryGuard Class

```python
from typing import Dict

class MemoryGuard:
    """Monitor and control memory usage"""

    def check_memory(self, required_mb: int = 0) -> bool:
        """Check if sufficient memory is available"""
        ...

    def get_memory_stats(self) -> Dict:
        """Get current memory usage statistics"""
        ...

    def predict_memory(self, operation: str, params: Dict) -> int:
        """Predict the memory requirement (in MB) for an operation"""
        ...
```
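
For the "real GPU memory monitoring" this proposal calls for, a minimal sketch using pynvml (the NVIDIA Management Library bindings, package `nvidia-ml-py`); the thresholds mirror the `memory:` settings in the configuration section below, and `NvmlMemoryGuard` is an illustrative name:

```python
import pynvml  # NVIDIA Management Library bindings

class NvmlMemoryGuard:
    """MemoryGuard backed by real NVML readings instead of a hard-coded True."""

    def __init__(self, device_index: int = 0,
                 warning: float = 0.8, critical: float = 0.95):
        pynvml.nvmlInit()
        self._handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        self._warning = warning
        self._critical = critical

    def get_memory_stats(self) -> dict:
        info = pynvml.nvmlDeviceGetMemoryInfo(self._handle)
        return {
            "total_mb": info.total // 2**20,
            "used_mb": info.used // 2**20,
            "free_mb": info.free // 2**20,
            "usage": info.used / info.total,
        }

    def check_memory(self, required_mb: int = 0) -> bool:
        stats = self.get_memory_stats()
        if stats["usage"] >= self._critical:
            return False  # already past the critical threshold
        return stats["free_mb"] >= required_mb
```
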
#### Modified: Processing Flow

- Add memory checks before operations
- Implement CPU fallback when GPU memory is low (see the sketch below)
- Add progressive loading for multi-page documents
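
A sketch of the check-then-fallback flow combining MemoryGuard and the service pool; `predict_with_fallback`, the `"CPU"` device key, and `service.process` are illustrative:

```python
async def predict_with_fallback(image, params: dict):
    """Check memory first; degrade to CPU rather than risk an OOM on the GPU."""
    needed_mb = memory_guard.predict_memory("predict", params)
    device = "GPU:0" if memory_guard.check_memory(required_mb=needed_mb) else "CPU"
    service = await pool.acquire(device)
    try:
        return await service.process(image)  # assumed method
    finally:
        await pool.release(service)
```
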
### 4. Concurrency Control

#### Added: Prediction Semaphores

```python
import asyncio

# Maximum concurrent PP-StructureV3 predictions
MAX_CONCURRENT_PREDICTIONS = 2

prediction_semaphore = asyncio.Semaphore(MAX_CONCURRENT_PREDICTIONS)

# Method on the OCR service; excess predictions queue on the semaphore.
async def predict_with_limit(self, image, custom_params=None):
    async with prediction_semaphore:
        return await self._predict(image, custom_params)
```

#### Added: Selective Processing

```python
from dataclasses import dataclass

@dataclass
class ProcessingConfig:
    enable_charts: bool = True
    enable_formulas: bool = True
    enable_tables: bool = True
    batch_size: int = 10  # Pages per batch
```
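
As a sketch of how the flags and batch size might gate the pipeline; the stage functions (`recognize_text` and friends) are placeholder names:

```python
from typing import Callable, Iterator, List

def pipeline_stages(config: ProcessingConfig) -> List[Callable]:
    """Build the stage list from the flags; disabled stages never load their models."""
    stages: List[Callable] = [recognize_text]  # placeholder: text recognition always runs
    if config.enable_tables:
        stages.append(recognize_tables)
    if config.enable_formulas:
        stages.append(recognize_formulas)
    if config.enable_charts:
        stages.append(recognize_charts)
    return stages

def iter_batches(pages: list, config: ProcessingConfig) -> Iterator[list]:
    """Yield fixed-size page batches so peak memory tracks batch_size, not document length."""
    for i in range(0, len(pages), config.batch_size):
        yield pages[i : i + config.batch_size]
```
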
### 5. Resource Cleanup

#### Added: Cleanup Hooks

```python
@app.on_event("shutdown")
async def shutdown_handler():
    """Graceful shutdown with model unloading"""
    model_manager.teardown()  # synchronous, per the ModelManager interface above
    await service_pool.shutdown()
```

#### Modified: Task Completion

```python
async def process_task(task_id: str):
    service = None
    try:
        service = await pool.acquire()
        # ... processing ...
    finally:
        if service:
            await pool.release(service)
        await cleanup_task_resources(task_id)
```

## Configuration Changes

### Added Settings

```yaml
memory:
  gpu_threshold_warning: 0.8    # 80% usage
  gpu_threshold_critical: 0.95  # 95% usage
  model_idle_timeout: 300       # 5 minutes
  enable_memory_monitor: true
  monitor_interval: 10          # seconds

pool:
  max_services_per_device: 2
  queue_timeout: 60             # seconds

concurrency:
  max_predictions: 2
  max_batch_size: 10
```

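A minimal loader sketch, assuming PyYAML; it falls back to the defaults above when no config file is present, which is what makes step 1 of the migration path below a no-op:

```python
import yaml  # PyYAML

def load_settings(path: str = "config.yaml") -> dict:
    """Read the memory/pool/concurrency settings, using the documented defaults as fallback."""
    defaults = {
        "memory": {"gpu_threshold_warning": 0.8, "gpu_threshold_critical": 0.95,
                   "model_idle_timeout": 300, "enable_memory_monitor": True,
                   "monitor_interval": 10},
        "pool": {"max_services_per_device": 2, "queue_timeout": 60},
        "concurrency": {"max_predictions": 2, "max_batch_size": 10},
    }
    try:
        with open(path) as f:
            loaded = yaml.safe_load(f) or {}
    except FileNotFoundError:
        return defaults  # no config file: run with defaults
    for section, values in loaded.items():
        defaults.setdefault(section, {}).update(values)
    return defaults
```
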
## Breaking Changes

None. All changes are backward-compatible optimizations.

## Migration Path

1. Deploy the new code with default settings (no config changes needed)
2. Monitor memory metrics via the new endpoints (see the endpoint sketch below)
3. Tune parameters based on workload
4. Enable selective processing if needed
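
As a sketch of such an endpoint (the route path and the module-level `memory_guard` are assumptions):

```python
@app.get("/metrics/memory")
async def memory_metrics() -> dict:
    """Expose current MemoryGuard statistics for dashboards and alerting."""
    return memory_guard.get_memory_stats()
```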