feat: create OpenSpec proposal for enhanced memory management
- Create comprehensive proposal addressing OOM crashes and memory leaks
- Define 6 core areas: model lifecycle, service pooling, monitoring
- Add 58 implementation tasks across 8 sections
- Design ModelManager with reference counting and idle timeout
- Plan OCRServicePool for singleton service pattern
- Specify MemoryGuard for proactive memory monitoring
- Include concurrency controls and cleanup hooks
- Add spec deltas for ocr-processing and task-management
- Create detailed design document with architecture diagrams
- Define performance targets: 75% memory reduction, 4x concurrency

Critical improvements:
- Remove PP-StructureV3 permanent exemption from unloading
- Replace per-task OCRService instantiation with pooling
- Add real GPU memory monitoring (currently always returns True)
- Implement semaphore-based concurrency limits
- Add proper resource cleanup on task completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
# Spec Delta: ocr-processing

## Changes to OCR Processing Specification

### 1. Model Lifecycle Management

#### Added: ModelManager Class
```python
from typing import Dict, Optional

class ModelManager:
    """Manages model lifecycle with reference counting and idle timeout."""

    def load_model(self, model_id: str, config: Dict) -> "Model":
        """Load a model, or return the existing instance and increment its reference count."""

    def unload_model(self, model_id: str) -> None:
        """Decrement the reference count and unload the model once it reaches zero."""

    def get_model(self, model_id: str) -> Optional["Model"]:
        """Return the model instance if it is loaded, else None."""

    async def teardown(self) -> None:
        """Force-unload all models immediately (awaited from the shutdown hook below)."""
```
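
A possible call pattern; the model id and config keys here are illustrative, not part of the spec:

```python
manager = ModelManager()

# First load instantiates the model; later loads reuse it and bump the ref count.
model = manager.load_model("pp-structure-v3", {"device": "GPU:0"})
try:
    pass  # run predictions with `model`
finally:
    manager.unload_model("pp-structure-v3")  # unloads once the last holder releases
```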

#### Modified: PPStructureV3 Integration
- Remove the permanent exemption from unloading (lines 255-267)
- Wrap PP-StructureV3 in ModelManager
- Support lazy loading on first access
- Add unload capability with cache clearing (see the sketch below)
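
A minimal sketch of the wrapped lifecycle, assuming PaddleOCR's `PPStructureV3` class and Paddle's `paddle.device.cuda.empty_cache()` for cache clearing; the handle class and its method names are illustrative:

```python
import gc

# Hypothetical handle showing lazy load plus unload with cache clearing.
class PPStructureV3Handle:
    def __init__(self, config: dict):
        self._config = config
        self._model = None  # nothing loaded until first access

    def get(self):
        """Lazy-load the model on first access."""
        if self._model is None:
            from paddleocr import PPStructureV3  # assumed import path
            self._model = PPStructureV3(**self._config)
        return self._model

    def unload(self):
        """Drop the reference and clear framework caches."""
        self._model = None
        gc.collect()
        try:
            import paddle
            paddle.device.cuda.empty_cache()  # assumed cache-clearing call
        except ImportError:
            pass  # CPU-only install: nothing to clear
```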

### 2. Service Architecture

#### Added: OCRServicePool
```python
class OCRServicePool:
    """Pool of OCRService instances (one per device)."""

    async def acquire(self, device: str = "GPU:0") -> "OCRService":
        """Get a service from the pool, gated by a semaphore."""

    async def release(self, service: "OCRService") -> None:
        """Return the service to the pool."""
```

#### Modified: OCRService Instantiation
- Replace direct instantiation with pool.acquire()
- Add finally blocks for pool.release()
- Handle pool exhaustion gracefully (see the sketch below)
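
One way to handle exhaustion, sketched under the assumption that a blocked `acquire()` can simply be bounded with `asyncio.wait_for` using the `queue_timeout` setting; the helper name is hypothetical:

```python
import asyncio

async def acquire_or_fail(pool: "OCRServicePool", device: str = "GPU:0",
                          timeout: float = 60.0) -> "OCRService":
    """Bound pool acquisition so an exhausted pool fails loudly instead of hanging."""
    try:
        return await asyncio.wait_for(pool.acquire(device), timeout=timeout)
    except asyncio.TimeoutError:
        # All services busy past queue_timeout: raise a clear, retryable error.
        raise RuntimeError(f"no OCRService free on {device} after {timeout:.0f}s")
```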

### 3. Memory Management

#### Added: MemoryGuard Class
```python
from typing import Dict

class MemoryGuard:
    """Monitor and control memory usage."""

    def check_memory(self, required_mb: int = 0) -> bool:
        """Return True if sufficient memory is available for the request."""

    def get_memory_stats(self) -> Dict:
        """Return current memory usage statistics."""

    def predict_memory(self, operation: str, params: Dict) -> int:
        """Predict the memory requirement (in MB) for an operation."""
```
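
The proposal notes the current check always returns True; a real implementation would query the driver instead. A minimal sketch using the `pynvml` bindings (an assumption; any NVML wrapper would do):

```python
import pynvml

def gpu_memory_stats(device_index: int = 0) -> dict:
    """Query actual GPU memory usage from the NVIDIA driver via NVML."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return {
            "total_mb": info.total // 2**20,
            "used_mb": info.used // 2**20,
            "free_mb": info.free // 2**20,
        }
    finally:
        pynvml.nvmlShutdown()
```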

#### Modified: Processing Flow
- Add memory checks before operations
- Implement CPU fallback when GPU memory is low (see the sketch below)
- Add progressive loading for multi-page documents
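
A sketch of how the check-then-fallback step could look at the call site, built on the MemoryGuard interface above; the `"cpu"` device name and `predict` method are assumptions:

```python
async def run_with_fallback(guard: "MemoryGuard", pool: "OCRServicePool",
                            image, params: dict):
    """Pick the GPU when memory allows, otherwise fall back to CPU."""
    required_mb = guard.predict_memory("predict", params)
    device = "GPU:0" if guard.check_memory(required_mb) else "cpu"
    service = await pool.acquire(device)
    try:
        return await service.predict(image)  # assumed service method
    finally:
        await pool.release(service)
```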

### 4. Concurrency Control

#### Added: Prediction Semaphores
```python
import asyncio

# Maximum concurrent PP-StructureV3 predictions
MAX_CONCURRENT_PREDICTIONS = 2

prediction_semaphore = asyncio.Semaphore(MAX_CONCURRENT_PREDICTIONS)

# Method added to the prediction service; _predict is the existing entry point.
async def predict_with_limit(self, image, custom_params=None):
    async with prediction_semaphore:
        return await self._predict(image, custom_params)
```

#### Added: Selective Processing
```python
from dataclasses import dataclass

@dataclass
class ProcessingConfig:
    enable_charts: bool = True
    enable_formulas: bool = True
    enable_tables: bool = True
    batch_size: int = 10  # Pages per batch
```
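
For instance, a text-only workload could skip the heavy chart and formula models (illustrative call):

```python
config = ProcessingConfig(enable_charts=False, enable_formulas=False, batch_size=5)
```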

### 5. Resource Cleanup

#### Added: Cleanup Hooks
```python
@app.on_event("shutdown")
async def shutdown_handler():
    """Graceful shutdown with model unloading."""
    await model_manager.teardown()
    await service_pool.shutdown()
```

#### Modified: Task Completion
```python
async def process_task(task_id: str):
    service = None
    try:
        service = await pool.acquire()
        # ... processing ...
    finally:
        if service:
            await pool.release(service)
        await cleanup_task_resources(task_id)
```

## Configuration Changes

### Added Settings
```yaml
memory:
  gpu_threshold_warning: 0.8    # 80% usage
  gpu_threshold_critical: 0.95  # 95% usage
  model_idle_timeout: 300       # 5 minutes
  enable_memory_monitor: true
  monitor_interval: 10          # seconds

pool:
  max_services_per_device: 2
  queue_timeout: 60             # seconds

concurrency:
  max_predictions: 2
  max_batch_size: 10
```
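
A sketch of how `monitor_interval` and the two thresholds could drive a background watcher; the logger name and stats keys are assumptions:

```python
import asyncio
import logging

logger = logging.getLogger("memory")

async def memory_monitor(guard: "MemoryGuard", interval: int = 10,
                         warning: float = 0.8, critical: float = 0.95):
    """Periodically sample GPU usage and log threshold crossings."""
    while True:
        stats = guard.get_memory_stats()  # assumed to expose used_mb/total_mb
        usage = stats["used_mb"] / stats["total_mb"]
        if usage >= critical:
            logger.error("GPU memory critical: %.0f%% used", usage * 100)
        elif usage >= warning:
            logger.warning("GPU memory high: %.0f%% used", usage * 100)
        await asyncio.sleep(interval)
```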

## Breaking Changes

None. All changes are backward-compatible optimizations.
## Migration Path

1. Deploy the new code with default settings (no config changes needed)
2. Monitor memory metrics via the new endpoints
3. Tune parameters based on workload
4. Enable selective processing if needed