feat: create OpenSpec proposal for enhanced memory management
- Create comprehensive proposal addressing OOM crashes and memory leaks
- Define 6 core areas: model lifecycle, service pooling, monitoring
- Add 58 implementation tasks across 8 sections
- Design ModelManager with reference counting and idle timeout
- Plan OCRServicePool for singleton service pattern
- Specify MemoryGuard for proactive memory monitoring
- Include concurrency controls and cleanup hooks
- Add spec deltas for ocr-processing and task-management
- Create detailed design document with architecture diagrams
- Define performance targets: 75% memory reduction, 4x concurrency

Critical improvements:
- Remove PP-StructureV3 permanent exemption from unloading
- Replace per-task OCRService instantiation with pooling
- Add real GPU memory monitoring (currently always returns True)
- Implement semaphore-based concurrency limits
- Add proper resource cleanup on task completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
# Spec Delta: ocr-processing

## Changes to OCR Processing Specification

### 1. Model Lifecycle Management

#### Added: ModelManager Class
```python
from typing import Dict, Optional

class ModelManager:
    """Manages model lifecycle with reference counting and idle timeout."""

    def load_model(self, model_id: str, config: Dict) -> "Model":
        """Load a model, or return the existing instance and increment its reference count."""

    def unload_model(self, model_id: str) -> None:
        """Decrement the reference count and unload the model once it reaches zero."""

    def get_model(self, model_id: str) -> Optional["Model"]:
        """Return the model instance if it is loaded, else None."""

    async def teardown(self) -> None:
        """Force-unload all models immediately (awaited from the shutdown hook below)."""
```
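
A possible call pattern; the model id and config keys here are illustrative, not part of the spec:

```python
manager = ModelManager()

# First load instantiates the model; later loads reuse it and bump the ref count.
model = manager.load_model("pp-structure-v3", {"device": "GPU:0"})
try:
    pass  # run predictions with `model`
finally:
    manager.unload_model("pp-structure-v3")  # unloads once the last holder releases
```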

#### Modified: PPStructureV3 Integration
- Remove the permanent exemption from unloading (lines 255-267)
- Wrap PP-StructureV3 in ModelManager
- Support lazy loading on first access
- Add unload capability with cache clearing (see the sketch below)
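
A minimal sketch of the wrapped lifecycle, assuming PaddleOCR's `PPStructureV3` class and Paddle's `paddle.device.cuda.empty_cache()` for cache clearing; the handle class and its method names are illustrative:

```python
import gc

# Hypothetical handle showing lazy load plus unload with cache clearing.
class PPStructureV3Handle:
    def __init__(self, config: dict):
        self._config = config
        self._model = None  # nothing loaded until first access

    def get(self):
        """Lazy-load the model on first access."""
        if self._model is None:
            from paddleocr import PPStructureV3  # assumed import path
            self._model = PPStructureV3(**self._config)
        return self._model

    def unload(self):
        """Drop the reference and clear framework caches."""
        self._model = None
        gc.collect()
        try:
            import paddle
            paddle.device.cuda.empty_cache()  # assumed cache-clearing call
        except ImportError:
            pass  # CPU-only install: nothing to clear
```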

### 2. Service Architecture

#### Added: OCRServicePool
```python
class OCRServicePool:
    """Pool of OCRService instances (one per device)."""

    async def acquire(self, device: str = "GPU:0") -> "OCRService":
        """Get a service from the pool, gated by a semaphore."""

    async def release(self, service: "OCRService") -> None:
        """Return the service to the pool."""
```

#### Modified: OCRService Instantiation
- Replace direct instantiation with pool.acquire()
- Add finally blocks for pool.release()
- Handle pool exhaustion gracefully (see the sketch below)
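
One way to handle exhaustion, sketched under the assumption that a blocked `acquire()` can simply be bounded with `asyncio.wait_for` using the `queue_timeout` setting; the helper name is hypothetical:

```python
import asyncio

async def acquire_or_fail(pool: "OCRServicePool", device: str = "GPU:0",
                          timeout: float = 60.0) -> "OCRService":
    """Bound pool acquisition so an exhausted pool fails loudly instead of hanging."""
    try:
        return await asyncio.wait_for(pool.acquire(device), timeout=timeout)
    except asyncio.TimeoutError:
        # All services busy past queue_timeout: raise a clear, retryable error.
        raise RuntimeError(f"no OCRService free on {device} after {timeout:.0f}s")
```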

### 3. Memory Management

#### Added: MemoryGuard Class
```python
from typing import Dict

class MemoryGuard:
    """Monitor and control memory usage."""

    def check_memory(self, required_mb: int = 0) -> bool:
        """Return True if sufficient memory is available for the request."""

    def get_memory_stats(self) -> Dict:
        """Return current memory usage statistics."""

    def predict_memory(self, operation: str, params: Dict) -> int:
        """Predict the memory requirement (in MB) for an operation."""
```
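
The proposal notes the current check always returns True; a real implementation would query the driver instead. A minimal sketch using the `pynvml` bindings (an assumption; any NVML wrapper would do):

```python
import pynvml

def gpu_memory_stats(device_index: int = 0) -> dict:
    """Query actual GPU memory usage from the NVIDIA driver via NVML."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return {
            "total_mb": info.total // 2**20,
            "used_mb": info.used // 2**20,
            "free_mb": info.free // 2**20,
        }
    finally:
        pynvml.nvmlShutdown()
```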

#### Modified: Processing Flow
- Add memory checks before operations
- Implement CPU fallback when GPU memory is low (see the sketch below)
- Add progressive loading for multi-page documents
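
A sketch of how the check-then-fallback step could look at the call site, built on the MemoryGuard interface above; the `"cpu"` device name and `predict` method are assumptions:

```python
async def run_with_fallback(guard: "MemoryGuard", pool: "OCRServicePool",
                            image, params: dict):
    """Pick the GPU when memory allows, otherwise fall back to CPU."""
    required_mb = guard.predict_memory("predict", params)
    device = "GPU:0" if guard.check_memory(required_mb) else "cpu"
    service = await pool.acquire(device)
    try:
        return await service.predict(image)  # assumed service method
    finally:
        await pool.release(service)
```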

### 4. Concurrency Control

#### Added: Prediction Semaphores
```python
import asyncio

# Maximum concurrent PP-StructureV3 predictions
MAX_CONCURRENT_PREDICTIONS = 2

prediction_semaphore = asyncio.Semaphore(MAX_CONCURRENT_PREDICTIONS)

# Method added to the prediction service; _predict is the existing entry point.
async def predict_with_limit(self, image, custom_params=None):
    async with prediction_semaphore:
        return await self._predict(image, custom_params)
```

#### Added: Selective Processing
```python
from dataclasses import dataclass

@dataclass
class ProcessingConfig:
    enable_charts: bool = True
    enable_formulas: bool = True
    enable_tables: bool = True
    batch_size: int = 10  # Pages per batch
```
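
For instance, a text-only workload could skip the heavy chart and formula models (illustrative call):

```python
config = ProcessingConfig(enable_charts=False, enable_formulas=False, batch_size=5)
```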

### 5. Resource Cleanup

#### Added: Cleanup Hooks
```python
@app.on_event("shutdown")
async def shutdown_handler():
    """Graceful shutdown with model unloading."""
    await model_manager.teardown()
    await service_pool.shutdown()
```

#### Modified: Task Completion
```python
async def process_task(task_id: str):
    service = None
    try:
        service = await pool.acquire()
        # ... processing ...
    finally:
        if service:
            await pool.release(service)
        await cleanup_task_resources(task_id)
```

## Configuration Changes

### Added Settings
```yaml
memory:
  gpu_threshold_warning: 0.8    # 80% usage
  gpu_threshold_critical: 0.95  # 95% usage
  model_idle_timeout: 300       # 5 minutes
  enable_memory_monitor: true
  monitor_interval: 10          # seconds

pool:
  max_services_per_device: 2
  queue_timeout: 60             # seconds

concurrency:
  max_predictions: 2
  max_batch_size: 10
```
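
A sketch of how `monitor_interval` and the two thresholds could drive a background watcher; the logger name and stats keys are assumptions:

```python
import asyncio
import logging

logger = logging.getLogger("memory")

async def memory_monitor(guard: "MemoryGuard", interval: int = 10,
                         warning: float = 0.8, critical: float = 0.95):
    """Periodically sample GPU usage and log threshold crossings."""
    while True:
        stats = guard.get_memory_stats()  # assumed to expose used_mb/total_mb
        usage = stats["used_mb"] / stats["total_mb"]
        if usage >= critical:
            logger.error("GPU memory critical: %.0f%% used", usage * 100)
        elif usage >= warning:
            logger.warning("GPU memory high: %.0f%% used", usage * 100)
        await asyncio.sleep(interval)
```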

## Breaking Changes

None. All changes are backward-compatible optimizations.
## Migration Path

1. Deploy the new code with default settings (no config changes needed)
2. Monitor memory metrics via the new endpoints
3. Tune parameters based on workload
4. Enable selective processing if needed