chore: archive enhance-memory-management proposal (75/80 tasks)

Archive incomplete proposal for later continuation.
OCR processing has known quality issues to be addressed in future work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: egg
Date: 2025-11-26 16:10:45 +08:00
Parent: fa9b542b06
Commit: a227311b2d
6 changed files with 0 additions and 0 deletions

# Spec Delta: ocr-processing
## Changes to OCR Processing Specification
### 1. Model Lifecycle Management
#### Added: ModelManager Class
```python
from typing import Dict, Optional

class ModelManager:
    """Manages model lifecycle with reference counting and idle timeout."""

    def load_model(self, model_id: str, config: Dict) -> "Model":
        """Load a model, or return the existing instance and increment its ref count."""
        ...

    def unload_model(self, model_id: str) -> None:
        """Decrement the ref count and unload the model when it reaches zero."""
        ...

    def get_model(self, model_id: str) -> Optional["Model"]:
        """Return the model instance if it is currently loaded."""
        ...

    def teardown(self) -> None:
        """Force-unload all models immediately."""
        ...
```
#### Modified: PPStructureV3 Integration
- Remove permanent exemption from unloading (lines 255-267)
- Wrap PP-StructureV3 in ModelManager
- Support lazy loading on first access
- Add unload capability with cache clearing
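
The ref-counting and idle-timeout behavior could be sketched along these lines (a minimal illustration, not the spec'd interface; `_Entry`, `RefCountedManager`, and the `factory` callable are hypothetical stand-ins for real model construction):

```python
import threading
import time

class _Entry:
    """Bookkeeping for one loaded model: instance, ref count, last-use time."""
    def __init__(self, model):
        self.model = model
        self.ref_count = 1
        self.last_used = time.monotonic()

class RefCountedManager:
    """Sketch: lazy load on first access, ref-count on reuse, and evict
    only models that are unreferenced AND idle past the timeout."""

    def __init__(self, idle_timeout: float = 300.0):
        self._entries = {}
        self._lock = threading.Lock()
        self._idle_timeout = idle_timeout

    def load_model(self, model_id: str, factory):
        with self._lock:
            entry = self._entries.get(model_id)
            if entry is None:
                # Lazy loading: the factory builds the model on first access.
                entry = _Entry(factory())
                self._entries[model_id] = entry
            else:
                entry.ref_count += 1
            entry.last_used = time.monotonic()
            return entry.model

    def unload_model(self, model_id: str) -> None:
        with self._lock:
            entry = self._entries.get(model_id)
            if entry is not None and entry.ref_count > 0:
                entry.ref_count -= 1
                entry.last_used = time.monotonic()

    def evict_idle(self) -> None:
        """Drop models with no holders that have been idle past the timeout."""
        now = time.monotonic()
        with self._lock:
            for mid in list(self._entries):
                e = self._entries[mid]
                if e.ref_count <= 0 and now - e.last_used >= self._idle_timeout:
                    del self._entries[mid]  # real code would also clear GPU caches
```

Keeping eviction separate from `unload_model` lets a hot model at ref count zero stay cached until the idle timeout actually elapses.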
### 2. Service Architecture
#### Added: OCRServicePool
```python
class OCRServicePool:
    """Pool of OCRService instances (one per device)."""

    async def acquire(self, device: str = "GPU:0") -> "OCRService":
        """Get a service from the pool, gated by a semaphore."""
        ...

    async def release(self, service: "OCRService") -> None:
        """Return a service to the pool."""
        ...
```
#### Modified: OCRService Instantiation
- Replace direct instantiation with pool.acquire()
- Add finally blocks for pool.release()
- Handle pool exhaustion gracefully
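
A minimal pool in this shape (a sketch under the assumption that one semaphore slot corresponds to one pooled instance; `ServicePool` and its `factory` argument are illustrative, not the spec'd API):

```python
import asyncio

class ServicePool:
    """Sketch of a bounded pool: a semaphore caps concurrent holders and a
    queue hands out idle instances; acquire fails fast on exhaustion."""

    def __init__(self, factory, size: int = 2, queue_timeout: float = 60.0):
        self._sem = asyncio.Semaphore(size)
        self._idle: asyncio.Queue = asyncio.Queue()
        for _ in range(size):
            self._idle.put_nowait(factory())
        self._queue_timeout = queue_timeout

    async def acquire(self):
        try:
            await asyncio.wait_for(self._sem.acquire(), self._queue_timeout)
        except asyncio.TimeoutError:
            # Pool exhausted: surface an error instead of hanging the request.
            raise RuntimeError("service pool exhausted")
        return self._idle.get_nowait()

    async def release(self, service) -> None:
        self._idle.put_nowait(service)
        self._sem.release()
```

Callers would wrap processing in `try`/`finally` so `release` always runs, matching the instantiation changes above.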
### 3. Memory Management
#### Added: MemoryGuard Class
```python
from typing import Dict

class MemoryGuard:
    """Monitors and controls memory usage."""

    def check_memory(self, required_mb: int = 0) -> bool:
        """Return True if sufficient memory is available."""
        ...

    def get_memory_stats(self) -> Dict:
        """Return current memory usage statistics."""
        ...

    def predict_memory(self, operation: str, params: Dict) -> int:
        """Predict the memory requirement (in MB) for an operation."""
        ...
```
#### Modified: Processing Flow
- Add memory checks before operations
- Implement CPU fallback when GPU memory low
- Add progressive loading for multi-page documents
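
The GPU-or-CPU routing decision can reduce to a small pure function (`choose_device` is a hypothetical helper; the critical ratio mirrors the configured threshold, and a real guard would query actual GPU usage from the driver, e.g. via NVML):

```python
def choose_device(required_mb: int, gpu_total_mb: int, gpu_used_mb: int,
                  critical: float = 0.95) -> str:
    """Route an operation to the GPU if its projected usage stays under the
    critical threshold; otherwise fall back to CPU."""
    projected = (gpu_used_mb + required_mb) / gpu_total_mb
    return "GPU:0" if projected < critical else "CPU"
```

Keeping the decision in a pure function makes the fallback policy easy to unit-test independently of any GPU.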
### 4. Concurrency Control
#### Added: Prediction Semaphores
```python
import asyncio

# Maximum concurrent PP-StructureV3 predictions
MAX_CONCURRENT_PREDICTIONS = 2
prediction_semaphore = asyncio.Semaphore(MAX_CONCURRENT_PREDICTIONS)

async def predict_with_limit(self, image, custom_params=None):
    """Run a prediction only after acquiring a semaphore slot."""
    async with prediction_semaphore:
        return await self._predict(image, custom_params)
```
#### Added: Selective Processing
```python
class ProcessingConfig:
    enable_charts: bool = True
    enable_formulas: bool = True
    enable_tables: bool = True
    batch_size: int = 10  # pages per batch
```
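
Progressive, batch-at-a-time page handling driven by `batch_size` could look like this (`iter_batches` is an illustrative helper, not part of the spec):

```python
def iter_batches(pages, batch_size: int = 10):
    """Yield pages in fixed-size batches so only one batch of page data
    needs to be resident in memory at a time."""
    batch = []
    for page in pages:
        batch.append(page)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```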
### 5. Resource Cleanup
#### Added: Cleanup Hooks
```python
@app.on_event("shutdown")
async def shutdown_handler():
    """Graceful shutdown with model unloading."""
    await model_manager.teardown()
    await service_pool.shutdown()
```
#### Modified: Task Completion
```python
async def process_task(task_id: str):
    service = None
    try:
        service = await pool.acquire()
        # ... processing ...
    finally:
        if service:
            await pool.release(service)
        await cleanup_task_resources(task_id)
```
## Configuration Changes
### Added Settings
```yaml
memory:
  gpu_threshold_warning: 0.8    # 80% usage
  gpu_threshold_critical: 0.95  # 95% usage
  model_idle_timeout: 300       # 5 minutes
  enable_memory_monitor: true
  monitor_interval: 10          # seconds

pool:
  max_services_per_device: 2
  queue_timeout: 60             # seconds

concurrency:
  max_predictions: 2
  max_batch_size: 10
```
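
Merging user overrides over these defaults, so a deployment with no config file still gets working values, might be sketched as follows (the `merge_settings` helper and dict-based defaults are assumptions; real code would likely load the YAML file first):

```python
DEFAULTS = {
    "memory": {"gpu_threshold_warning": 0.8, "gpu_threshold_critical": 0.95,
               "model_idle_timeout": 300, "enable_memory_monitor": True,
               "monitor_interval": 10},
    "pool": {"max_services_per_device": 2, "queue_timeout": 60},
    "concurrency": {"max_predictions": 2, "max_batch_size": 10},
}

def merge_settings(defaults, overrides):
    """Per-section merge: missing keys take defaults, unknown keys are rejected
    so typos in a config file fail loudly at startup."""
    merged = {}
    for section, keys in defaults.items():
        user = overrides.get(section, {})
        unknown = set(user) - set(keys)
        if unknown:
            raise KeyError(f"unknown settings in '{section}': {sorted(unknown)}")
        merged[section] = {**keys, **user}
    return merged
```

Rejecting unknown keys at startup is a design choice: silent typos in memory thresholds are exactly the kind of misconfiguration this proposal tries to prevent.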
## Breaking Changes
None. All changes are backward-compatible optimizations.
## Migration Path
1. Deploy new code with default settings (no config changes needed)
2. Monitor memory metrics via new endpoints
3. Tune parameters based on workload
4. Enable selective processing if needed