feat: create OpenSpec proposal for enhanced memory management

- Create comprehensive proposal addressing OOM crashes and memory leaks
- Define 6 core areas: model lifecycle, service pooling, monitoring
- Add 58 implementation tasks across 8 sections
- Design ModelManager with reference counting and idle timeout
- Plan OCRServicePool for singleton service pattern
- Specify MemoryGuard for proactive memory monitoring
- Include concurrency controls and cleanup hooks
- Add spec deltas for ocr-processing and task-management
- Create detailed design document with architecture diagrams
- Define performance targets: 75% memory reduction, 4x concurrency

Critical improvements:
- Remove PP-StructureV3 permanent exemption from unloading
- Replace per-task OCRService instantiation with pooling
- Add real GPU memory monitoring (currently always returns True)
- Implement semaphore-based concurrency limits
- Add proper resource cleanup on task completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 15:21:32 +08:00
parent 2d0932face
commit ba8ddf2b68
6 changed files with 1105 additions and 0 deletions
--- a/openspec/changes/enhance-memory-management/specs/memory-management/spec.md
+++ b/openspec/changes/enhance-memory-management/specs/memory-management/spec.md
@@ -0,0 +1,104 @@
+# Memory Management Specification
+
+## ADDED Requirements
+
+### Requirement: Model Manager
+The system SHALL provide a ModelManager class that manages model lifecycle with reference counting and idle timeout mechanisms.
+
+#### Scenario: Loading a model
+GIVEN a request to load a model
+WHEN the model is not already loaded
+THEN the ModelManager creates a new instance and sets reference count to 1
+
+#### Scenario: Reusing loaded model
+GIVEN a model is already loaded
+WHEN another request for the same model arrives
+THEN the ModelManager returns the existing instance and increments reference count
+
+#### Scenario: Unloading idle model
+GIVEN a model with zero reference count
+WHEN the idle timeout period expires
+THEN the ModelManager unloads the model and frees memory
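
The three scenarios above could be sketched roughly as follows. This is a minimal illustration, not the proposal's actual design: the method names (`acquire`, `release`, `unload_idle`), the `loader` callback, and the injectable `clock` are all assumptions made for the example.

```python
import threading
import time
from typing import Any, Callable, Dict, List


class ModelManager:
    """Hypothetical sketch: reference-counted models with idle-timeout unloading."""

    def __init__(self, idle_timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_timeout = idle_timeout
        self._clock = clock  # injectable so the timeout can be tested without sleeping
        self._lock = threading.Lock()
        self._models: Dict[str, Any] = {}
        self._refs: Dict[str, int] = {}
        self._idle_since: Dict[str, float] = {}

    def acquire(self, name: str, loader: Callable[[], Any]) -> Any:
        """Load the model on first use, otherwise reuse it; increment the count."""
        with self._lock:
            if name not in self._models:
                # Simplification: loading under the lock blocks other callers.
                self._models[name] = loader()
                self._refs[name] = 0
            self._refs[name] += 1
            self._idle_since.pop(name, None)  # the model is in use again
            return self._models[name]

    def release(self, name: str) -> None:
        """Decrement the count; start the idle timer when it reaches zero."""
        with self._lock:
            if self._refs.get(name, 0) <= 0:
                raise ValueError(f"model {name!r} is not acquired")
            self._refs[name] -= 1
            if self._refs[name] == 0:
                self._idle_since[name] = self._clock()

    def unload_idle(self) -> List[str]:
        """Drop models whose count is zero and whose idle timeout has expired."""
        now = self._clock()
        unloaded: List[str] = []
        with self._lock:
            for name, since in list(self._idle_since.items()):
                if now - since >= self._idle_timeout:
                    del self._models[name]
                    del self._refs[name]
                    del self._idle_since[name]
                    unloaded.append(name)
        return unloaded

    def ref_count(self, name: str) -> int:
        with self._lock:
            return self._refs.get(name, 0)

    def is_loaded(self, name: str) -> bool:
        with self._lock:
            return name in self._models
```

In this sketch `unload_idle` would have to be called periodically (for example from a background timer). Dropping the reference only makes the object collectable; actually freeing GPU memory would depend on the framework in use.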
+
+### Requirement: Service Pool
+The system SHALL implement an OCRServicePool that manages a pool of OCRService instances with one instance per GPU/CPU device.
+
+#### Scenario: Acquiring service from pool
+GIVEN a task needs processing
+WHEN a service is requested from the pool
+THEN the pool returns an available service or queues the request if all services are busy
+
+#### Scenario: Releasing service to pool
+GIVEN a task has completed processing
+WHEN the service is released
+THEN the service becomes available for other tasks in the pool
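
One way to picture the acquire/release behavior is a blocking queue holding one service per device. The sketch below is an assumption-laden illustration: the `factory` callback, the `service()` context manager, and the device strings are hypothetical, and the real OCRService constructor is not shown in this spec.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional, Sequence


class OCRServicePool:
    """Hypothetical sketch: one pooled service instance per device."""

    def __init__(self, devices: Sequence[str],
                 factory: Callable[[str], Any]) -> None:
        self._available: "queue.Queue[Any]" = queue.Queue()
        for device in devices:
            # Each device gets exactly one service, created up front.
            self._available.put(factory(device))

    def acquire(self, timeout: Optional[float] = None) -> Any:
        """Return a free service, waiting while all are busy.

        Raises queue.Empty if `timeout` elapses first.
        """
        return self._available.get(timeout=timeout)

    def release(self, service: Any) -> None:
        """Hand the service back so a waiting task can use it."""
        self._available.put(service)

    @contextmanager
    def service(self, timeout: Optional[float] = None) -> Iterator[Any]:
        """Acquire a service and guarantee its release, even on error."""
        svc = self.acquire(timeout)
        try:
            yield svc
        finally:
            self.release(svc)
```

A task would then wrap its work in `with pool.service() as svc:` so the service returns to the pool on both success and failure.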
+
+### Requirement: Memory Monitoring
+The system SHALL continuously monitor GPU and CPU memory usage and trigger preventive actions based on configurable thresholds.
+
+#### Scenario: Memory warning threshold
+GIVEN memory usage reaches 80% (warning threshold)
+WHEN a new task is requested
+THEN the system logs a warning and may defer non-critical operations
+
+#### Scenario: Memory critical threshold
+GIVEN memory usage reaches 95% (critical threshold)
+WHEN a new task is requested
+THEN the system attempts CPU fallback or rejects the task
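
The threshold logic in the two scenarios above might look like the sketch below. It deliberately takes used and total bytes as plain arguments, because how the readings are obtained (GPU or CPU) is not specified here; the class shape, the `decide` method, and its return strings are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryLevel(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class MemoryGuard:
    """Hypothetical sketch: classify memory usage against two thresholds."""

    warning_threshold: float = 0.80   # 80% from the warning scenario
    critical_threshold: float = 0.95  # 95% from the critical scenario

    def classify(self, used_bytes: int, total_bytes: int) -> MemoryLevel:
        if total_bytes <= 0:
            raise ValueError("total_bytes must be positive")
        ratio = used_bytes / total_bytes
        if ratio >= self.critical_threshold:
            return MemoryLevel.CRITICAL
        if ratio >= self.warning_threshold:
            return MemoryLevel.WARNING
        return MemoryLevel.OK

    def decide(self, used_bytes: int, total_bytes: int,
               cpu_fallback_available: bool) -> str:
        """Map a memory reading to an admission decision for a new task."""
        level = self.classify(used_bytes, total_bytes)
        if level is MemoryLevel.CRITICAL:
            return "cpu_fallback" if cpu_fallback_available else "reject"
        if level is MemoryLevel.WARNING:
            return "accept_with_warning"
        return "accept"
```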
+
+### Requirement: Concurrency Control
+The system SHALL limit concurrent PP-StructureV3 predictions using semaphores to prevent memory exhaustion.
+
+#### Scenario: Concurrent prediction limit
+GIVEN the maximum concurrent predictions is set to 2
+WHEN 2 predictions are already running
+THEN additional prediction requests wait in queue until a slot becomes available
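
A semaphore-based limit of this kind can be sketched with the standard library. The `PredictionLimiter` name and the `peak` counter are hypothetical; the counter exists only so the limit can be observed in a test.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class PredictionLimiter:
    """Hypothetical sketch: cap concurrent predictions with a semaphore."""

    def __init__(self, max_concurrent: int = 2) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of simultaneous holders seen so far

    @contextmanager
    def slot(self) -> Iterator[None]:
        # Blocks here while max_concurrent predictions are already running.
        with self._slots:
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                yield
            finally:
                with self._lock:
                    self._active -= 1
```

A prediction call would run inside `with limiter.slot():`, so extra requests wait until a running prediction leaves the block.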
+
+### Requirement: Resource Cleanup
+The system SHALL ensure all resources are properly cleaned up after task completion or failure.
+
+#### Scenario: Successful task cleanup
+GIVEN a task completes successfully
+WHEN the task finishes
+THEN all allocated memory, temporary files, and model references are released
+
+#### Scenario: Failed task cleanup
+GIVEN a task fails with an error
+WHEN the error handler runs
+THEN cleanup is performed in the finally block regardless of failure reason
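
The `finally`-based cleanup described above could take roughly this shape. The `run_task` function, the `process` callback, and the `release_model` hook are assumptions for the example and do not come from the proposal.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_task(process: Callable[[Path], str],
             release_model: Callable[[], None]) -> str:
    """Hypothetical sketch: run a task and always clean up afterwards."""
    workdir = Path(tempfile.mkdtemp(prefix="ocr_task_"))
    try:
        return process(workdir)
    finally:
        # Runs on success and on failure alike.
        shutil.rmtree(workdir, ignore_errors=True)  # temporary files
        release_model()                             # model reference
```

Because the cleanup sits in `finally`, the original exception still propagates to the caller after the temporary directory is removed and the model reference is released.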
+
+## MODIFIED Requirements
+
+### Requirement: OCR Service Instantiation
+The OCR service instantiation SHALL use pooled instances instead of creating new instances for each task.
+
+#### Scenario: Task using pooled service
+GIVEN a new OCR task arrives
+WHEN the task starts processing
+THEN it acquires a service from the pool instead of creating a new instance
+
+### Requirement: PP-StructureV3 Model Management
+The PP-StructureV3 model SHALL be subject to the same lifecycle management as other models, removing its permanent exemption from unloading.
+
+#### Scenario: PP-StructureV3 unloading
+GIVEN PP-StructureV3 has been idle for the configured timeout
+WHEN memory pressure is detected
+THEN the model can be unloaded to free memory
+
+### Requirement: Task Resource Tracking
+Tasks SHALL track their resource usage including estimated and actual memory consumption.
+
+#### Scenario: Task memory tracking
+GIVEN a task is processing
+WHEN memory metrics are collected
+THEN the task records both estimated and actual memory usage for analysis
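
As a rough illustration of recording estimated versus actual usage, a per-task record might look like the sketch below. The field names and the `estimation_error` helper are hypothetical, and the way memory samples are collected is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskMemoryRecord:
    """Hypothetical sketch: estimated vs. observed peak memory for one task."""

    task_id: str
    estimated_bytes: int
    actual_peak_bytes: Optional[int] = None

    def observe(self, current_bytes: int) -> None:
        """Fold one memory sample into the running peak."""
        if self.actual_peak_bytes is None or current_bytes > self.actual_peak_bytes:
            self.actual_peak_bytes = current_bytes

    def estimation_error(self) -> Optional[float]:
        """Relative error of the estimate, or None if it cannot be computed."""
        if self.actual_peak_bytes is None or self.estimated_bytes <= 0:
            return None
        return (self.actual_peak_bytes - self.estimated_bytes) / self.estimated_bytes
```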
+
+## REMOVED Requirements
+
+### Requirement: Permanent Model Loading
+The requirement for PP-StructureV3 to remain permanently loaded SHALL be removed.
+
+#### Scenario: Dynamic model loading
+GIVEN the system starts
+WHEN no tasks are using PP-StructureV3
+THEN the model is not loaded until first use