chore: archive enhance-memory-management proposal (75/80 tasks)
Archive incomplete proposal for later continuation.
OCR processing has known quality issues to be addressed in future work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,104 @@
# Memory Management Specification

## ADDED Requirements

### Requirement: Model Manager
The system SHALL provide a ModelManager class that manages model lifecycle with reference counting and idle timeout mechanisms.

#### Scenario: Loading a model
GIVEN a request to load a model
WHEN the model is not already loaded
THEN the ModelManager creates a new instance and sets reference count to 1

#### Scenario: Reusing loaded model
GIVEN a model is already loaded
WHEN another request for the same model arrives
THEN the ModelManager returns the existing instance and increments reference count

#### Scenario: Unloading idle model
GIVEN a model with zero reference count
WHEN the idle timeout period expires
THEN the ModelManager unloads the model and frees memory
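
The three scenarios above can be sketched as a small reference-counted cache. This is a minimal illustration and not the project's actual implementation: the method names `acquire`, `release`, `ref_count`, and `unload_idle`, the `loader` callback, and the injectable `clock` are all assumptions made for the example.

```python
import threading
import time
from typing import Any, Callable, Dict, List


class ModelManager:
    """Illustrative reference-counted model cache with an idle timeout."""

    def __init__(self, idle_timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_timeout = idle_timeout
        self._clock = clock  # injectable so the timeout can be tested
        self._lock = threading.Lock()
        # name -> {"model": object, "refs": int, "idle_since": float or None}
        self._entries: Dict[str, Dict[str, Any]] = {}

    def acquire(self, name: str, loader: Callable[[], Any]) -> Any:
        """Return the model, loading it on first use, and add one reference."""
        with self._lock:
            entry = self._entries.get(name)
            if entry is None:
                # Loading under the lock keeps the sketch simple but blocks
                # other callers while the loader runs.
                entry = {"model": loader(), "refs": 0, "idle_since": None}
                self._entries[name] = entry
            entry["refs"] += 1
            entry["idle_since"] = None
            return entry["model"]

    def release(self, name: str) -> None:
        """Drop one reference; start the idle timer when the count hits zero."""
        with self._lock:
            entry = self._entries[name]
            if entry["refs"] <= 0:
                raise RuntimeError(f"model {name!r} released too many times")
            entry["refs"] -= 1
            if entry["refs"] == 0:
                entry["idle_since"] = self._clock()

    def ref_count(self, name: str) -> int:
        with self._lock:
            entry = self._entries.get(name)
            return entry["refs"] if entry else 0

    def unload_idle(self) -> List[str]:
        """Drop unreferenced models whose idle time reached the timeout."""
        now = self._clock()
        unloaded: List[str] = []
        with self._lock:
            for name, entry in list(self._entries.items()):
                idle_since = entry["idle_since"]
                if (entry["refs"] == 0 and idle_since is not None
                        and now - idle_since >= self._idle_timeout):
                    del self._entries[name]
                    unloaded.append(name)
        return unloaded
```

In this sketch `unload_idle` only drops the manager's own reference to the model. Actually returning GPU memory would depend on the framework in use and is left out here.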

### Requirement: Service Pool
The system SHALL implement an OCRServicePool that manages a pool of OCRService instances with one instance per GPU/CPU device.

#### Scenario: Acquiring service from pool
GIVEN a task needs processing
WHEN a service is requested from the pool
THEN the pool returns an available service or queues the request if all services are busy

#### Scenario: Releasing service to pool
GIVEN a task has completed processing
WHEN the service is released
THEN the service becomes available for other tasks in the pool
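
One way to picture the acquire and release scenarios is a pool backed by a blocking queue, where a caller waits when every service is busy. The sketch below is hypothetical: the `OCRService` class here is a stand-in that only records its device, and the `acquire`, `release`, and `service` method names and the device strings are assumptions for illustration.

```python
import queue
from contextlib import contextmanager
from typing import Iterator, List, Optional


class OCRService:
    """Stand-in for the real service; it only remembers its device."""

    def __init__(self, device: str) -> None:
        self.device = device


class OCRServicePool:
    """Illustrative pool holding one OCRService per device."""

    def __init__(self, devices: List[str]) -> None:
        self._available: "queue.Queue[OCRService]" = queue.Queue()
        for device in devices:
            self._available.put(OCRService(device))

    def acquire(self, timeout: Optional[float] = None) -> OCRService:
        # Blocks while all services are busy; with a timeout set,
        # queue.Empty is raised once the wait expires.
        return self._available.get(timeout=timeout)

    def release(self, service: OCRService) -> None:
        # Hand the service back so a waiting caller can proceed.
        self._available.put(service)

    @contextmanager
    def service(self, timeout: Optional[float] = None) -> Iterator[OCRService]:
        """Acquire a service and release it even if the caller raises."""
        svc = self.acquire(timeout)
        try:
            yield svc
        finally:
            self.release(svc)
```

A task would then run inside `with pool.service() as svc:` so that the service is returned to the pool on both success and failure.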

### Requirement: Memory Monitoring
The system SHALL continuously monitor GPU and CPU memory usage and trigger preventive actions based on configurable thresholds.

#### Scenario: Memory warning threshold
GIVEN memory usage reaches 80% (warning threshold)
WHEN a new task is requested
THEN the system logs a warning and may defer non-critical operations

#### Scenario: Memory critical threshold
GIVEN memory usage reaches 95% (critical threshold)
WHEN a new task is requested
THEN the system attempts CPU fallback or rejects the task
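
The threshold logic in these two scenarios might look like the following. The 80% and 95% defaults come from the scenarios; everything else (the `MemoryLevel` enum, the function names, and the returned action strings) is a hypothetical shape chosen for the example, and how memory usage is actually measured is outside the sketch.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class MemoryLevel(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    CRITICAL = "critical"


def classify_memory(used_bytes: int, total_bytes: int,
                    warning: float = 0.80,
                    critical: float = 0.95) -> MemoryLevel:
    """Map a usage ratio onto the configured thresholds."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = used_bytes / total_bytes
    if ratio >= critical:
        return MemoryLevel.CRITICAL
    if ratio >= warning:
        return MemoryLevel.WARNING
    return MemoryLevel.NORMAL


def admit_task(level: MemoryLevel, cpu_fallback_available: bool) -> str:
    """Decide what to do with a new task (action names are illustrative)."""
    if level is MemoryLevel.CRITICAL:
        # Critical: try the CPU path, otherwise refuse the task.
        return "cpu_fallback" if cpu_fallback_available else "reject"
    if level is MemoryLevel.WARNING:
        # Warning: the task still runs, but the condition is logged.
        logger.warning("memory usage above warning threshold")
    return "accept"
```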

### Requirement: Concurrency Control
The system SHALL limit concurrent PP-StructureV3 predictions using semaphores to prevent memory exhaustion.

#### Scenario: Concurrent prediction limit
GIVEN the maximum concurrent predictions is set to 2
WHEN 2 predictions are already running
THEN additional prediction requests wait in queue until a slot becomes available
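
A semaphore-based limit like the one in this scenario can be sketched with the standard library. The `PredictionLimiter` class and its `slot` method are hypothetical names, and the optional timeout is an addition for the example; the scenario itself only says that extra requests wait.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, Optional


class PredictionLimiter:
    """Illustrative cap on how many predictions run at the same time."""

    def __init__(self, max_concurrent: int = 2) -> None:
        # BoundedSemaphore raises ValueError if released more often than
        # acquired, which surfaces bookkeeping bugs early.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self, timeout: Optional[float] = None) -> Iterator[None]:
        """Wait for a free slot, then hold it for the duration of the block."""
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no prediction slot became available")
        try:
            yield
        finally:
            self._slots.release()
```

A prediction call would be wrapped as `with limiter.slot(): ...`, so a third caller blocks until one of the two running predictions leaves its block.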

### Requirement: Resource Cleanup
The system SHALL ensure all resources are properly cleaned up after task completion or failure.

#### Scenario: Successful task cleanup
GIVEN a task completes successfully
WHEN the task finishes
THEN all allocated memory, temporary files, and model references are released

#### Scenario: Failed task cleanup
GIVEN a task fails with an error
WHEN the error handler runs
THEN cleanup is performed in the finally block regardless of failure reason
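
The `finally`-based cleanup described above can be illustrated with a small wrapper. The `run_task` function, its `work` callback, and the `release_resources` hook are assumptions for the sketch; in a real system the hook would release model references and pooled services.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def run_task(work: Callable[[Path], T],
             release_resources: Callable[[], None]) -> T:
    """Run `work` in a scratch directory and always clean up afterwards."""
    temp_dir = Path(tempfile.mkdtemp(prefix="ocr_task_"))
    try:
        return work(temp_dir)
    finally:
        # Runs on success and on any exception raised by `work`.
        shutil.rmtree(temp_dir, ignore_errors=True)
        release_resources()
```

Because the cleanup sits in `finally`, an exception from `work` still propagates to the caller after the temporary directory is removed and the hook has run.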

## MODIFIED Requirements

### Requirement: OCR Service Instantiation
The OCR service instantiation SHALL use pooled instances instead of creating new instances for each task.

#### Scenario: Task using pooled service
GIVEN a new OCR task arrives
WHEN the task starts processing
THEN it acquires a service from the pool instead of creating a new instance

### Requirement: PP-StructureV3 Model Management
The PP-StructureV3 model SHALL be subject to the same lifecycle management as other models, removing its permanent exemption from unloading.

#### Scenario: PP-StructureV3 unloading
GIVEN PP-StructureV3 has been idle for the configured timeout
WHEN memory pressure is detected
THEN the model can be unloaded to free memory
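
Read literally, this scenario combines two conditions: the idle timeout has elapsed and memory pressure is present. A minimal predicate under that reading is shown below. The function name and parameters are hypothetical, and requiring a zero reference count is an assumption borrowed from the idle-model scenario rather than something this scenario states.

```python
def may_unload(ref_count: int, idle_seconds: float,
               idle_timeout: float, memory_pressure: bool) -> bool:
    """Return True when an idle, unreferenced model may be unloaded."""
    # Assumption: a model that is still referenced is never unloaded.
    if ref_count > 0:
        return False
    return idle_seconds >= idle_timeout and memory_pressure
```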

### Requirement: Task Resource Tracking
Tasks SHALL track their resource usage including estimated and actual memory consumption.

#### Scenario: Task memory tracking
GIVEN a task is processing
WHEN memory metrics are collected
THEN the task records both estimated and actual memory usage for analysis
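
As an illustration of recording estimated and actual usage side by side, the record below keeps the estimate and the observed peak for one task. The `TaskMemoryRecord` class, its field names, and the relative-error helper are assumptions for the example; the specification does not say which metric counts as "actual", so tracking the peak is one possible choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskMemoryRecord:
    """Illustrative per-task record of estimated vs. observed memory."""

    task_id: str
    estimated_bytes: int
    actual_peak_bytes: Optional[int] = None

    def observe(self, current_bytes: int) -> None:
        """Fold one memory sample into the running peak."""
        if self.actual_peak_bytes is None or current_bytes > self.actual_peak_bytes:
            self.actual_peak_bytes = current_bytes

    def estimation_error(self) -> Optional[float]:
        """Relative error of the estimate, or None if it cannot be computed."""
        if self.actual_peak_bytes is None or self.estimated_bytes == 0:
            return None
        return (self.actual_peak_bytes - self.estimated_bytes) / self.estimated_bytes
```

A positive error means the task used more memory than estimated, which is the case that matters most when tuning admission thresholds.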

## REMOVED Requirements

### Requirement: Permanent Model Loading
The requirement for PP-StructureV3 to remain permanently loaded SHALL be removed.

#### Scenario: Dynamic model loading
GIVEN the system starts
WHEN no tasks are using PP-StructureV3
THEN the model is not loaded until first use