beabigegg
2025-11-12 22:53:17 +08:00
commit da700721fa
130 changed files with 23393 additions and 0 deletions

@@ -0,0 +1,96 @@
# File Management Specification
## ADDED Requirements
### Requirement: File Upload Validation
The system SHALL validate uploaded files for type, size, and content before processing.
#### Scenario: Valid image upload
- **WHEN** user uploads a PNG file of 5MB
- **THEN** the system accepts the file
- **AND** stores it in the temporary upload directory
- **AND** returns upload success with file ID
#### Scenario: Oversized file rejection
- **WHEN** user uploads a file larger than 20MB
- **THEN** the system rejects the file
- **AND** returns error message "文件大小超過限制 (最大 20MB)" (English: "File size exceeds the limit (maximum 20MB)")
- **AND** does not store the file
#### Scenario: Invalid file type rejection
- **WHEN** user uploads a .exe or .zip file
- **THEN** the system rejects the file
- **AND** returns error message "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF" (English: "Unsupported file type; only PNG, JPG, JPEG, and PDF are supported")
#### Scenario: Corrupted image detection
- **WHEN** user uploads a corrupted image file
- **THEN** the system attempts to open the file
- **AND** detects corruption during validation
- **AND** returns error message "文件損壞,無法處理" (English: "File is corrupted and cannot be processed")
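
The four validation scenarios above can be sketched as a single check that runs in the order type, size, content. This is only an illustration, not the project's implementation: `validate_upload` and the constant names are hypothetical, and the magic-byte comparison is an assumed stand-in for "attempts to open the file" (a real service would more likely decode the image with an imaging library).

```python
from pathlib import Path
from typing import Optional

MAX_SIZE_BYTES = 20 * 1024 * 1024  # 20MB limit stated in the spec
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}

# Error messages copied from the scenarios above.
ERR_TOO_LARGE = "文件大小超過限制 (最大 20MB)"
ERR_UNSUPPORTED_TYPE = "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF"
ERR_CORRUPTED = "文件損壞,無法處理"

# Assumption: a file whose leading bytes do not match its format is treated
# as corrupted. This is a cheap approximation of a full decode.
MAGIC_PREFIXES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}


def validate_upload(filename: str, data: bytes) -> Optional[str]:
    """Return an error message, or None when the upload passes all checks."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return ERR_UNSUPPORTED_TYPE
    if len(data) > MAX_SIZE_BYTES:
        return ERR_TOO_LARGE
    if not data.startswith(MAGIC_PREFIXES[ext]):
        return ERR_CORRUPTED
    return None
```

Rejecting on type and size before inspecting the content keeps the more expensive content check off files that would be refused anyway.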
### Requirement: Supported File Formats
The system SHALL support PNG, JPG, JPEG, and PDF file formats for OCR processing.
#### Scenario: PNG image processing
- **WHEN** user uploads a .png file
- **THEN** the system processes it directly with PaddleOCR
#### Scenario: JPG/JPEG image processing
- **WHEN** user uploads a .jpg or .jpeg file
- **THEN** the system processes it directly with PaddleOCR
#### Scenario: PDF file processing
- **WHEN** user uploads a .pdf file
- **THEN** the system converts PDF pages to images using pdf2image
- **AND** processes each page image with PaddleOCR
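
The format rules above amount to a routing decision: images go straight to OCR, while PDFs are rasterized first. The sketch below shows that routing under assumptions: `processing_route` and `pdf_to_page_images` are hypothetical names, and the OCR call itself is left out because the spec does not say how PaddleOCR is invoked. `convert_from_path` is the pdf2image function that renders a PDF into a list of page images.

```python
from pathlib import Path

DIRECT_OCR_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def processing_route(filename: str) -> str:
    """Decide how a file reaches OCR: 'direct' or 'pdf_pages'."""
    ext = Path(filename).suffix.lower()
    if ext in DIRECT_OCR_EXTENSIONS:
        return "direct"
    if ext == ".pdf":
        return "pdf_pages"
    raise ValueError(f"unsupported file format: {filename}")


def pdf_to_page_images(pdf_path: str):
    """Render each PDF page to an image so it can be passed to OCR."""
    # Imported lazily so the routing logic works without pdf2image installed.
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path)  # one PIL image per page
```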
### Requirement: Batch Upload Management
The system SHALL manage multiple file uploads with batch organization.
#### Scenario: Create batch from multiple files
- **WHEN** user uploads 5 files in a single request
- **THEN** the system creates a batch with a unique batch_id
- **AND** associates all files with the batch_id
- **AND** returns the batch_id and the file list
#### Scenario: Query batch status
- **WHEN** user requests batch status by batch_id
- **THEN** the system returns:
  - Total files in batch
  - Completed count
  - Failed count
  - Processing count
  - Overall batch status (pending/processing/completed/failed)
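
One way to derive the status payload above is to aggregate per-file statuses. The spec lists the fields but not how the overall status is computed, so the rules in this sketch are assumptions: a batch is `failed` if every file has finished and at least one failed, `completed` if every file completed, `pending` if nothing has started, and `processing` otherwise. `summarize_batch` is a hypothetical helper.

```python
from collections import Counter


def summarize_batch(file_statuses):
    """Aggregate per-file statuses into the batch status payload."""
    counts = Counter(file_statuses)
    total = len(file_statuses)
    completed = counts["completed"]
    failed = counts["failed"]
    processing = counts["processing"]
    finished = completed + failed

    # Assumed rules for the overall status; the spec does not define them.
    if processing > 0:
        status = "processing"
    elif total > 0 and finished == total:
        status = "failed" if failed > 0 else "completed"
    elif finished == 0:
        status = "pending"
    else:
        status = "processing"  # some files done, the rest still queued

    return {
        "total": total,
        "completed": completed,
        "failed": failed,
        "processing": processing,
        "status": status,
    }
```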
### Requirement: File Storage Management
The system SHALL store uploaded files temporarily and clean up after processing.
#### Scenario: Temporary file storage
- **WHEN** user uploads files
- **THEN** the system stores files in the `uploads/{batch_id}/` directory
- **AND** generates unique filenames to prevent conflicts
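
A minimal sketch of the storage layout above, assuming a random UUID is an acceptable way to make filenames unique (the spec does not name a scheme). `build_storage_path` and `UPLOAD_ROOT` are hypothetical names.

```python
import uuid
from pathlib import Path

UPLOAD_ROOT = Path("uploads")


def build_storage_path(batch_id: str, original_name: str) -> Path:
    """Return uploads/{batch_id}/<random hex><original extension>."""
    ext = Path(original_name).suffix.lower()
    # A fresh UUID per file avoids collisions when two uploads share a name.
    return UPLOAD_ROOT / batch_id / f"{uuid.uuid4().hex}{ext}"
```

Discarding the user-supplied name in favor of a generated one also keeps unexpected path characters out of the stored filename.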
#### Scenario: Automatic cleanup after processing
- **WHEN** OCR processing completes for a batch
- **THEN** the system keeps files for 24 hours
- **AND** automatically deletes files after the retention period
- **AND** preserves OCR results in the database
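
The retention rule above could be enforced by a periodic job along these lines. This is a sketch under assumptions: `purge_expired` is a hypothetical helper, the mapping of batch IDs to completion times stands in for a database query, and a batch is treated as expired once exactly 24 hours have elapsed. Only the files are removed; OCR results in the database are untouched.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

RETENTION = timedelta(hours=24)  # retention period stated in the spec


def is_expired(completed_at: datetime, now: datetime) -> bool:
    """True once the retention period has elapsed since OCR completed."""
    return now - completed_at >= RETENTION


def purge_expired(completed_batches: dict, now: datetime, upload_root: Path) -> list:
    """Delete upload directories of expired batches and return their IDs."""
    purged = []
    for batch_id, completed_at in completed_batches.items():
        if is_expired(completed_at, now):
            # ignore_errors: a directory that is already gone is not a failure.
            shutil.rmtree(upload_root / batch_id, ignore_errors=True)
            purged.append(batch_id)
    return purged
```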
#### Scenario: Manual file deletion
- **WHEN** user requests to delete a batch
- **THEN** the system removes all associated files from storage
- **AND** marks the batch as deleted in the database
- **AND** returns deletion confirmation
### Requirement: File Access Control
The system SHALL ensure users can only access their own uploaded files.
#### Scenario: User accesses own files
- **WHEN** an authenticated user requests a file by file_id
- **THEN** the system verifies ownership
- **AND** returns the file if the user is the owner
#### Scenario: User attempts to access others' files
- **WHEN** a user requests a file_id belonging to another user
- **THEN** the system denies access
- **AND** returns a 403 Forbidden error
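
The ownership check in these two scenarios reduces to comparing the file's owner with the requesting user. The sketch below assumes each stored file records an `owner_id`; `StoredFile` and `authorize_file_access` are hypothetical names, and the web-framework wiring is omitted.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class StoredFile:
    file_id: str
    owner_id: str


def authorize_file_access(stored_file: StoredFile, user_id: str) -> HTTPStatus:
    """Return 200 OK for the owner and 403 Forbidden for anyone else."""
    if stored_file.owner_id == user_id:
        return HTTPStatus.OK
    return HTTPStatus.FORBIDDEN
```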