chore: project cleanup and prepare for dual-track processing refactor

- Removed all test files and directories
- Deleted outdated documentation (will be rewritten)
- Cleaned up temporary files, logs, and uploads
- Archived 5 completed OpenSpec proposals
- Created new dual-track-document-processing proposal with complete OpenSpec structure
  - Dual-track architecture: OCR track (PaddleOCR) + Direct track (PyMuPDF)
  - UnifiedDocument model for consistent output
  - Support for structure-preserving translation
- Updated .gitignore to prevent future test/temp files

This is a major cleanup preparing for the complete refactoring of the document processing pipeline.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,96 @@
# File Management Specification

## ADDED Requirements

### Requirement: File Upload Validation
The system SHALL validate uploaded files for type, size, and content before processing.

#### Scenario: Valid image upload
- **WHEN** user uploads a PNG file of 5MB
- **THEN** the system accepts the file
- **AND** stores it in the temporary upload directory
- **AND** returns upload success with file ID

#### Scenario: Oversized file rejection
- **WHEN** user uploads a file larger than 20MB
- **THEN** the system rejects the file
- **AND** returns error message "文件大小超過限制 (最大 20MB)" ("File size exceeds the limit (maximum 20MB)")
- **AND** does not store the file

#### Scenario: Invalid file type rejection
- **WHEN** user uploads a .exe or .zip file
- **THEN** the system rejects the file
- **AND** returns error message "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF" ("Unsupported file type; only PNG, JPG, JPEG, and PDF are supported")

#### Scenario: Corrupted image detection
- **WHEN** user uploads a corrupted image file
- **THEN** the system attempts to open the file
- **AND** detects corruption during validation
- **AND** returns error message "文件損壞,無法處理" ("File is corrupted and cannot be processed")
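
The validation scenarios above can be sketched as a single check function. This is a minimal illustration, not the project's implementation: the name `validate_upload`, the constant names, the order of the checks, and the use of leading signature bytes as a stand-in for actually opening the file are all assumptions.

```python
from pathlib import PurePath
from typing import Tuple

MAX_SIZE_BYTES = 20 * 1024 * 1024  # 20MB limit taken from the scenarios above
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}

# Error messages exactly as the scenarios above specify them.
MSG_BAD_TYPE = "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF"
MSG_TOO_LARGE = "文件大小超過限制 (最大 20MB)"
MSG_CORRUPTED = "文件損壞,無法處理"

# Leading signature bytes per format. Assumption: a cheap stand-in for
# really opening the file, which is what the spec asks for.
SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}


def validate_upload(filename: str, data: bytes) -> Tuple[bool, str]:
    """Return (accepted, message) for an uploaded file (hypothetical helper)."""
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, MSG_BAD_TYPE
    if len(data) > MAX_SIZE_BYTES:
        return False, MSG_TOO_LARGE
    if not data.startswith(SIGNATURES[ext]):
        return False, MSG_CORRUPTED
    return True, "ok"
```

A signature check only catches files whose header is wrong; a real implementation would still need to decode the image or parse the PDF to detect damage further into the file.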

### Requirement: Supported File Formats
The system SHALL support PNG, JPG, JPEG, and PDF file formats for OCR processing.

#### Scenario: PNG image processing
- **WHEN** user uploads a .png file
- **THEN** the system processes it directly with PaddleOCR

#### Scenario: JPG/JPEG image processing
- **WHEN** user uploads a .jpg or .jpeg file
- **THEN** the system processes it directly with PaddleOCR

#### Scenario: PDF file processing
- **WHEN** user uploads a .pdf file
- **THEN** the system converts PDF pages to images using pdf2image
- **AND** processes each page image with PaddleOCR
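
The routing described by these three scenarios could look roughly like the sketch below. To keep it runnable without PaddleOCR or pdf2image installed, the OCR step and the PDF-to-image step are passed in as callables; `process_upload`, `ocr_image`, and `pdf_to_images` are hypothetical names, and the return shape (one result per image or page) is an assumption.

```python
from pathlib import PurePath
from typing import Any, Callable, Iterable, List

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def process_upload(
    path: str,
    ocr_image: Callable[[Any], Any],
    pdf_to_images: Callable[[str], Iterable[Any]],
) -> List[Any]:
    """Route a file to OCR by extension and return one result per page."""
    ext = PurePath(path).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        # Images go straight to the OCR step.
        return [ocr_image(path)]
    if ext == ".pdf":
        # PDFs are rendered to page images first, then each page is OCRed.
        return [ocr_image(page) for page in pdf_to_images(path)]
    raise ValueError(f"unsupported file format: {ext}")
```

In a real deployment `pdf_to_images` would presumably wrap pdf2image and `ocr_image` would wrap a PaddleOCR instance; injecting them also makes the routing easy to test with stubs.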

### Requirement: Batch Upload Management
The system SHALL manage multiple file uploads with batch organization.

#### Scenario: Create batch from multiple files
- **WHEN** user uploads 5 files in a single request
- **THEN** the system creates a batch with a unique batch_id
- **AND** associates all files with the batch_id
- **AND** returns batch_id and file list

#### Scenario: Query batch status
- **WHEN** user requests batch status by batch_id
- **THEN** the system returns:
  - Total files in batch
  - Completed count
  - Failed count
  - Processing count
  - Overall batch status (pending/processing/completed/failed)
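
The status payload listed above can be derived from the per-file states. The sketch below is one possible aggregation: the function name `summarize_batch`, the dictionary keys, and especially the rules for the overall status are assumptions, since the spec names the four states but does not say how they combine.

```python
from collections import Counter
from typing import Dict, List, Union


def summarize_batch(file_statuses: List[str]) -> Dict[str, Union[int, str]]:
    """Aggregate per-file states into the batch status fields (hypothetical)."""
    counts = Counter(file_statuses)
    total = len(file_statuses)
    in_flight = counts["pending"] + counts["processing"]

    # Assumed rules: nothing started -> pending; anything still in flight ->
    # processing; every file failed -> failed; otherwise completed.
    if total == 0 or counts["pending"] == total:
        overall = "pending"
    elif in_flight > 0:
        overall = "processing"
    elif counts["failed"] == total:
        overall = "failed"
    else:
        overall = "completed"

    return {
        "total": total,
        "completed": counts["completed"],
        "failed": counts["failed"],
        "processing": counts["processing"],
        "status": overall,
    }
```

Under these rules a batch with some failed and some completed files reports `completed`; a stricter policy could report `failed` as soon as any file fails.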

### Requirement: File Storage Management
The system SHALL store uploaded files temporarily and clean up after processing.

#### Scenario: Temporary file storage
- **WHEN** user uploads files
- **THEN** the system stores files in the `uploads/{batch_id}/` directory
- **AND** generates unique filenames to prevent conflicts

#### Scenario: Automatic cleanup after processing
- **WHEN** OCR processing completes for a batch
- **THEN** the system keeps files for 24 hours
- **AND** automatically deletes files after the retention period
- **AND** preserves OCR results in the database

#### Scenario: Manual file deletion
- **WHEN** user requests to delete a batch
- **THEN** the system removes all associated files from storage
- **AND** marks the batch as deleted in the database
- **AND** returns deletion confirmation
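
The storage layout and the 24-hour retention rule above can be illustrated with two small helpers. This is a sketch under stated assumptions: `storage_path` and `is_expired` are hypothetical names, UUIDs are just one way to generate unique filenames, and the retention clock is assumed to start when processing completes.

```python
import uuid
from datetime import datetime, timedelta, timezone
from pathlib import Path, PurePath

UPLOAD_ROOT = Path("uploads")
RETENTION = timedelta(hours=24)  # retention period from the cleanup scenario


def storage_path(batch_id: str, original_name: str) -> Path:
    """Build uploads/{batch_id}/<unique name>, keeping the original extension."""
    ext = PurePath(original_name).suffix.lower()
    return UPLOAD_ROOT / batch_id / f"{uuid.uuid4().hex}{ext}"


def is_expired(completed_at: datetime, now: datetime) -> bool:
    """True once the retention period has elapsed since processing completed."""
    return now - completed_at >= RETENTION


# Usage: two uploads with the same name land in the same batch directory
# under different filenames.
first = storage_path("batch-001", "scan.PNG")
second = storage_path("batch-001", "scan.PNG")
done = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(first.parent, first != second, is_expired(done, done + timedelta(hours=25)))
```

A cleanup job would then delete the files of every batch for which `is_expired` returns true while leaving the OCR results in the database untouched.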

### Requirement: File Access Control
The system SHALL ensure users can only access their own uploaded files.

#### Scenario: User accesses own files
- **WHEN** authenticated user requests file by file_id
- **THEN** the system verifies ownership
- **AND** returns the file if the user is the owner

#### Scenario: User attempts to access others' files
- **WHEN** user requests file_id belonging to another user
- **THEN** the system denies access
- **AND** returns a 403 Forbidden error
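
The ownership check in these two scenarios might be sketched as follows. The function `fetch_file` and the in-memory `owners` mapping are hypothetical, and the 404 response for an unknown `file_id` is an assumption: the spec only covers the owner and non-owner cases.

```python
from typing import Dict, Optional, Tuple


def fetch_file(
    user_id: str, file_id: str, owners: Dict[str, str]
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, file_id or None) after verifying ownership."""
    owner = owners.get(file_id)
    if owner is None:
        return 404, None  # assumption: unknown IDs are not covered by the spec
    if owner != user_id:
        return 403, None  # Forbidden: the file belongs to another user
    return 200, file_id


# Usage with a toy ownership table.
owners = {"file-1": "alice", "file-2": "bob"}
print(fetch_file("alice", "file-1", owners))
print(fetch_file("alice", "file-2", owners))
```

Some APIs deliberately return 404 instead of 403 so that callers cannot probe which file IDs exist; the spec above chooses 403, so the sketch follows it.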