# File Management Specification

## ADDED Requirements

### Requirement: File Upload Validation

The system SHALL validate uploaded files for type, size, and content before processing.

#### Scenario: Valid image upload

- **WHEN** user uploads a PNG file of 5MB
- **THEN** the system accepts the file
- **AND** stores it in the temporary upload directory
- **AND** returns an upload success response with the file ID

#### Scenario: Oversized file rejection

- **WHEN** user uploads a file larger than 20MB
- **THEN** the system rejects the file
- **AND** returns error message "文件大小超過限制 (最大 20MB)" (English: "File size exceeds the limit (maximum 20MB)")
- **AND** does not store the file

#### Scenario: Invalid file type rejection

- **WHEN** user uploads a .exe or .zip file
- **THEN** the system rejects the file
- **AND** returns error message "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF" (English: "Unsupported file type; only PNG, JPG, JPEG, and PDF are supported")

#### Scenario: Corrupted image detection

- **WHEN** user uploads a corrupted image file
- **THEN** the system attempts to open the file
- **AND** detects corruption during validation
- **AND** returns error message "文件損壞,無法處理" (English: "File is corrupted and cannot be processed")

### Requirement: Supported File Formats

The system SHALL support PNG, JPG, JPEG, and PDF file formats for OCR processing.

#### Scenario: PNG image processing

- **WHEN** user uploads a .png file
- **THEN** the system processes it directly with PaddleOCR

#### Scenario: JPG/JPEG image processing

- **WHEN** user uploads a .jpg or .jpeg file
- **THEN** the system processes it directly with PaddleOCR

#### Scenario: PDF file processing

- **WHEN** user uploads a .pdf file
- **THEN** the system converts PDF pages to images using pdf2image
- **AND** processes each page image with PaddleOCR

### Requirement: Batch Upload Management

The system SHALL manage multiple file uploads with batch organization.

#### Scenario: Create batch from multiple files

- **WHEN** user uploads 5 files in a single request
- **THEN** the system creates a batch with unique batch_id
- **AND** associates all files with the batch_id
- **AND** returns batch_id and file list

#### Scenario: Query batch status

- **WHEN** user requests batch status by batch_id
- **THEN** the system returns:
  - Total files in batch
  - Completed count
  - Failed count
  - Processing count
  - Overall batch status (pending/processing/completed/failed)

### Requirement: File Storage Management

The system SHALL store uploaded files temporarily and clean up after processing.

#### Scenario: Temporary file storage

- **WHEN** user uploads files
- **THEN** the system stores files in `uploads/{batch_id}/` directory
- **AND** generates unique filenames to prevent conflicts

#### Scenario: Automatic cleanup after processing

- **WHEN** OCR processing completes for a batch
- **THEN** the system keeps files for 24 hours
- **AND** automatically deletes files after retention period
- **AND** preserves OCR results in database

#### Scenario: Manual file deletion

- **WHEN** user requests to delete a batch
- **THEN** the system removes all associated files from storage
- **AND** marks the batch as deleted in database
- **AND** returns deletion confirmation

### Requirement: File Access Control

The system SHALL ensure users can only access their own uploaded files.

#### Scenario: User accesses own files

- **WHEN** authenticated user requests file by file_id
- **THEN** the system verifies ownership
- **AND** returns file if user is the owner

#### Scenario: User attempts to access others' files

- **WHEN** user requests file_id belonging to another user
- **THEN** the system denies access
- **AND** returns 403 Forbidden error
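
The File Upload Validation scenarios can be sketched as a single check that runs before a file is stored. This is a minimal illustration and not a prescribed implementation: the names `validate_upload` and `ValidationResult` are hypothetical, 20MB is assumed to mean 20 × 1024 × 1024 bytes, and the leading-bytes comparison is only a cheap stand-in for the "attempts to open the file" step, which a real service would more likely perform with an image library.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Assumption: "20MB" in the spec is read as 20 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 20 * 1024 * 1024
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}

# Well-known leading bytes of each supported format. Checking them is an
# assumed, lightweight substitute for fully opening and decoding the file.
MAGIC_PREFIXES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}

# Error messages copied verbatim from the scenarios.
ERR_TOO_LARGE = "文件大小超過限制 (最大 20MB)"
ERR_BAD_TYPE = "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF"
ERR_CORRUPTED = "文件損壞,無法處理"


@dataclass
class ValidationResult:
    ok: bool
    error: Optional[str] = None


def validate_upload(filename: str, data: bytes) -> ValidationResult:
    """Check type, then size, then content; return the first failure found."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return ValidationResult(False, ERR_BAD_TYPE)
    if len(data) > MAX_SIZE_BYTES:
        return ValidationResult(False, ERR_TOO_LARGE)
    if not data.startswith(MAGIC_PREFIXES[ext]):
        return ValidationResult(False, ERR_CORRUPTED)
    return ValidationResult(True)
```

The order of the checks (type, then size, then content) is a design choice of this sketch; the specification does not say which error takes precedence when a file violates more than one rule.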
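
The "Query batch status" scenario lists the fields to return but does not define how the overall status is derived from per-file states. The sketch below shows one possible roll-up. The function name `summarize_batch`, the per-file status strings, and the roll-up rules are all assumptions made for illustration: a batch is treated as pending when no file has started, processing while any file is unfinished, failed only when every file failed, and completed otherwise.

```python
from collections import Counter
from typing import Dict, List, Union

# Assumed per-file states, mirroring the overall states named in the spec.
FILE_STATES = {"pending", "processing", "completed", "failed"}


def summarize_batch(file_statuses: List[str]) -> Dict[str, Union[int, str]]:
    """Aggregate per-file statuses into the fields of a batch status reply."""
    unknown = set(file_statuses) - FILE_STATES
    if unknown:
        raise ValueError(f"unknown file status: {sorted(unknown)}")

    counts = Counter(file_statuses)
    total = len(file_statuses)
    unfinished = counts["pending"] + counts["processing"]

    # Assumed roll-up rules; the specification leaves these open.
    if counts["pending"] == total:
        overall = "pending"      # nothing has started (or the batch is empty)
    elif unfinished > 0:
        overall = "processing"   # at least one file is still in flight
    elif counts["failed"] == total:
        overall = "failed"       # every file finished and every file failed
    else:
        overall = "completed"    # all finished, at least one succeeded

    return {
        "total": total,
        "completed": counts["completed"],
        "failed": counts["failed"],
        "processing": counts["processing"],
        "status": overall,
    }
```

Whether a batch with a mix of completed and failed files should report completed or failed is a product decision; this sketch picks completed so that partial results remain visible alongside the failed count.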