OCR/openspec/changes/add-ocr-batch-processing/specs/file-management/spec.md
beabigegg da700721fa first
2025-11-12 22:53:17 +08:00

File Management Specification

ADDED Requirements

Requirement: File Upload Validation

The system SHALL validate uploaded files for type, size, and content before processing.

Scenario: Valid image upload

  • WHEN user uploads a PNG file of 5MB
  • THEN the system accepts the file
  • AND stores it in the temporary upload directory
  • AND returns an upload success response with the file ID

Scenario: Oversized file rejection

  • WHEN user uploads a file larger than 20MB
  • THEN the system rejects the file
  • AND returns error message "文件大小超過限制 (最大 20MB)" (English: "File size exceeds the limit (maximum 20MB)")
  • AND does not store the file

Scenario: Invalid file type rejection

  • WHEN user uploads a .exe or .zip file
  • THEN the system rejects the file
  • AND returns error message "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF" (English: "Unsupported file type; only PNG, JPG, JPEG, and PDF are supported")

Scenario: Corrupted image detection

  • WHEN user uploads a corrupted image file
  • THEN the system attempts to open the file
  • AND detects corruption during validation
  • AND returns error message "文件損壞,無法處理" (English: "The file is corrupted and cannot be processed")
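
The validation scenarios above can be sketched as a single check function. This is a minimal illustration, not the project's implementation: `validate_upload` and `UploadRejected` are hypothetical names, the 20MB limit is assumed to mean 20 × 1024 × 1024 bytes, and the order of the checks is an assumption. Corruption detection is approximated here by comparing the leading signature bytes of each format; actually opening and decoding the file, as the scenario describes, would catch more kinds of damage.

```python
from pathlib import Path

# Assumption: "20MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 20 * 1024 * 1024
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}

# Leading signature bytes of each supported format.
SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}


class UploadRejected(Exception):
    """Hypothetical exception carrying the user-facing error message."""


def validate_upload(filename: str, data: bytes) -> None:
    """Raise UploadRejected if the upload fails a type, size, or content check."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise UploadRejected("不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF")
    if len(data) > MAX_SIZE_BYTES:
        raise UploadRejected("文件大小超過限制 (最大 20MB)")
    # Simplified stand-in for "attempts to open the file": signature check only.
    if not data.startswith(SIGNATURES[ext]):
        raise UploadRejected("文件損壞,無法處理")
```

A caller would store the file and return a file ID only when `validate_upload` returns without raising.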

Requirement: Supported File Formats

The system SHALL support PNG, JPG, JPEG, and PDF file formats for OCR processing.

Scenario: PNG image processing

  • WHEN user uploads a .png file
  • THEN the system processes it directly with PaddleOCR

Scenario: JPG/JPEG image processing

  • WHEN user uploads a .jpg or .jpeg file
  • THEN the system processes it directly with PaddleOCR

Scenario: PDF file processing

  • WHEN user uploads a .pdf file
  • THEN the system converts PDF pages to images using pdf2image
  • AND processes each page image with PaddleOCR
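
The format routing described by these scenarios might look like the sketch below. `process_file` is a hypothetical name, and the OCR and PDF steps are passed in as callables so the example stays self-contained: in practice `ocr_image` would presumably wrap a PaddleOCR instance and `pdf_to_images` would wrap pdf2image's `convert_from_path`, but the exact wiring is an assumption.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, List

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def process_file(
    path: str,
    ocr_image: Callable[[Any], Any],
    pdf_to_images: Callable[[str], Iterable[Any]],
) -> List[Any]:
    """Return one OCR result per image (a single result for PNG/JPG/JPEG,
    one per page for PDF)."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        # Images go to OCR directly.
        return [ocr_image(path)]
    if ext == ".pdf":
        # PDFs are first converted to page images, then each page is OCR'd.
        return [ocr_image(page) for page in pdf_to_images(path)]
    raise ValueError(f"unsupported file format: {ext!r}")
```

Injecting the two callables keeps the routing logic testable without loading OCR models.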

Requirement: Batch Upload Management

The system SHALL manage multiple file uploads with batch organization.

Scenario: Create batch from multiple files

  • WHEN user uploads 5 files in a single request
  • THEN the system creates a batch with a unique batch_id
  • AND associates all files with the batch_id
  • AND returns the batch_id and the file list
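
As an illustration of batch creation, the sketch below assigns one identifier to the batch and one to each file. The `Batch` dataclass, `create_batch`, and the use of UUIDs are assumptions; the specification only requires that the batch_id be unique.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Batch:
    """Hypothetical in-memory record for one upload request."""
    batch_id: str
    files: Dict[str, str] = field(default_factory=dict)  # file_id -> original filename


def create_batch(filenames: List[str]) -> Batch:
    """Create a batch and associate every uploaded file with its batch_id."""
    batch = Batch(batch_id=uuid.uuid4().hex)
    for name in filenames:
        batch.files[uuid.uuid4().hex] = name
    return batch
```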

Scenario: Query batch status

  • WHEN user requests batch status by batch_id
  • THEN the system returns:
    • Total files in batch
    • Completed count
    • Failed count
    • Processing count
    • Overall batch status (pending/processing/completed/failed)
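
The status response could be assembled from per-file statuses as sketched below. The specification lists the four overall states but not how they are derived, so the rule used here is an assumption: "pending" until any file starts, "processing" while unfinished files remain, "failed" only if every file failed, and "completed" otherwise. `summarize_batch` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def summarize_batch(file_statuses: Iterable[str]) -> Dict[str, Union[int, str]]:
    """Count per-file statuses and derive an overall batch status."""
    counts = Counter(file_statuses)
    total = sum(counts.values())
    finished = counts["completed"] + counts["failed"]

    # Assumed derivation rule; the spec does not define it.
    if finished == 0 and counts["processing"] == 0:
        overall = "pending"
    elif finished < total:
        overall = "processing"
    elif counts["failed"] == total:
        overall = "failed"
    else:
        overall = "completed"

    return {
        "total": total,
        "completed": counts["completed"],
        "failed": counts["failed"],
        "processing": counts["processing"],
        "status": overall,
    }
```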

Requirement: File Storage Management

The system SHALL store uploaded files temporarily and clean up after processing.

Scenario: Temporary file storage

  • WHEN user uploads files
  • THEN the system stores files in the uploads/{batch_id}/ directory
  • AND generates unique filenames to prevent conflicts
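
One way to generate conflict-free filenames under uploads/{batch_id}/ is to replace the user-supplied name with a random identifier and keep only the extension. This is a sketch under that assumption; `storage_path` and `save_upload` are hypothetical helpers.

```python
import uuid
from pathlib import Path

UPLOAD_ROOT = Path("uploads")


def storage_path(batch_id: str, original_name: str, root: Path = UPLOAD_ROOT) -> Path:
    """Build uploads/{batch_id}/<random hex><ext> for an uploaded file."""
    ext = Path(original_name).suffix.lower()
    return root / batch_id / f"{uuid.uuid4().hex}{ext}"


def save_upload(batch_id: str, original_name: str, data: bytes,
                root: Path = UPLOAD_ROOT) -> Path:
    """Write the upload to its generated path and return that path."""
    target = storage_path(batch_id, original_name, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

Discarding the original filename also keeps user-controlled path components such as `../` out of the storage path.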

Scenario: Automatic cleanup after processing

  • WHEN OCR processing completes for a batch
  • THEN the system keeps the files for 24 hours
  • AND automatically deletes the files after the retention period
  • AND preserves the OCR results in the database
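
The retention rule can be sketched as a periodic job that finds batches whose processing finished at least 24 hours ago and removes their upload directories. The function names, the `completed_at` mapping, and the scheduling mechanism are assumptions; OCR results in the database are not touched by this code.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List

RETENTION = timedelta(hours=24)


def expired_batches(completed_at: Dict[str, datetime], now: datetime) -> List[str]:
    """Return batch_ids whose retention period has elapsed."""
    return [
        batch_id
        for batch_id, finished in completed_at.items()
        if now - finished >= RETENTION
    ]


def delete_batch_files(batch_id: str, root: Path = Path("uploads")) -> None:
    """Remove the batch's upload directory; a missing directory is ignored."""
    shutil.rmtree(root / batch_id, ignore_errors=True)
```

The same `delete_batch_files` helper could serve a user-initiated deletion, with the batch additionally marked as deleted in the database.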

Scenario: Manual file deletion

  • WHEN user requests to delete a batch
  • THEN the system removes all associated files from storage
  • AND marks the batch as deleted in the database
  • AND returns a deletion confirmation

Requirement: File Access Control

The system SHALL ensure users can only access their own uploaded files.

Scenario: User accesses own files

  • WHEN authenticated user requests file by file_id
  • THEN the system verifies ownership
  • AND returns the file if the user is the owner

Scenario: User attempts to access others' files

  • WHEN user requests file_id belonging to another user
  • THEN the system denies access
  • AND returns a 403 Forbidden error
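
The ownership check in these scenarios might be implemented along the following lines. `StoredFile`, `Forbidden`, and `get_file_for_user` are hypothetical names, and the in-memory dictionary stands in for whatever database lookup the system uses; how a web framework maps the exception to an HTTP 403 response is left open.

```python
from dataclasses import dataclass
from typing import Dict


class Forbidden(Exception):
    """Hypothetical exception that the web layer maps to HTTP 403."""
    status_code = 403


@dataclass(frozen=True)
class StoredFile:
    file_id: str
    owner_id: str
    path: str


def get_file_for_user(files: Dict[str, StoredFile], file_id: str, user_id: str) -> StoredFile:
    """Return the file record only if user_id owns it."""
    record = files[file_id]  # Unknown file_id raises KeyError; the spec does not cover this case.
    if record.owner_id != user_id:
        raise Forbidden("403 Forbidden")
    return record
```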