This commit is contained in:
beabigegg
2025-11-12 22:53:17 +08:00
commit da700721fa
130 changed files with 23393 additions and 0 deletions


@@ -0,0 +1,175 @@
# Export Results Specification
## ADDED Requirements
### Requirement: Plain Text Export
The system SHALL export OCR results as plain text files with configurable formatting.
#### Scenario: Export single file result as TXT
- **WHEN** user selects a completed OCR task and chooses TXT export
- **THEN** the system generates a .txt file with extracted text
- **AND** preserves line breaks based on bounding box positions
- **AND** returns downloadable file
#### Scenario: Export batch results as TXT
- **WHEN** user exports a batch with 5 files as TXT
- **THEN** the system creates a ZIP file containing 5 .txt files
- **AND** names each file as `{original_filename}_ocr.txt`
- **AND** returns the ZIP for download
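The batch TXT scenario above can be sketched with the standard library. This is a minimal illustration, not the project's implementation: the function name `build_txt_zip` is hypothetical, and it assumes `{original_filename}` means the uploaded name without its extension.

```python
import io
import zipfile
from pathlib import PurePath


def build_txt_zip(results: dict[str, str]) -> bytes:
    """Pack the extracted text of each source file into an in-memory ZIP.

    `results` maps an original filename (e.g. "scan.png") to its OCR text.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for original_name, text in results.items():
            # Assumption: {original_filename} is the name without extension.
            stem = PurePath(original_name).stem
            # writestr encodes str data as UTF-8, which keeps CJK text intact.
            archive.writestr(f"{stem}_ocr.txt", text)
    return buffer.getvalue()
```

Two uploads that share a stem (for example `a.png` and `a.pdf`) would collide under this naming assumption, so a real implementation would need a disambiguation rule.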
### Requirement: JSON Export
The system SHALL export OCR results as structured JSON with full metadata.
#### Scenario: Export with metadata
- **WHEN** user selects JSON export format
- **THEN** the system generates JSON containing:
- File information (name, size, format)
- OCR results array with text, bounding boxes, confidence
- Processing metadata (timestamp, language, model version)
- Task status and statistics
#### Scenario: JSON export example structure
- **WHEN** export is generated
- **THEN** JSON structure follows this format:
```json
{
"file_name": "document.png",
"file_size": 1024000,
"upload_time": "2025-01-01T10:00:00Z",
"processing_time": 2.5,
"language": "zh-TW",
"results": [
{
"text": "範例文字",
"bbox": [100, 50, 200, 80],
"confidence": 0.95
}
],
"status": "completed"
}
```
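One way to produce this structure is sketched below. The function name `build_json_export` and its signature are assumptions made for illustration; only the field names come from the example above.

```python
import json
from typing import Any


def build_json_export(
    file_name: str,
    file_size: int,
    upload_time: str,
    processing_time: float,
    language: str,
    results: list[dict[str, Any]],
    status: str = "completed",
) -> str:
    """Serialize one OCR result using the field names from the example."""
    payload = {
        "file_name": file_name,
        "file_size": file_size,
        "upload_time": upload_time,
        "processing_time": processing_time,
        "language": language,
        "results": results,
        "status": status,
    }
    # ensure_ascii=False writes CJK characters directly instead of \uXXXX escapes.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```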
### Requirement: Excel Export
The system SHALL export OCR results as Excel spreadsheets with tabular format.
#### Scenario: Single file Excel export
- **WHEN** user selects Excel export for one file
- **THEN** the system generates .xlsx file with columns:
- Row Number
- Recognized Text
- Confidence Score
- Bounding Box (X, Y, Width, Height)
- Language
#### Scenario: Batch Excel export with multiple sheets
- **WHEN** user exports batch with 3 files as Excel
- **THEN** the system creates one .xlsx file with 3 sheets
- **AND** names each sheet as the original filename
- **AND** includes summary sheet with statistics
### Requirement: Rule-Based Output Formatting
The system SHALL apply user-defined rules to format exported text.
#### Scenario: Group by filename pattern
- **WHEN** user defines rule "group files with prefix 'invoice_'"
- **THEN** the system groups all matching files together
- **AND** exports them in a single combined file or folder
#### Scenario: Filter by confidence threshold
- **WHEN** user sets export rule "minimum confidence 0.8"
- **THEN** the system excludes text with confidence < 0.8 from export
- **AND** includes only high-confidence results
#### Scenario: Custom text formatting
- **WHEN** user defines rule "add line numbers"
- **THEN** the system prepends line numbers to each text line
- **AND** formats output as: `1. 第一行文字\n2. 第二行文字` (in English: `1. First line of text\n2. Second line of text`)
#### Scenario: Sort by reading order
- **WHEN** user enables "sort by position" rule
- **THEN** the system orders text by vertical position (top to bottom)
- **AND** then by horizontal position (left to right) within each row
- **AND** exports text in natural reading order
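The "add line numbers" and "sort by position" rules above could be sketched as follows. The `Region` type, the function names, and the `row_tolerance` heuristic for deciding when two regions share a row are all assumptions; the spec only fixes the ordering (top to bottom, then left to right).

```python
from dataclasses import dataclass


@dataclass
class Region:
    text: str
    x: float       # left edge
    y: float       # top edge
    height: float


def sort_reading_order(regions: list[Region], row_tolerance: float = 0.5) -> list[Region]:
    """Order regions top to bottom, then left to right within each row."""
    rows: list[list[Region]] = []
    for region in sorted(regions, key=lambda r: r.y):
        # A region joins the current row when its top edge lies within
        # row_tolerance * height of the first region in that row.
        if rows and abs(region.y - rows[-1][0].y) <= row_tolerance * rows[-1][0].height:
            rows[-1].append(region)
        else:
            rows.append([region])
    return [r for row in rows for r in sorted(row, key=lambda r: r.x)]


def add_line_numbers(lines: list[str]) -> str:
    """Prefix each line with "N. ", starting at 1."""
    return "\n".join(f"{i}. {line}" for i, line in enumerate(lines, start=1))
```

Grouping by row before sorting by x matters because two regions on the same visual line rarely share an identical y coordinate; a plain `(y, x)` sort would interleave them incorrectly.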
### Requirement: Export Rule Configuration
The system SHALL allow users to save and reuse export rules.
#### Scenario: Save custom export rule
- **WHEN** user creates a rule with name "高品質發票輸出" ("High-Quality Invoice Output")
- **THEN** the system saves the rule to database
- **AND** associates it with the user account
- **AND** makes it available in rule selection dropdown
#### Scenario: Apply saved rule
- **WHEN** user selects a saved rule for export
- **THEN** the system applies all configured filters and formatting
- **AND** generates output according to rule settings
#### Scenario: Edit existing rule
- **WHEN** user modifies a saved rule
- **THEN** the system updates the rule configuration
- **AND** preserves the rule ID for continuity
### Requirement: Markdown Export with Structure and Images
The system SHALL export OCR results as Markdown files preserving document logical structure with accompanying images.
#### Scenario: Export as Markdown with structure and images
- **WHEN** user selects Markdown export format
- **THEN** the system generates .md file with logical structure
- **AND** includes headings, paragraphs, tables, lists in proper hierarchy
- **AND** embeds image references pointing to extracted images (e.g. `![](./images/img1.jpg)`)
- **AND** maintains reading order from OCR analysis
- **AND** includes extracted images in an images/ folder
#### Scenario: Batch Markdown export with images
- **WHEN** user exports batch with 5 files as Markdown
- **THEN** the system creates 5 separate .md files
- **AND** creates corresponding images/ folders for each document
- **AND** optionally creates combined .md with page separators
- **AND** returns ZIP file containing all Markdown files and images
### Requirement: Searchable PDF Export with Images
The system SHALL generate searchable PDF files that include extracted text and images, preserving logical document structure (not exact visual layout).
#### Scenario: Single document PDF export with images
- **WHEN** user requests PDF export from OCR result
- **THEN** the system converts Markdown to HTML with basic CSS styling
- **AND** embeds extracted images from images/ folder
- **AND** generates PDF using Pandoc + WeasyPrint
- **AND** preserves document hierarchy, tables, and reading order
- **AND** images appear near their logical position in text flow
- **AND** uses appropriate Chinese font (Noto Sans CJK)
- **AND** produces searchable PDF with selectable text
#### Scenario: Basic PDF formatting options
- **WHEN** user selects PDF export
- **THEN** the system applies basic readable formatting
- **AND** sets standard margins and page size (A4)
- **AND** uses consistent fonts and spacing
- **AND** ensures images fit within page width
- **NOTE** CSS templates are for basic readability, not for replicating original visual design
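A rough sketch of the PDF step, assuming Pandoc is invoked as a subprocess with WeasyPrint as its PDF engine. The stylesheet values (2cm margins, the "Noto Sans CJK TC" family variant) and the helper name `build_pandoc_command` are assumptions chosen to match the scenarios above, not settings taken from the project.

```python
from pathlib import Path

# Assumed stylesheet: A4 pages, a CJK-capable font, images capped at page width.
BASIC_CSS = """\
@page { size: A4; margin: 2cm; }
body { font-family: "Noto Sans CJK TC", sans-serif; line-height: 1.5; }
img { max-width: 100%; }
"""


def build_pandoc_command(markdown_path: Path, pdf_path: Path, css_path: Path) -> list[str]:
    """Build the argv for a Markdown-to-PDF conversion via Pandoc and WeasyPrint."""
    return [
        "pandoc",
        str(markdown_path),
        f"--output={pdf_path}",
        "--pdf-engine=weasyprint",
        f"--css={css_path}",
        # Tells Pandoc where to look for relative image references
        # such as ./images/img1.jpg.
        f"--resource-path={markdown_path.parent}",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `BASIC_CSS` has been written to `css_path`; both tools must be installed separately.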
#### Scenario: Batch PDF export with images
- **WHEN** user exports batch as PDF
- **THEN** the system generates individual PDF for each document with embedded images
- **OR** creates single merged PDF with page breaks
- **AND** maintains consistent formatting across all pages
- **AND** returns ZIP of PDFs or single merged PDF
### Requirement: Export Format Selection
The system SHALL provide UI for selecting export format and options.
#### Scenario: Format selection with preview
- **WHEN** user opens export dialog
- **THEN** the system displays format options (TXT, JSON, Excel, **Markdown with images, Searchable PDF**)
- **AND** shows preview of output structure for selected format
- **AND** allows applying custom rules for text filtering
- **AND** provides basic formatting option for PDF (standard readable format)
#### Scenario: Batch export with format choice
- **WHEN** user selects multiple completed tasks
- **THEN** the system enables batch export button
- **AND** prompts for format selection
- **AND** generates combined export file
- **AND** shows progress bar for PDF generation (slower due to image processing)
- **AND** includes all extracted images when exporting Markdown or PDF


@@ -0,0 +1,96 @@
# File Management Specification
## ADDED Requirements
### Requirement: File Upload Validation
The system SHALL validate uploaded files for type, size, and content before processing.
#### Scenario: Valid image upload
- **WHEN** user uploads a PNG file of 5MB
- **THEN** the system accepts the file
- **AND** stores it in temporary upload directory
- **AND** returns upload success with file ID
#### Scenario: Oversized file rejection
- **WHEN** user uploads a file larger than 20MB
- **THEN** the system rejects the file
- **AND** returns error message "文件大小超過限制 (最大 20MB)" ("File size exceeds the limit (maximum 20MB)")
- **AND** does not store the file
#### Scenario: Invalid file type rejection
- **WHEN** user uploads a .exe or .zip file
- **THEN** the system rejects the file
- **AND** returns error message "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF" ("Unsupported file type; only PNG, JPG, JPEG, and PDF are supported")
#### Scenario: Corrupted image detection
- **WHEN** user uploads a corrupted image file
- **THEN** the system attempts to open the file
- **AND** detects corruption during validation
- **AND** returns error message "文件損壞,無法處理" ("File is corrupted and cannot be processed")
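The validation scenarios above could be sketched as a single check. The error codes, the function name `validate_upload`, and the reading of "20MB" as 20 × 1024 × 1024 bytes are assumptions. The signature check only catches files whose leading bytes do not match their extension; detecting deeper corruption would require actually decoding the file.

```python
from pathlib import PurePath
from typing import Optional

MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumed interpretation of the 20MB limit

# Leading "magic" bytes for each supported extension.
SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}


def validate_upload(filename: str, content: bytes) -> Optional[str]:
    """Return an error code, or None when the file passes every check."""
    if len(content) > MAX_SIZE_BYTES:
        return "too_large"
    suffix = PurePath(filename).suffix.lower()
    if suffix not in SIGNATURES:
        return "unsupported_type"
    if not content.startswith(SIGNATURES[suffix]):
        return "corrupted"
    return None
```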
### Requirement: Supported File Formats
The system SHALL support PNG, JPG, JPEG, and PDF file formats for OCR processing.
#### Scenario: PNG image processing
- **WHEN** user uploads a .png file
- **THEN** the system processes it directly with PaddleOCR
#### Scenario: JPG/JPEG image processing
- **WHEN** user uploads a .jpg or .jpeg file
- **THEN** the system processes it directly with PaddleOCR
#### Scenario: PDF file processing
- **WHEN** user uploads a .pdf file
- **THEN** the system converts PDF pages to images using pdf2image
- **AND** processes each page image with PaddleOCR
### Requirement: Batch Upload Management
The system SHALL manage multiple file uploads with batch organization.
#### Scenario: Create batch from multiple files
- **WHEN** user uploads 5 files in a single request
- **THEN** the system creates a batch with unique batch_id
- **AND** associates all files with the batch_id
- **AND** returns batch_id and file list
#### Scenario: Query batch status
- **WHEN** user requests batch status by batch_id
- **THEN** the system returns:
- Total files in batch
- Completed count
- Failed count
- Processing count
- Overall batch status (pending/processing/completed/failed)
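A minimal sketch of the status query follows. The spec lists the four overall states but not how they are derived from per-file states, so the derivation rules here (and the name `summarize_batch`) are assumptions.

```python
from collections import Counter


def summarize_batch(file_statuses: list[str]) -> dict[str, object]:
    """Aggregate per-file statuses into the batch status fields."""
    counts = Counter(file_statuses)
    total = len(file_statuses)
    if counts["pending"] == total:
        overall = "pending"        # nothing has started yet
    elif counts["pending"] or counts["processing"]:
        overall = "processing"     # at least one file is unfinished
    elif counts["failed"] == total:
        overall = "failed"         # every file failed
    else:
        overall = "completed"      # all finished, at least one succeeded
    return {
        "total": total,
        "completed": counts["completed"],
        "failed": counts["failed"],
        "processing": counts["processing"],
        "status": overall,
    }
```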
### Requirement: File Storage Management
The system SHALL store uploaded files temporarily and clean up after processing.
#### Scenario: Temporary file storage
- **WHEN** user uploads files
- **THEN** the system stores files in `uploads/{batch_id}/` directory
- **AND** generates unique filenames to prevent conflicts
#### Scenario: Automatic cleanup after processing
- **WHEN** OCR processing completes for a batch
- **THEN** the system keeps files for 24 hours
- **AND** automatically deletes files after retention period
- **AND** preserves OCR results in database
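The retention rule can be illustrated with a small helper that a periodic cleanup job might call. The function name `expired_batches` and the choice to treat exactly 24 hours as expired are assumptions.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)  # retention period from the scenario above


def expired_batches(completed_at: dict[str, datetime], now: datetime) -> list[str]:
    """Return the IDs of batches whose files have passed the retention period."""
    return sorted(
        batch_id
        for batch_id, finished in completed_at.items()
        if now - finished >= RETENTION
    )
```

Only the uploaded files under each returned batch directory would be deleted; the OCR results stay in the database, as the scenario requires.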
#### Scenario: Manual file deletion
- **WHEN** user requests to delete a batch
- **THEN** the system removes all associated files from storage
- **AND** marks the batch as deleted in database
- **AND** returns deletion confirmation
### Requirement: File Access Control
The system SHALL ensure users can only access their own uploaded files.
#### Scenario: User accesses own files
- **WHEN** authenticated user requests file by file_id
- **THEN** the system verifies ownership
- **AND** returns file if user is the owner
#### Scenario: User attempts to access others' files
- **WHEN** user requests file_id belonging to another user
- **THEN** the system denies access
- **AND** returns 403 Forbidden error
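The ownership check reduces to a lookup and a comparison, sketched here with a plain dictionary standing in for the database. The function name and the 404 response for unknown file IDs are assumptions; the spec only covers the owner and non-owner cases.

```python
def check_file_access(user_id: str, file_id: str, owners: dict[str, str]) -> int:
    """Return the HTTP status for a file request.

    `owners` maps file_id to the user_id of the uploader.
    """
    owner = owners.get(file_id)
    if owner is None:
        return 404  # assumption: unknown IDs are not covered by the spec
    return 200 if owner == user_id else 403
```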


@@ -0,0 +1,125 @@
# OCR Processing Specification
## ADDED Requirements
### Requirement: Multi-Language Text Recognition with Structure Analysis
The system SHALL extract text and images from document files using PaddleOCR-VL, with support for 109 languages including Chinese (Traditional and Simplified), English, Japanese, and Korean. Extraction SHALL preserve the document's logical structure and reading order, not its pixel-perfect visual layout.
#### Scenario: Single image OCR with Chinese text
- **WHEN** user uploads a PNG image containing Chinese text
- **THEN** the system extracts text with bounding boxes and confidence scores
- **AND** returns structured JSON with recognized text, coordinates, and language detected
- **AND** generates Markdown output preserving logical structure and hierarchy
#### Scenario: PDF document OCR with layout preservation
- **WHEN** user uploads a multi-page PDF file
- **THEN** the system processes each page with PaddleOCR-VL
- **AND** performs layout analysis to identify document elements (titles, paragraphs, tables, images, formulas)
- **AND** returns Markdown organized by page with preserved reading order
- **AND** provides JSON with detailed layout structure and bounding boxes
#### Scenario: Mixed language content
- **WHEN** user uploads an image with both Chinese and English text
- **THEN** the system detects and extracts text in both languages
- **AND** preserves the spatial relationship between text regions
- **AND** maintains proper reading order in output Markdown
#### Scenario: Complex document with tables and images
- **WHEN** user uploads a scanned document containing tables, images, and text
- **THEN** the system identifies layout elements (text blocks, tables, images, formulas)
- **AND** extracts table structure as Markdown tables
- **AND** extracts and saves document images as separate files
- **AND** embeds image references in Markdown (e.g. `![](path/to/image.jpg)`)
- **AND** preserves document hierarchy and reading order in Markdown output
### Requirement: Batch Processing
The system SHALL process multiple files concurrently with progress tracking and error handling.
#### Scenario: Batch upload success
- **WHEN** user uploads 10 image files simultaneously
- **THEN** the system creates a batch task with unique batch ID
- **AND** processes files in parallel (up to configured worker limit)
- **AND** returns real-time progress updates via WebSocket or polling
#### Scenario: Batch processing with partial failure
- **WHEN** a batch contains 5 valid images and 2 corrupted files
- **THEN** the system processes all valid files successfully
- **AND** logs errors for corrupted files with specific error messages
- **AND** marks the batch as "partially completed"
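The two batch scenarios above can be sketched with a thread pool whose size plays the role of the configured worker limit. `process_batch` and the stand-in `demo_ocr` are hypothetical names; a real worker would call the OCR engine instead. The "failed" status for a batch with no successes is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def process_batch(files: list[str], ocr: Callable[[str], str], max_workers: int = 4) -> dict:
    """Run `ocr` on every file in parallel, collecting results and errors."""
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ocr, name): name for name in files}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # One bad file is logged and must not abort the batch.
                errors[name] = str(exc)
    if not errors:
        status = "completed"
    elif results:
        status = "partially completed"
    else:
        status = "failed"
    return {"status": status, "results": results, "errors": errors}


def demo_ocr(name: str) -> str:
    """Stand-in OCR function: fails for names that start with "bad"."""
    if name.startswith("bad"):
        raise ValueError("corrupted file")
    return name.upper()
```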
### Requirement: Image Preprocessing
The system SHALL provide optional image preprocessing to improve OCR accuracy.
#### Scenario: Low contrast image enhancement
- **WHEN** user enables preprocessing for a low-contrast image
- **THEN** the system applies contrast adjustment and denoising
- **AND** performs OCR on the enhanced image
- **AND** returns results that are more accurate than OCR on the original, unprocessed image
#### Scenario: Skipped preprocessing
- **WHEN** user disables preprocessing option
- **THEN** the system performs OCR directly on original image
- **AND** completes processing faster than with preprocessing enabled
### Requirement: Confidence Threshold Filtering
The system SHALL filter OCR results based on configurable confidence threshold.
#### Scenario: High confidence filter
- **WHEN** user sets confidence threshold to 0.8
- **THEN** the system returns only text segments with confidence >= 0.8
- **AND** discards low-confidence results
#### Scenario: Include all results
- **WHEN** user sets confidence threshold to 0.0
- **THEN** the system returns all recognized text regardless of confidence
- **AND** includes confidence scores in output
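The threshold behaviour in both scenarios amounts to an inclusive comparison, sketched below. The function name and the range check on the threshold are assumptions; the result dictionaries reuse the `confidence` key from the JSON export example.

```python
def filter_by_confidence(results: list[dict], threshold: float) -> list[dict]:
    """Keep only results whose confidence is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # Inclusive comparison: a score equal to the threshold is kept.
    return [r for r in results if r["confidence"] >= threshold]
```

With a threshold of 0.0 every result passes, which matches the "include all results" scenario.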
### Requirement: OCR Result Structure
The system SHALL return OCR results in multiple formats (JSON, Markdown) with extracted text, images, and structure metadata.
#### Scenario: Successful OCR result with multiple formats
- **WHEN** OCR processing completes successfully
- **THEN** the system returns JSON containing:
- File metadata (name, size, format, upload timestamp)
- Detected text regions with bounding boxes (x, y, width, height)
- Recognized text content for each region
- Confidence scores (0.0 to 1.0)
- Language detected
- Layout element types (title, paragraph, table, image, formula)
- Reading order sequence
- List of extracted image files with paths
- Processing time
- Task status (completed/failed/partial)
- **AND** generates Markdown file with logical structure
- **AND** saves extracted images to storage directory
- **AND** provides methods to export as searchable PDF with images
#### Scenario: Searchable PDF generation with images
- **WHEN** user requests PDF export from OCR results
- **THEN** the system converts Markdown to HTML with basic CSS styling
- **AND** embeds extracted images in their logical positions (not exact original positions)
- **AND** generates PDF using Pandoc + WeasyPrint
- **AND** preserves document hierarchy, tables, and reading order
- **AND** applies appropriate fonts for Chinese characters
- **AND** produces searchable PDF (text is selectable and searchable)
### Requirement: Document Translation (Reserved Architecture)
The system SHALL provide architecture and UI placeholders for future document translation features.
#### Scenario: Translation option visibility (UI placeholder)
- **WHEN** user views OCR result page
- **THEN** the system displays a "Translate Document" button (disabled or labeled "Coming Soon")
- **AND** shows target language selection dropdown (disabled)
- **AND** provides tooltip: "Translation feature will be available in future release"
#### Scenario: Translation API endpoint (reserved)
- **WHEN** backend API is queried for translation endpoints
- **THEN** the system provides `/api/v1/translate/document` endpoint specification
- **AND** returns "Not Implemented" (501) status when called
- **AND** documents expected request/response format for future implementation
#### Scenario: Translation configuration storage (database schema)
- **WHEN** database schema is created
- **THEN** the system includes `translation_configs` table
- **AND** defines columns: id, user_id, source_lang, target_lang, engine_type, engine_config, created_at
- **AND** table remains empty until translation feature is implemented
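As an illustration of the reserved schema, the sketch below creates the table in SQLite. The column names come from the scenario above; the column types, constraints, and the use of SQLite itself are assumptions, since the spec does not name a database engine.

```python
import sqlite3

# Column names follow the spec; types and constraints are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS translation_configs (
    id            INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL,
    source_lang   TEXT NOT NULL,
    target_lang   TEXT NOT NULL,
    engine_type   TEXT NOT NULL,
    engine_config TEXT,
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the reserved translation_configs table if it is missing."""
    conn.executescript(SCHEMA)
```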