Export Results Specification
ADDED Requirements
Requirement: Plain Text Export
The system SHALL export OCR results as plain text files with configurable formatting.
Scenario: Export single file result as TXT
- WHEN user selects a completed OCR task and chooses TXT export
- THEN the system generates a .txt file with extracted text
- AND preserves line breaks based on bounding box positions
- AND returns downloadable file
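One way to derive line breaks from bounding boxes is sketched below. It is an illustration only, under two assumptions the spec does not state: each `bbox` is `[x_min, y_min, x_max, y_max]`, and the results are already in reading order. The function name `join_with_line_breaks` and the `word_separator` parameter are hypothetical.

```python
from typing import Dict, List


def join_with_line_breaks(results: List[Dict], word_separator: str = " ") -> str:
    """Join OCR text items, breaking the line when a box starts below the previous one."""
    # Assumption: bbox is [x_min, y_min, x_max, y_max]; items are in reading order.
    parts: List[str] = []
    prev_bottom = None
    for item in results:
        _, top, _, bottom = item["bbox"]
        if prev_bottom is not None:
            # A box whose top edge is at or below the previous box's bottom
            # edge is treated as the start of a new line.
            parts.append("\n" if top >= prev_bottom else word_separator)
        parts.append(item["text"])
        prev_bottom = bottom
    return "".join(parts)
```

For Chinese text, passing `word_separator=""` avoids inserting spaces between adjacent boxes on the same line.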
Scenario: Export batch results as TXT
- WHEN user exports a batch with 5 files as TXT
- THEN the system creates a ZIP file containing 5 .txt files
- AND names each file as {original_filename}_ocr.txt
- AND returns the ZIP for download
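The batch scenario could be sketched with the standard `zipfile` module as follows. The function name `build_txt_zip` is hypothetical, and the sketch assumes that `{original_filename}` means the upload name without its extension; the spec does not say which.

```python
import io
import zipfile
from pathlib import Path
from typing import Dict


def build_txt_zip(texts_by_filename: Dict[str, str]) -> bytes:
    """Pack one UTF-8 .txt file per source document into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for original_name, text in texts_by_filename.items():
            # Assumption: the extension is dropped before appending "_ocr.txt".
            stem = Path(original_name).stem
            archive.writestr(f"{stem}_ocr.txt", text.encode("utf-8"))
    return buffer.getvalue()
```

Two uploads that differ only by extension (for example `a.png` and `a.jpg`) would map to the same entry name under this assumption, so a real implementation would need a collision policy.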
Requirement: JSON Export
The system SHALL export OCR results as structured JSON with full metadata.
Scenario: Export with metadata
- WHEN user selects JSON export format
- THEN the system generates JSON containing:
  - File information (name, size, format)
  - OCR results array with text, bounding boxes, confidence
  - Processing metadata (timestamp, language, model version)
  - Task status and statistics
Scenario: JSON export example structure
- WHEN export is generated
- THEN JSON structure follows this format:
{
  "file_name": "document.png",
  "file_size": 1024000,
  "upload_time": "2025-01-01T10:00:00Z",
  "processing_time": 2.5,
  "language": "zh-TW",
  "results": [
    {
      "text": "範例文字",
      "bbox": [100, 50, 200, 80],
      "confidence": 0.95
    }
  ],
  "status": "completed"
}
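A minimal sketch of producing this structure with the standard `json` module is shown below. The function name `build_export_json` and its parameters are hypothetical; only the output keys come from the example above. `ensure_ascii=False` is used so that Chinese text stays readable in the file rather than being written as `\uXXXX` escapes.

```python
import json
from typing import Dict, List


def build_export_json(
    file_name: str,
    file_size: int,
    upload_time: str,
    processing_time: float,
    language: str,
    results: List[Dict],
    status: str = "completed",
) -> str:
    """Serialize one OCR task into the export structure."""
    payload = {
        "file_name": file_name,
        "file_size": file_size,
        "upload_time": upload_time,
        "processing_time": processing_time,
        "language": language,
        "results": [
            {"text": r["text"], "bbox": list(r["bbox"]), "confidence": r["confidence"]}
            for r in results
        ],
        "status": status,
    }
    # ensure_ascii=False keeps non-ASCII text as-is in the output.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```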
Requirement: Excel Export
The system SHALL export OCR results as Excel spreadsheets with tabular format.
Scenario: Single file Excel export
- WHEN user selects Excel export for one file
- THEN the system generates .xlsx file with columns:
  - Row Number
  - Recognized Text
  - Confidence Score
  - Bounding Box (X, Y, Width, Height)
  - Language
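The column layout above could be prepared as plain rows before handing them to a spreadsheet library. This sketch makes two assumptions not stated in the spec: the stored `bbox` is `[x_min, y_min, x_max, y_max]`, so width and height must be derived, and the bounding box is split into four columns. `HEADER` and `build_rows` are hypothetical names.

```python
from typing import Dict, List

HEADER = [
    "Row Number", "Recognized Text", "Confidence Score",
    "X", "Y", "Width", "Height", "Language",
]


def build_rows(results: List[Dict], language: str) -> List[list]:
    """Return a header row followed by one row per OCR result."""
    rows: List[list] = [HEADER]
    for index, item in enumerate(results, start=1):
        # Assumption: bbox holds two corners, not x/y/width/height.
        x_min, y_min, x_max, y_max = item["bbox"]
        rows.append([
            index, item["text"], item["confidence"],
            x_min, y_min, x_max - x_min, y_max - y_min, language,
        ])
    return rows
```

If the OCR engine already emits `[x, y, width, height]`, the subtraction step should be dropped.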
Scenario: Batch Excel export with multiple sheets
- WHEN user exports batch with 3 files as Excel
- THEN the system creates one .xlsx file with 3 sheets
- AND names each sheet as the original filename
- AND includes summary sheet with statistics
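Naming sheets after original filenames needs some care, because Excel limits sheet names to 31 characters, forbids the characters `[ ] : * ? / \`, and requires names to be unique within a workbook. A possible sanitizing step is sketched below; the function name `sheet_names` and the `_2`, `_3` suffix scheme are assumptions.

```python
import re
from typing import Iterable, List

_FORBIDDEN = re.compile(r"[\[\]:*?/\\]")
MAX_SHEET_NAME = 31


def sheet_names(filenames: Iterable[str]) -> List[str]:
    """Map filenames to unique sheet names that Excel accepts."""
    used = set()
    names: List[str] = []
    for filename in filenames:
        # Replace forbidden characters, then truncate to the length limit.
        base = _FORBIDDEN.sub("_", filename)[:MAX_SHEET_NAME] or "Sheet"
        candidate = base
        counter = 2
        # Sheet names are compared case-insensitively.
        while candidate.lower() in used:
            suffix = f"_{counter}"
            candidate = base[: MAX_SHEET_NAME - len(suffix)] + suffix
            counter += 1
        used.add(candidate.lower())
        names.append(candidate)
    return names
```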
Requirement: Rule-Based Output Formatting
The system SHALL apply user-defined rules to format exported text.
Scenario: Group by filename pattern
- WHEN user defines rule "group files with prefix 'invoice_'"
- THEN the system groups all matching files together
- AND exports them in a single combined file or folder
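The grouping rule might reduce to a simple prefix match, as in this sketch. The function name `split_by_prefix` is hypothetical, and the match is assumed to be case-sensitive.

```python
from typing import Iterable, List, Tuple


def split_by_prefix(filenames: Iterable[str], prefix: str) -> Tuple[List[str], List[str]]:
    """Return (matching, remaining) filenames, preserving input order."""
    matching: List[str] = []
    remaining: List[str] = []
    for name in filenames:
        # Assumption: the prefix comparison is case-sensitive.
        (matching if name.startswith(prefix) else remaining).append(name)
    return matching, remaining
```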
Scenario: Filter by confidence threshold
- WHEN user sets export rule "minimum confidence 0.8"
- THEN the system excludes text with confidence < 0.8 from export
- AND includes only high-confidence results
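The threshold rule can be sketched as a filter over the results array. Note the boundary: the scenario excludes confidence below 0.8, so a result at exactly 0.8 is kept. The function name `filter_by_confidence` is hypothetical.

```python
from typing import Dict, List


def filter_by_confidence(results: List[Dict], min_confidence: float) -> List[Dict]:
    """Keep results whose confidence is at least min_confidence."""
    # ">=" keeps items that sit exactly on the threshold.
    return [r for r in results if r["confidence"] >= min_confidence]
```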
Scenario: Custom text formatting
- WHEN user defines rule "add line numbers"
- THEN the system prepends line numbers to each text line
- AND formats output as:
1. 第一行文字\n2. 第二行文字
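A minimal sketch of the line-numbering rule, using the format shown above (the function name `add_line_numbers` is hypothetical):

```python
def add_line_numbers(text: str) -> str:
    """Prefix each line with its 1-based number followed by '. '."""
    return "\n".join(
        f"{number}. {line}"
        for number, line in enumerate(text.splitlines(), start=1)
    )
```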
Scenario: Sort by reading order
- WHEN user enables "sort by position" rule
- THEN the system orders text by vertical position (top to bottom)
- AND then by horizontal position (left to right) within each row
- AND exports text in natural reading order
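Reading-order sorting could be approximated by clustering boxes into rows and then sorting each row from left to right. The sketch below assumes `bbox` is `[x_min, y_min, x_max, y_max]`; the function name `sort_reading_order` and the `row_tolerance` parameter (how far top edges may differ, in pixels, and still count as one row) are hypothetical.

```python
from typing import Dict, List


def sort_reading_order(results: List[Dict], row_tolerance: float = 10.0) -> List[Dict]:
    """Order results top to bottom, then left to right within each row."""
    by_top = sorted(results, key=lambda r: r["bbox"][1])
    rows: List[List[Dict]] = []
    for item in by_top:
        # Compare against the first (topmost) item of the current row.
        if rows and item["bbox"][1] - rows[-1][0]["bbox"][1] <= row_tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
    ordered: List[Dict] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda r: r["bbox"][0]))
    return ordered
```

A plain two-key sort on `(y, x)` would misorder boxes on the same visual row whose top edges differ by a pixel or two, which is why the sketch groups rows first. This simple approach does not handle multi-column layouts.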
Requirement: Export Rule Configuration
The system SHALL allow users to save and reuse export rules.
Scenario: Save custom export rule
- WHEN user creates a rule with name "高品質發票輸出" ("High-Quality Invoice Output")
- THEN the system saves the rule to database
- AND associates it with the user account
- AND makes it available in rule selection dropdown
Scenario: Apply saved rule
- WHEN user selects a saved rule for export
- THEN the system applies all configured filters and formatting
- AND generates output according to rule settings
Scenario: Edit existing rule
- WHEN user modifies a saved rule
- THEN the system updates the rule configuration
- AND preserves the rule ID for continuity
Requirement: Markdown Export with Structure and Images
The system SHALL export OCR results as Markdown files preserving document logical structure with accompanying images.
Scenario: Export as Markdown with structure and images
- WHEN user selects Markdown export format
- THEN the system generates .md file with logical structure
- AND includes headings, paragraphs, tables, lists in proper hierarchy
- AND embeds image references pointing to extracted images
- AND maintains reading order from OCR analysis
- AND includes extracted images in an images/ folder
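As an illustration of the Markdown scenario, the sketch below renders a list of structural elements into Markdown text with image references under `images/`. The element dictionary shape (`type`, `level`, `text`, `items`, `file`, `alt`) is an assumption about what the OCR structure analysis might produce; tables are omitted for brevity.

```python
from typing import Dict, List


def render_markdown(elements: List[Dict]) -> str:
    """Render structural elements, already in reading order, as Markdown."""
    blocks: List[str] = []
    for element in elements:
        kind = element["type"]
        if kind == "heading":
            blocks.append("#" * element["level"] + " " + element["text"])
        elif kind == "paragraph":
            blocks.append(element["text"])
        elif kind == "list":
            blocks.append("\n".join(f"- {entry}" for entry in element["items"]))
        elif kind == "image":
            # Relative path into the images/ folder shipped with the .md file.
            blocks.append(f"![{element.get('alt', '')}](images/{element['file']})")
        else:
            raise ValueError(f"unsupported element type: {kind}")
    # Blank lines separate Markdown blocks.
    return "\n\n".join(blocks) + "\n"
```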
Scenario: Batch Markdown export with images
- WHEN user exports batch with 5 files as Markdown
- THEN the system creates 5 separate .md files
- AND creates corresponding images/ folders for each document
- AND optionally creates combined .md with page separators
- AND returns ZIP file containing all Markdown files and images
Requirement: Searchable PDF Export with Images
The system SHALL generate searchable PDF files that include extracted text and images, preserving logical document structure (not exact visual layout).
Scenario: Single document PDF export with images
- WHEN user requests PDF export from OCR result
- THEN the system converts Markdown to HTML with basic CSS styling
- AND embeds extracted images from images/ folder
- AND generates PDF using Pandoc + WeasyPrint
- AND preserves document hierarchy, tables, and reading order
- AND images appear near their logical position in text flow
- AND uses appropriate Chinese font (Noto Sans CJK)
- AND produces searchable PDF with selectable text
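The Pandoc + WeasyPrint step could be driven from a command line assembled like this. The sketch only builds the argument list; the function name `pandoc_pdf_command` and the file layout are assumptions. The flags used (`--from`, `--pdf-engine`, `--css`, `--resource-path`, `--output`) are standard Pandoc options.

```python
from pathlib import Path
from typing import List


def pandoc_pdf_command(markdown_path: Path, pdf_path: Path, css_path: Path) -> List[str]:
    """Build the Pandoc invocation that renders Markdown to PDF via WeasyPrint."""
    return [
        "pandoc",
        str(markdown_path),
        "--from=markdown",
        # With an HTML-based engine, Pandoc goes Markdown -> HTML -> PDF.
        "--pdf-engine=weasyprint",
        f"--css={css_path}",
        # Lets relative references such as images/figure_1.png resolve.
        f"--resource-path={markdown_path.parent}",
        f"--output={pdf_path}",
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` on a host where both `pandoc` and `weasyprint` are installed.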
Scenario: Basic PDF formatting options
- WHEN user selects PDF export
- THEN the system applies basic readable formatting
- AND sets standard margins and page size (A4)
- AND uses consistent fonts and spacing
- AND ensures images fit within page width
- NOTE: CSS templates target basic readability, not replication of the original visual design
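A basic stylesheet covering these formatting options might be generated as follows. The function name `build_pdf_css`, the 2 cm margin, the line height, and the specific font family "Noto Sans CJK TC" (the Traditional Chinese variant of Noto Sans CJK) are assumptions; the spec only names A4, standard margins, and Noto Sans CJK.

```python
def build_pdf_css(font_family: str = "Noto Sans CJK TC", margin: str = "2cm") -> str:
    """Return a minimal print stylesheet for readable A4 output."""
    return (
        # Page size and margins.
        f"@page {{ size: A4; margin: {margin}; }}\n"
        # One consistent font and spacing for body text.
        f'body {{ font-family: "{font_family}", sans-serif; line-height: 1.6; }}\n'
        # Images never exceed the page width.
        "img { max-width: 100%; height: auto; }\n"
        "table { border-collapse: collapse; width: 100%; }\n"
    )
```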
Scenario: Batch PDF export with images
- WHEN user exports batch as PDF
- THEN the system generates individual PDF for each document with embedded images
- OR creates single merged PDF with page breaks
- AND maintains consistent formatting across all pages
- AND returns ZIP of PDFs or single merged PDF
Requirement: Export Format Selection
The system SHALL provide UI for selecting export format and options.
Scenario: Format selection with preview
- WHEN user opens export dialog
- THEN the system displays format options (TXT, JSON, Excel, Markdown with images, Searchable PDF)
- AND shows preview of output structure for selected format
- AND allows applying custom rules for text filtering
- AND provides basic formatting option for PDF (standard readable format)
Scenario: Batch export with format choice
- WHEN user selects multiple completed tasks
- THEN the system enables batch export button
- AND prompts for format selection
- AND generates combined export file
- AND shows progress bar for PDF generation (slower due to image processing)
- AND includes all extracted images when exporting Markdown or PDF