# Export Results Specification

## ADDED Requirements

### Requirement: Plain Text Export
The system SHALL export OCR results as plain text files with configurable formatting.

#### Scenario: Export single file result as TXT
- **WHEN** user selects a completed OCR task and chooses TXT export
- **THEN** the system generates a .txt file with extracted text
- **AND** preserves line breaks based on bounding box positions
- **AND** returns downloadable file

#### Scenario: Export batch results as TXT
- **WHEN** user exports a batch with 5 files as TXT
- **THEN** the system creates a ZIP file containing 5 .txt files
- **AND** names each file as `{original_filename}_ocr.txt`
- **AND** returns the ZIP for download

### Requirement: JSON Export
The system SHALL export OCR results as structured JSON with full metadata.

#### Scenario: Export with metadata
- **WHEN** user selects JSON export format
- **THEN** the system generates JSON containing:
  - File information (name, size, format)
  - OCR results array with text, bounding boxes, confidence
  - Processing metadata (timestamp, language, model version)
  - Task status and statistics

#### Scenario: JSON export example structure
- **WHEN** export is generated
- **THEN** JSON structure follows this format:
  ```json
  {
    "file_name": "document.png",
    "file_size": 1024000,
    "upload_time": "2025-01-01T10:00:00Z",
    "processing_time": 2.5,
    "language": "zh-TW",
    "results": [
      {
        "text": "範例文字",
        "bbox": [100, 50, 200, 80],
        "confidence": 0.95
      }
    ],
    "status": "completed"
  }
  ```

### Requirement: Excel Export
The system SHALL export OCR results as Excel spreadsheets with tabular format.

#### Scenario: Single file Excel export
- **WHEN** user selects Excel export for one file
- **THEN** the system generates .xlsx file with columns:
  - Row Number
  - Recognized Text
  - Confidence Score
  - Bounding Box (X, Y, Width, Height)
  - Language

#### Scenario: Batch Excel export with multiple sheets
- **WHEN** user exports batch with 3 files as Excel
- **THEN** the system creates one .xlsx file with 3 sheets
- **AND** names each sheet as the original filename
- **AND** includes summary sheet with statistics

### Requirement: Rule-Based Output Formatting
The system SHALL apply user-defined rules to format exported text.

#### Scenario: Group by filename pattern
- **WHEN** user defines rule "group files with prefix 'invoice_'"
- **THEN** the system groups all matching files together
- **AND** exports them in a single combined file or folder

#### Scenario: Filter by confidence threshold
- **WHEN** user sets export rule "minimum confidence 0.8"
- **THEN** the system excludes text with confidence < 0.8 from export
- **AND** includes only high-confidence results

#### Scenario: Custom text formatting
- **WHEN** user defines rule "add line numbers"
- **THEN** the system prepends line numbers to each text line
- **AND** formats output as: `1. 第一行文字\n2. 第二行文字`

#### Scenario: Sort by reading order
- **WHEN** user enables "sort by position" rule
- **THEN** the system orders text by vertical position (top to bottom)
- **AND** then by horizontal position (left to right) within each row
- **AND** exports text in natural reading order

### Requirement: Export Rule Configuration
The system SHALL allow users to save and reuse export rules.
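
As a rough illustration, a saved rule could bundle the confidence filter, position sort, and line-numbering options from the Rule-Based Output Formatting scenarios. The sketch below is not a prescribed implementation: `OcrLine`, `ExportRule`, `apply_rule`, and the `row_tolerance` grouping threshold are hypothetical names and values introduced only for this example. It uses only the first two `bbox` entries (left, top), so it does not depend on whether the remaining two are a far corner or a width and height.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class OcrLine:
    text: str
    bbox: Sequence[float]  # only bbox[0] (left) and bbox[1] (top) are used here
    confidence: float


@dataclass
class ExportRule:
    # Hypothetical rule fields; the spec does not define a storage schema.
    name: str
    min_confidence: float = 0.0
    sort_by_position: bool = False
    add_line_numbers: bool = False
    row_tolerance: float = 10.0  # assumed: tops within this distance share a row


def sort_reading_order(lines: List[OcrLine], row_tolerance: float) -> List[OcrLine]:
    """Order top to bottom, then left to right within each row."""
    rows: List[List[OcrLine]] = []
    for line in sorted(lines, key=lambda ln: ln.bbox[1]):
        # Compare against the first line of the current row to decide membership.
        if rows and abs(line.bbox[1] - rows[-1][0].bbox[1]) <= row_tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])
    return [ln for row in rows for ln in sorted(row, key=lambda ln: ln.bbox[0])]


def apply_rule(lines: List[OcrLine], rule: ExportRule) -> str:
    """Apply filter, sort, and numbering in that order and return the text."""
    # Exclude anything strictly below the threshold, as in the filter scenario.
    kept = [ln for ln in lines if ln.confidence >= rule.min_confidence]
    if rule.sort_by_position:
        kept = sort_reading_order(kept, rule.row_tolerance)
    texts = [ln.text for ln in kept]
    if rule.add_line_numbers:
        texts = [f"{i}. {text}" for i, text in enumerate(texts, start=1)]
    return "\n".join(texts)
```

Filtering before sorting keeps low-confidence fragments from influencing row grouping; a real implementation might choose a tolerance relative to line height instead of a fixed pixel value.
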

#### Scenario: Save custom export rule
- **WHEN** user creates a rule with name "高品質發票輸出"
- **THEN** the system saves the rule to database
- **AND** associates it with the user account
- **AND** makes it available in rule selection dropdown

#### Scenario: Apply saved rule
- **WHEN** user selects a saved rule for export
- **THEN** the system applies all configured filters and formatting
- **AND** generates output according to rule settings

#### Scenario: Edit existing rule
- **WHEN** user modifies a saved rule
- **THEN** the system updates the rule configuration
- **AND** preserves the rule ID for continuity

### Requirement: Markdown Export with Structure and Images
The system SHALL export OCR results as Markdown files preserving document logical structure with accompanying images.

#### Scenario: Export as Markdown with structure and images
- **WHEN** user selects Markdown export format
- **THEN** the system generates .md file with logical structure
- **AND** includes headings, paragraphs, tables, lists in proper hierarchy
- **AND** embeds image references pointing to extracted images (`![](./images/img1.jpg)`)
- **AND** maintains reading order from OCR analysis
- **AND** includes extracted images in an images/ folder

#### Scenario: Batch Markdown export with images
- **WHEN** user exports batch with 5 files as Markdown
- **THEN** the system creates 5 separate .md files
- **AND** creates corresponding images/ folders for each document
- **AND** optionally creates combined .md with page separators
- **AND** returns ZIP file containing all Markdown files and images

### Requirement: Searchable PDF Export with Images
The system SHALL generate searchable PDF files that include extracted text and images, preserving logical document structure (not exact visual layout).
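
One possible shape for the Markdown-to-PDF step, using the Pandoc + WeasyPrint toolchain named in this requirement's scenarios, is sketched below. It assumes `pandoc` and `weasyprint` are installed and on `PATH`. The helper names (`build_pandoc_command`, `export_pdf`), the 2cm margin, and the choice of the "Noto Sans CJK TC" family are assumptions made for the example, not values fixed by this spec.

```python
import subprocess
from pathlib import Path
from typing import List

# Basic readability styling only: A4 page, a CJK-capable font, images capped at page width.
BASIC_CSS = """\
@page { size: A4; margin: 2cm; }
body { font-family: "Noto Sans CJK TC", sans-serif; line-height: 1.5; }
img { max-width: 100%; height: auto; }
table { border-collapse: collapse; }
td, th { border: 1px solid #999; padding: 4px; }
"""


def build_pandoc_command(md_path: Path, pdf_path: Path, css_path: Path) -> List[str]:
    """Build the pandoc invocation; pandoc renders HTML and hands it to WeasyPrint."""
    return [
        "pandoc", str(md_path),
        "--from", "markdown",
        "--pdf-engine=weasyprint",
        "--css", str(css_path),
        # Lets relative references such as ./images/img1.jpg resolve next to the .md file.
        "--resource-path", str(md_path.parent),
        "-o", str(pdf_path),
    ]


def export_pdf(md_path: Path, out_dir: Path) -> Path:
    """Write the stylesheet, run pandoc, and return the path of the generated PDF."""
    out_dir.mkdir(parents=True, exist_ok=True)
    css_path = out_dir / "basic.css"
    css_path.write_text(BASIC_CSS, encoding="utf-8")
    pdf_path = out_dir / (md_path.stem + ".pdf")
    subprocess.run(build_pandoc_command(md_path, pdf_path, css_path), check=True)
    return pdf_path
```

Because the text reaches the PDF as real text rather than as a raster image, the output stays selectable and searchable, provided the configured font is available on the host.
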

#### Scenario: Single document PDF export with images
- **WHEN** user requests PDF export from OCR result
- **THEN** the system converts Markdown to HTML with basic CSS styling
- **AND** embeds extracted images from images/ folder
- **AND** generates PDF using Pandoc + WeasyPrint
- **AND** preserves document hierarchy, tables, and reading order
- **AND** images appear near their logical position in text flow
- **AND** uses appropriate Chinese font (Noto Sans CJK)
- **AND** produces searchable PDF with selectable text

#### Scenario: Basic PDF formatting options
- **WHEN** user selects PDF export
- **THEN** the system applies basic readable formatting
- **AND** sets standard margins and page size (A4)
- **AND** uses consistent fonts and spacing
- **AND** ensures images fit within page width
- **NOTE** CSS templates are for basic readability, not for replicating original visual design

#### Scenario: Batch PDF export with images
- **WHEN** user exports batch as PDF
- **THEN** the system generates individual PDF for each document with embedded images
- **OR** creates single merged PDF with page breaks
- **AND** maintains consistent formatting across all pages
- **AND** returns ZIP of PDFs or single merged PDF

### Requirement: Export Format Selection
The system SHALL provide UI for selecting export format and options.

#### Scenario: Format selection with preview
- **WHEN** user opens export dialog
- **THEN** the system displays format options (TXT, JSON, Excel, **Markdown with images, Searchable PDF**)
- **AND** shows preview of output structure for selected format
- **AND** allows applying custom rules for text filtering
- **AND** provides basic formatting option for PDF (standard readable format)

#### Scenario: Batch export with format choice
- **WHEN** user selects multiple completed tasks
- **THEN** the system enables batch export button
- **AND** prompts for format selection
- **AND** generates combined export file
- **AND** shows progress bar for PDF generation (slower due to image processing)
- **AND** includes all extracted images when exporting Markdown or PDF
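
For the batch case, the TXT scenario earlier in this spec calls for a ZIP with one `{original_filename}_ocr.txt` entry per input file. A minimal sketch of that packaging step follows. The function name `export_batch_txt` and its dictionary input are hypothetical, and the sketch assumes `{original_filename}` means the name without its extension, which the spec does not state explicitly.

```python
import io
import zipfile
from pathlib import PurePath
from typing import Dict


def export_batch_txt(results: Dict[str, str]) -> bytes:
    """Pack extracted text into an in-memory ZIP.

    `results` maps each original filename to its extracted text.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for original_name, text in results.items():
            # Assumption: drop the extension, so "scan.png" becomes "scan_ocr.txt".
            stem = PurePath(original_name).stem
            archive.writestr(f"{stem}_ocr.txt", text.encode("utf-8"))
    return buffer.getvalue()
```

Building the archive in memory makes it easy to return as a download response. Inputs that differ only by extension (for example `scan.png` and `scan.jpg`) would map to the same entry name under this assumption, so a real implementation would need a collision policy.
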