# Export Results Specification

## ADDED Requirements

### Requirement: Plain Text Export

The system SHALL export OCR results as plain text files with configurable formatting.

#### Scenario: Export single file result as TXT

- **WHEN** user selects a completed OCR task and chooses TXT export
- **THEN** the system generates a .txt file with the extracted text
- **AND** preserves line breaks based on bounding box positions
- **AND** returns a downloadable file
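A minimal sketch of how line breaks could be derived from bounding boxes. It assumes each result is a dict with `text` and a `bbox` of `[x1, y1, x2, y2]` corner coordinates (the spec does not state the bbox convention), and that results already arrive in reading order. The helper name `to_plain_text` and the 50% vertical-overlap heuristic are illustrative assumptions, not the system's defined behavior.

```python
from typing import Dict, List, Optional, Sequence


def to_plain_text(results: List[Dict], sep: str = " ") -> str:
    """Join OCR fragments into lines (hypothetical helper).

    Assumes results are in reading order and bbox is [x1, y1, x2, y2].
    A fragment continues the current line when it vertically overlaps the
    previous fragment by more than half of the shorter box's height.
    """
    lines: List[List[str]] = []
    prev: Optional[Sequence[float]] = None
    for item in results:
        _, y1, _, y2 = item["bbox"]
        same_line = False
        if prev is not None:
            _, py1, _, py2 = prev
            overlap = min(y2, py2) - max(y1, py1)
            same_line = overlap > 0.5 * min(y2 - y1, py2 - py1)
        if same_line:
            lines[-1].append(item["text"])
        else:
            lines.append([item["text"]])
        prev = item["bbox"]
    # Fragments on one line are joined with `sep`; pass sep="" for CJK text.
    return "\n".join(sep.join(parts) for parts in lines)
```

For Chinese text, `sep=""` avoids inserting spaces between fragments, since Chinese does not separate words with spaces.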
#### Scenario: Export batch results as TXT

- **WHEN** user exports a batch of 5 files as TXT
- **THEN** the system creates a ZIP file containing 5 .txt files
- **AND** names each file `{original_filename}_ocr.txt`
- **AND** returns the ZIP for download
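A sketch of the batch ZIP using only Python's standard `zipfile` module. It assumes `{original_filename}` means the upload name without its extension (so `scan1.png` becomes `scan1_ocr.txt`); the spec leaves this open. The helper name `build_txt_zip` is hypothetical.

```python
import io
import zipfile
from pathlib import PurePath
from typing import Dict


def build_txt_zip(texts: Dict[str, str]) -> bytes:
    """texts maps an original filename (e.g. 'scan1.png') to its extracted text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for original_name, text in texts.items():
            # Assumption: {original_filename} is the name without its extension.
            stem = PurePath(original_name).stem
            zf.writestr(f"{stem}_ocr.txt", text.encode("utf-8"))
    return buffer.getvalue()
```

Under this assumption, `scan.png` and `scan.jpg` in the same batch would collide on `scan_ocr.txt`; keeping the extension in the stem is one way to avoid that.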
### Requirement: JSON Export

The system SHALL export OCR results as structured JSON with full metadata.

#### Scenario: Export with metadata

- **WHEN** user selects JSON export format
- **THEN** the system generates JSON containing:
  - File information (name, size, format)
  - OCR results array with text, bounding boxes, and confidence
  - Processing metadata (timestamp, language, model version)
  - Task status and statistics
#### Scenario: JSON export example structure

- **WHEN** export is generated
- **THEN** the JSON structure follows this format:

```json
{
  "file_name": "document.png",
  "file_size": 1024000,
  "upload_time": "2025-01-01T10:00:00Z",
  "processing_time": 2.5,
  "language": "zh-TW",
  "results": [
    {
      "text": "範例文字",
      "bbox": [100, 50, 200, 80],
      "confidence": 0.95
    }
  ],
  "status": "completed"
}
```
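A sketch of serializing one task with the field names from the example above. The helper name and the units noted in comments (bytes, seconds) are assumptions. `ensure_ascii=False` is a standard `json.dumps` option that keeps Chinese text readable instead of emitting `\uXXXX` escapes.

```python
import json
from typing import Any, Dict, List


def build_json_export(file_name: str, file_size: int, upload_time: str,
                      processing_time: float, language: str,
                      results: List[Dict[str, Any]], status: str) -> str:
    """Serialize one OCR task to the example JSON shape (hypothetical helper)."""
    payload = {
        "file_name": file_name,
        "file_size": file_size,              # bytes (assumed unit)
        "upload_time": upload_time,          # ISO 8601 UTC, e.g. 2025-01-01T10:00:00Z
        "processing_time": processing_time,  # seconds (assumed unit)
        "language": language,
        "results": [
            {"text": r["text"], "bbox": list(r["bbox"]), "confidence": r["confidence"]}
            for r in results
        ],
        "status": status,
    }
    # ensure_ascii=False writes Chinese characters as-is instead of \uXXXX escapes.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

The metadata scenario also lists file format, model version, and statistics. Like the example, this sketch omits them; they would be added as further keys.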
### Requirement: Excel Export

The system SHALL export OCR results as Excel spreadsheets in tabular format.

#### Scenario: Single file Excel export

- **WHEN** user selects Excel export for one file
- **THEN** the system generates an .xlsx file with the following columns:
  - Row Number
  - Recognized Text
  - Confidence Score
  - Bounding Box (X, Y, Width, Height)
  - Language
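A standard-library sketch of building the sheet rows. It assumes the stored bbox is `[x1, y1, x2, y2]` (as the JSON example appears to use) and must be converted to the X, Y, Width, Height form named in the column header. The helper name `to_excel_rows` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

HEADERS = ("Row Number", "Recognized Text", "Confidence Score",
           "Bounding Box (X, Y, Width, Height)", "Language")


def to_excel_rows(results: List[Dict[str, Any]], language: str) -> List[Tuple[Any, ...]]:
    """Build header + data rows for one sheet (hypothetical helper).

    Assumes bbox is [x1, y1, x2, y2] and converts it to X, Y, Width, Height.
    """
    rows: List[Tuple[Any, ...]] = [HEADERS]
    for row_number, item in enumerate(results, start=1):
        x1, y1, x2, y2 = item["bbox"]
        bbox_cell = f"{x1}, {y1}, {x2 - x1}, {y2 - y1}"
        rows.append((row_number, item["text"], item["confidence"], bbox_cell, language))
    return rows
```

Each tuple can be passed to openpyxl's `Worksheet.append()`. Keeping the bounding box in one cell mirrors the single column in the spec; four numeric columns would be easier to sort and filter.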
#### Scenario: Batch Excel export with multiple sheets

- **WHEN** user exports a batch of 3 files as Excel
- **THEN** the system creates one .xlsx file with one sheet per file (3 sheets)
- **AND** names each sheet after the original filename
- **AND** includes an additional summary sheet with statistics
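Filenames cannot always be used verbatim as sheet names. Excel limits sheet names to 31 characters, rejects the characters `[ ] : * ? / \`, and treats names that differ only in case as duplicates. The sketch below sanitizes and de-duplicates names under those rules. Reserving the name `Summary` for the statistics sheet and the ` (2)` suffix style are assumptions.

```python
import re
from typing import Iterable, List

MAX_SHEET_NAME = 31                           # Excel's sheet-name length limit
_INVALID_CHARS = re.compile(r"[\[\]:*?/\\]")  # characters Excel rejects in sheet names


def sheet_names(filenames: List[str], reserved: Iterable[str] = ("Summary",)) -> List[str]:
    """Return one valid, unique sheet name per filename, in input order."""
    used = {name.lower() for name in reserved}  # Excel compares names case-insensitively
    names: List[str] = []
    for filename in filenames:
        base = _INVALID_CHARS.sub("_", filename)[:MAX_SHEET_NAME] or "Sheet"
        candidate, n = base, 2
        while candidate.lower() in used:
            suffix = f" ({n})"
            candidate = base[:MAX_SHEET_NAME - len(suffix)] + suffix
            n += 1
        used.add(candidate.lower())
        names.append(candidate)
    return names
```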
### Requirement: Rule-Based Output Formatting

The system SHALL apply user-defined rules to format exported text.

#### Scenario: Group by filename pattern

- **WHEN** user defines the rule "group files with prefix 'invoice_'"
- **THEN** the system groups all matching files together
- **AND** exports them as a single combined file or folder
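A sketch of prefix grouping. The spec does not say how overlapping prefixes or case should be handled, so this version assumes case-sensitive matching where the first matching prefix wins. Non-matching files are returned separately. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def group_by_prefix(filenames: List[str],
                    prefixes: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Bucket filenames by the first matching prefix (hypothetical rule semantics).

    Returns (groups, ungrouped); matching is case-sensitive.
    """
    groups: Dict[str, List[str]] = {prefix: [] for prefix in prefixes}
    ungrouped: List[str] = []
    for name in filenames:
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break  # first matching prefix wins
        else:
            ungrouped.append(name)
    return groups, ungrouped
```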
#### Scenario: Filter by confidence threshold

- **WHEN** user sets the export rule "minimum confidence 0.8"
- **THEN** the system excludes text with confidence < 0.8 from the export
- **AND** includes only results with confidence at or above 0.8
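Because the scenario excludes scores strictly below the threshold, a result scoring exactly 0.8 is kept. A one-line sketch (hypothetical helper name) that makes the boundary explicit:

```python
from typing import Any, Dict, List


def filter_by_confidence(results: List[Dict[str, Any]],
                         min_confidence: float = 0.8) -> List[Dict[str, Any]]:
    """Drop results below the threshold; a score exactly equal to it is kept."""
    return [r for r in results if r["confidence"] >= min_confidence]
```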
#### Scenario: Custom text formatting

- **WHEN** user defines the rule "add line numbers"
- **THEN** the system prepends a line number to each text line
- **AND** formats the output as `1. 第一行文字\n2. 第二行文字` ("1. first line of text", "2. second line of text"), where `\n` is a line break
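The numbering rule reduces to a short transformation. This sketch (hypothetical helper name) numbers from 1 and reproduces the example output exactly:

```python
from typing import List


def add_line_numbers(lines: List[str]) -> str:
    """Prefix each line with "<n>. ", numbering from 1 (hypothetical helper)."""
    return "\n".join(f"{n}. {line}" for n, line in enumerate(lines, start=1))
```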
#### Scenario: Sort by reading order

- **WHEN** user enables the "sort by position" rule
- **THEN** the system orders text by vertical position (top to bottom)
- **AND** then by horizontal position (left to right) within each row
- **AND** exports text in natural reading order
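Sorting directly on `(top, left)` misorders boxes on the same visual line whose tops differ by a few pixels. A common remedy is to cluster boxes into rows within a vertical tolerance and then sort each row left to right. The sketch assumes `bbox = [x1, y1, x2, y2]` and a hypothetical 10-pixel `row_tolerance`. It does not handle multi-column layouts.

```python
from typing import Dict, List


def sort_reading_order(results: List[Dict], row_tolerance: float = 10.0) -> List[Dict]:
    """Group boxes into rows by top edge, then order each row left to right."""
    by_top = sorted(results, key=lambda r: r["bbox"][1])
    rows: List[List[Dict]] = []
    for item in by_top:
        top = item["bbox"][1]
        # Join the current row if this box's top is close to the row's first box.
        if rows and abs(top - rows[-1][0]["bbox"][1]) <= row_tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
    return [item for row in rows for item in sorted(row, key=lambda r: r["bbox"][0])]
```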
### Requirement: Export Rule Configuration

The system SHALL allow users to save and reuse export rules.

#### Scenario: Save custom export rule

- **WHEN** user creates a rule named "高品質發票輸出" ("High-Quality Invoice Output")
- **THEN** the system saves the rule to the database
- **AND** associates it with the user's account
- **AND** makes it available in the rule selection dropdown
#### Scenario: Apply saved rule

- **WHEN** user selects a saved rule for export
- **THEN** the system applies all configured filters and formatting
- **AND** generates output according to the rule settings
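One way to apply a saved rule is a fixed pipeline: filter, then order, then format. Filtering runs first so line numbers count only the exported lines. The `ExportRule` fields are a hypothetical schema covering the rule types described earlier. The sort here is a plain `(top, left)` sort kept short for brevity, and the bbox is assumed to be `[x1, y1, x2, y2]`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExportRule:
    """Hypothetical rule schema; field names are illustrative."""
    name: str
    min_confidence: Optional[float] = None
    sort_by_position: bool = False
    add_line_numbers: bool = False


def apply_rule(rule: ExportRule, results: List[Dict]) -> str:
    items = list(results)
    if rule.min_confidence is not None:
        items = [r for r in items if r["confidence"] >= rule.min_confidence]
    if rule.sort_by_position:
        # Simple top-then-left ordering; bbox assumed to be [x1, y1, x2, y2].
        items.sort(key=lambda r: (r["bbox"][1], r["bbox"][0]))
    lines = [r["text"] for r in items]
    if rule.add_line_numbers:
        lines = [f"{n}. {text}" for n, text in enumerate(lines, start=1)]
    return "\n".join(lines)
```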
#### Scenario: Edit existing rule

- **WHEN** user modifies a saved rule
- **THEN** the system updates the rule configuration
- **AND** preserves the rule ID for continuity
### Requirement: Markdown Export with Structure and Images

The system SHALL export OCR results as Markdown files that preserve the document's logical structure, with accompanying images.

#### Scenario: Export as Markdown with structure and images

- **WHEN** user selects Markdown export format
- **THEN** the system generates a .md file that preserves the logical structure
- **AND** includes headings, paragraphs, tables, and lists in the proper hierarchy
- **AND** embeds Markdown image references pointing to the extracted images
- **AND** maintains the reading order from OCR analysis
- **AND** includes the extracted images in an `images/` folder
#### Scenario: Batch Markdown export with images

- **WHEN** user exports a batch of 5 files as Markdown
- **THEN** the system creates 5 separate .md files
- **AND** creates a corresponding `images/` folder for each document
- **AND** optionally creates a combined .md file with page separators
- **AND** returns a ZIP file containing all Markdown files and images
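A sketch of one possible archive layout: each document gets its own folder holding its `.md` file and `images/`. The folder structure, the `combined.md` name, and the `---` separator are assumptions. The combined file rewrites each document's relative image links so they still resolve from the archive root. That rewrite is a plain string replacement on `](images/`, which is adequate for generated Markdown but not a general parser.

```python
import io
import zipfile
from typing import Dict


def build_markdown_zip(docs: Dict[str, Dict], combined: bool = False) -> bytes:
    """docs maps a document stem to {"markdown": str, "images": {name: bytes}}.

    Hypothetical layout: <stem>/<stem>.md plus <stem>/images/<name>.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for stem, doc in docs.items():
            zf.writestr(f"{stem}/{stem}.md", doc["markdown"])  # str is stored as UTF-8
            for image_name, data in doc["images"].items():
                zf.writestr(f"{stem}/images/{image_name}", data)
        if combined:
            # Point each document's image links at its own folder, then join the
            # documents with a horizontal rule as the (assumed) page separator.
            sections = [doc["markdown"].replace("](images/", f"]({stem}/images/")
                        for stem, doc in docs.items()]
            zf.writestr("combined.md", "\n\n---\n\n".join(sections))
    return buffer.getvalue()
```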
### Requirement: Searchable PDF Export with Images

The system SHALL generate searchable PDF files that include extracted text and images, preserving the logical document structure (not the exact visual layout).

#### Scenario: Single document PDF export with images

- **WHEN** user requests PDF export of an OCR result
- **THEN** the system converts the Markdown to HTML with basic CSS styling
- **AND** embeds the extracted images from the `images/` folder
- **AND** generates the PDF using Pandoc with WeasyPrint as the PDF engine
- **AND** preserves document hierarchy, tables, and reading order
- **AND** places images near their logical position in the text flow
- **AND** uses a CJK font (Noto Sans CJK) for Chinese text
- **AND** produces a searchable PDF with selectable text
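A sketch of driving Pandoc from Python. It uses Pandoc's documented `--pdf-engine=weasyprint`, `--css`, `--resource-path`, and `--metadata` options. `pagetitle` sets the HTML `<title>` without rendering a visible title block. How HTML-based PDF engines resolve relative image paths can vary between Pandoc versions, so the sketch also runs from the Markdown file's folder. The function names are hypothetical, and the exact command has not been checked against a specific Pandoc release.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_pandoc_command(markdown_file: Path, pdf_file: Path, css_file: Path) -> List[str]:
    """Assemble a pandoc invocation: Markdown -> HTML -> PDF via WeasyPrint."""
    md, pdf, css = markdown_file.resolve(), pdf_file.resolve(), css_file.resolve()
    return [
        "pandoc", str(md),
        "--from", "markdown",
        "--to", "html5",                     # HTML intermediate for WeasyPrint
        "--pdf-engine=weasyprint",
        f"--css={css}",
        f"--resource-path={md.parent}",      # where to look for images/...
        "--metadata", f"pagetitle={md.stem}",
        "-o", str(pdf),
    ]


def render_pdf(markdown_file: Path, pdf_file: Path, css_file: Path) -> None:
    if shutil.which("pandoc") is None or shutil.which("weasyprint") is None:
        raise RuntimeError("pandoc and weasyprint must be installed and on PATH")
    # Running from the Markdown file's folder keeps relative image links resolvable.
    subprocess.run(build_pandoc_command(markdown_file, pdf_file, css_file),
                   check=True, cwd=markdown_file.resolve().parent)
```

Because WeasyPrint lays out real text rather than page images, the output is searchable and selectable, provided the CSS names a font installed on the rendering host.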
#### Scenario: Basic PDF formatting options

- **WHEN** user selects PDF export
- **THEN** the system applies basic readable formatting
- **AND** sets standard margins and page size (A4)
- **AND** uses consistent fonts and spacing
- **AND** ensures images fit within the page width
- **NOTE**: CSS templates are for basic readability, not for replicating the original visual design
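A sketch of a minimal stylesheet generator covering the listed options. `@page { size; margin }` and `max-width: 100%` on images are standard CSS that WeasyPrint supports. The 2 cm margin, 11 pt size, table styling, and the font family names `Noto Sans CJK TC` / `Noto Sans CJK SC` are assumed defaults, and the fonts must be installed on the rendering host.

```python
def build_basic_css(page_size: str = "A4", margin: str = "2cm",
                    font_family: str = '"Noto Sans CJK TC", "Noto Sans CJK SC", sans-serif') -> str:
    """Return a minimal readability stylesheet (hypothetical defaults)."""
    return "\n".join([
        f"@page {{ size: {page_size}; margin: {margin}; }}",
        f"body {{ font-family: {font_family}; font-size: 11pt; line-height: 1.5; }}",
        # Scale large images down to the text width; never scale small ones up.
        "img { max-width: 100%; height: auto; }",
        "table { border-collapse: collapse; width: 100%; }",
        "th, td { border: 1px solid #999; padding: 4px; }",
    ])
```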
#### Scenario: Batch PDF export with images

- **WHEN** user exports a batch as PDF
- **THEN** the system generates an individual PDF with embedded images for each document
- **OR** creates a single merged PDF with page breaks between documents
- **AND** maintains consistent formatting across all pages
- **AND** returns either a ZIP of the individual PDFs or the single merged PDF
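For the merged variant, one approach that avoids a separate PDF-merging step is to concatenate the Markdown sources with a forced page break and render once. Pandoc's Markdown reader passes raw HTML through to HTML output, and WeasyPrint honors CSS `break-before: page` (with `page-break-before: always` as the older spelling). The helper name is hypothetical. Each document's relative `images/` links would still need to be rewritten to a per-document path before merging.

```python
from typing import List

# Raw HTML passes through pandoc's Markdown reader; WeasyPrint honors the forced break.
PAGE_BREAK = '<div style="break-before: page; page-break-before: always;"></div>'


def merge_for_single_pdf(markdown_docs: List[str]) -> str:
    """Concatenate per-document Markdown with a forced page break between documents."""
    return f"\n\n{PAGE_BREAK}\n\n".join(doc.strip() for doc in markdown_docs) + "\n"
```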
### Requirement: Export Format Selection

The system SHALL provide a UI for selecting the export format and options.

#### Scenario: Format selection with preview

- **WHEN** user opens the export dialog
- **THEN** the system displays format options (TXT, JSON, Excel, **Markdown with images, Searchable PDF**)
- **AND** shows a preview of the output structure for the selected format
- **AND** allows applying custom rules for text filtering
- **AND** provides a basic formatting option for PDF (standard readable format)
#### Scenario: Batch export with format choice

- **WHEN** user selects multiple completed tasks
- **THEN** the system enables the batch export button
- **AND** prompts for format selection
- **AND** generates a combined export file
- **AND** shows a progress bar during PDF generation, which is slower due to image processing
- **AND** includes all extracted images when exporting Markdown or PDF
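A sketch of the batch loop that feeds the progress bar. The per-task exporter (`export_one`) and the progress callback signature are hypothetical. Emitting an initial `0/total` event lets the UI show the bar before the first slow PDF render finishes. The collected outputs would then be packaged into the combined file, for example a ZIP.

```python
from typing import Callable, Dict, List

# (completed, total, task_id) -> None; task_id is "" for the initial 0/total event.
ProgressCallback = Callable[[int, int, str], None]


def run_batch_export(task_ids: List[str],
                     export_one: Callable[[str], bytes],
                     on_progress: ProgressCallback) -> Dict[str, bytes]:
    """Export tasks sequentially, reporting progress after each one (hypothetical API)."""
    outputs: Dict[str, bytes] = {}
    total = len(task_ids)
    on_progress(0, total, "")
    for completed, task_id in enumerate(task_ids, start=1):
        outputs[task_id] = export_one(task_id)  # e.g. a slow PDF render
        on_progress(completed, total, task_id)
    return outputs
```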