# OCR Processing Specification

## ADDED Requirements

### Requirement: Multi-Language Text Recognition with Structure Analysis

The system SHALL extract text and images from document files using PaddleOCR-VL with support for 109 languages including Chinese (traditional and simplified), English, Japanese, and Korean, while preserving document logical structure and reading order (not pixel-perfect visual layout).

#### Scenario: Single image OCR with Chinese text

- **WHEN** user uploads a PNG image containing Chinese text
- **THEN** the system extracts text with bounding boxes and confidence scores
- **AND** returns structured JSON with recognized text, coordinates, and language detected
- **AND** generates Markdown output preserving text layout and hierarchy

#### Scenario: PDF document OCR with layout preservation

- **WHEN** user uploads a multi-page PDF file
- **THEN** the system processes each page with PaddleOCR-VL
- **AND** performs layout analysis to identify document elements (titles, paragraphs, tables, images, formulas)
- **AND** returns Markdown organized by page with preserved reading order
- **AND** provides JSON with detailed layout structure and bounding boxes

#### Scenario: Mixed language content

- **WHEN** user uploads an image with both Chinese and English text
- **THEN** the system detects and extracts text in both languages
- **AND** preserves the spatial relationship between text regions
- **AND** maintains proper reading order in output Markdown

#### Scenario: Complex document with tables and images

- **WHEN** user uploads a scanned document containing tables, images, and text
- **THEN** the system identifies layout elements (text blocks, tables, images, formulas)
- **AND** extracts table structure as Markdown tables
- **AND** extracts and saves document images as separate files
- **AND** embeds references to the extracted images in the Markdown output
- **AND** preserves document hierarchy and reading order in Markdown output

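The Markdown requirements in the scenarios above (page-by-page output, reading order, embedded image references) can be illustrated with a minimal sketch. The `LayoutElement` shape, the heading level used for titles, and the page-marker comment are assumptions made for illustration; they are not PaddleOCR-VL types or output conventions.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class LayoutElement:
    """One detected layout element (hypothetical shape, not a PaddleOCR-VL type)."""
    page: int      # 1-based page number
    order: int     # reading-order index within the page
    kind: str      # "title", "paragraph", "table", "image", or "formula"
    content: str   # text, Markdown table source, image path, or LaTeX


def element_to_markdown(el: LayoutElement) -> str:
    """Render a single element; unknown kinds fall back to plain text."""
    if el.kind == "title":
        return f"## {el.content}"
    if el.kind == "image":
        return f"![]({el.content})"
    if el.kind == "formula":
        return f"$$\n{el.content}\n$$"
    return el.content  # paragraphs and already-rendered Markdown tables


def render_markdown(elements: list[LayoutElement]) -> str:
    """Group elements by page and emit them in reading order."""
    ordered = sorted(elements, key=lambda e: (e.page, e.order))
    sections = []
    for page, group in groupby(ordered, key=lambda e: e.page):
        body = "\n\n".join(element_to_markdown(e) for e in group)
        sections.append(f"<!-- page {page} -->\n\n{body}")
    return "\n\n".join(sections)
```

Sorting on `(page, order)` before grouping keeps the output stable even when the elements arrive out of order.
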
### Requirement: Batch Processing

The system SHALL process multiple files concurrently with progress tracking and error handling.

#### Scenario: Batch upload success

- **WHEN** user uploads 10 image files simultaneously
- **THEN** the system creates a batch task with a unique batch ID
- **AND** processes files in parallel (up to the configured worker limit)
- **AND** returns real-time progress updates via WebSocket or polling

#### Scenario: Batch processing with partial failure

- **WHEN** a batch contains 5 valid images and 2 corrupted files
- **THEN** the system processes all valid files successfully
- **AND** logs errors for corrupted files with specific error messages
- **AND** marks the batch as "partially completed"

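The two batch scenarios can be sketched with a thread pool, assuming a caller-supplied `ocr_one` function that raises on a corrupted file. The names `run_batch` and `ocr_one`, the result dictionary layout, and the `"completed"` and `"failed"` status strings are hypothetical; only `"partially completed"` comes from the scenario above.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_batch(paths: list[str], ocr_one: Callable[[str], dict],
              max_workers: int = 4) -> dict:
    """Process files in parallel and summarise the outcome per file."""
    batch_id = uuid.uuid4().hex  # unique batch ID
    results: dict[str, dict] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ocr_one, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file must not abort the batch
                errors[path] = f"{type(exc).__name__}: {exc}"
            # A real service would push this over WebSocket or expose it for polling.
            print(f"batch {batch_id}: {done}/{len(paths)} files finished")
    if not errors:
        status = "completed"
    elif results:
        status = "partially completed"
    else:
        status = "failed"
    return {"batch_id": batch_id, "status": status,
            "results": results, "errors": errors}
```

Catching the exception per future, rather than around the whole loop, is what lets the valid files finish while the corrupted ones are recorded with their error messages.
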
### Requirement: Image Preprocessing

The system SHALL provide optional image preprocessing to improve OCR accuracy.

#### Scenario: Low contrast image enhancement

- **WHEN** user enables preprocessing for a low-contrast image
- **THEN** the system applies contrast adjustment and denoising
- **AND** performs OCR on the enhanced image
- **AND** achieves higher recognition accuracy than OCR on the original image

#### Scenario: Skipped preprocessing

- **WHEN** user disables the preprocessing option
- **THEN** the system performs OCR directly on the original image
- **AND** completes processing faster than with preprocessing enabled

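To make "contrast adjustment and denoising" concrete, here is a dependency-free sketch on a grayscale image stored as nested lists. The specific techniques (min-max contrast stretching and a 3x3 median filter), the order in which they run, and all function names are assumptions; the spec does not say which algorithms or image library the system uses.

```python
from statistics import median

Image = list[list[int]]  # grayscale pixels in 0-255, row-major


def stretch_contrast(img: Image) -> Image:
    """Linearly rescale so the darkest pixel maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def median_denoise(img: Image) -> Image:
    """Replace each pixel with the median of its 3x3 neighbourhood (clipped at edges)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(int(median(window)))
        out.append(row)
    return out


def prepare_for_ocr(img: Image, preprocess: bool) -> Image:
    """Return the image that should be handed to the OCR engine."""
    if not preprocess:
        return img  # skipped: OCR runs directly on the original image
    return stretch_contrast(median_denoise(img))
```

Denoising runs first in this sketch so that a single outlier pixel does not set the range used by the contrast stretch.
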
### Requirement: Confidence Threshold Filtering

The system SHALL filter OCR results based on a configurable confidence threshold.

#### Scenario: High confidence filter

- **WHEN** user sets confidence threshold to 0.8
- **THEN** the system returns only text segments with confidence >= 0.8
- **AND** discards results with confidence below 0.8

#### Scenario: Include all results

- **WHEN** user sets confidence threshold to 0.0
- **THEN** the system returns all recognized text regardless of confidence
- **AND** includes confidence scores in output

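Both threshold scenarios reduce to one inclusive comparison. A minimal sketch, assuming each text segment is a dictionary with a `confidence` key (a hypothetical shape for illustration):

```python
def filter_by_confidence(segments: list[dict], threshold: float = 0.0) -> list[dict]:
    """Keep segments whose confidence is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # The comparison is inclusive, matching "confidence >= 0.8" in the scenario.
    return [s for s in segments if s["confidence"] >= threshold]
```

With a threshold of 0.0 every segment passes, and each retained segment still carries its confidence score.
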
### Requirement: OCR Result Structure

The system SHALL return OCR results in multiple formats (JSON, Markdown) with extracted text, images, and structure metadata.

#### Scenario: Successful OCR result with multiple formats

- **WHEN** OCR processing completes successfully
- **THEN** the system returns JSON containing:
  - File metadata (name, size, format, upload timestamp)
  - Detected text regions with bounding boxes (x, y, width, height)
  - Recognized text content for each region
  - Confidence scores (0.0 to 1.0)
  - Language detected
  - Layout element types (title, paragraph, table, image, formula)
  - Reading order sequence
  - List of extracted image files with paths
  - Processing time
  - Task status (completed/failed/partial)
- **AND** generates Markdown file with logical structure
- **AND** saves extracted images to storage directory
- **AND** provides methods to export as searchable PDF with images

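One possible shape for the JSON payload listed above, sketched with dataclasses. Every field name here is an assumption chosen to mirror the bullet list; the spec names the information to include but not the keys.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TextRegion:
    text: str                          # recognized text content
    bbox: tuple[int, int, int, int]    # x, y, width, height
    confidence: float                  # 0.0 to 1.0
    element_type: str                  # title, paragraph, table, image, formula
    reading_order: int                 # position in the reading sequence


@dataclass
class OcrResult:
    file_name: str
    file_size: int                     # bytes
    file_format: str
    uploaded_at: str                   # upload timestamp, e.g. ISO 8601
    language: str                      # language detected
    status: str                        # completed / failed / partial
    processing_time_s: float
    regions: list[TextRegion] = field(default_factory=list)
    image_files: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # ensure_ascii=False keeps Chinese, Japanese, and Korean text readable.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)
```

`asdict` recurses into the nested `TextRegion` objects, so the whole result serializes in one call.
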
#### Scenario: Searchable PDF generation with images

- **WHEN** user requests PDF export from OCR results
- **THEN** the system converts Markdown to HTML with basic CSS styling
- **AND** embeds extracted images in their logical positions (not exact original positions)
- **AND** generates PDF using Pandoc + WeasyPrint
- **AND** preserves document hierarchy, tables, and reading order
- **AND** applies appropriate fonts for Chinese characters
- **AND** produces searchable PDF (text is selectable and searchable)

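The Pandoc + WeasyPrint step might look like the sketch below, which assumes both tools are installed and lets Pandoc drive WeasyPrint through its `--pdf-engine` option (Pandoc then produces the intermediate HTML itself). The stylesheet contents, the font names, and the helper names are assumptions; the listed fonts must exist on the host for Chinese text to render.

```python
import subprocess
from pathlib import Path

# Assumed stylesheet: font names are examples, not a requirement of the spec.
DEFAULT_CSS = """\
body { font-family: "Noto Sans CJK TC", "Noto Sans CJK SC", sans-serif; }
table { border-collapse: collapse; }
td, th { border: 1px solid #999; padding: 4px; }
img { max-width: 100%; }
"""


def build_pandoc_command(markdown_path: Path, pdf_path: Path, css_path: Path) -> list[str]:
    """Build the Pandoc invocation that renders Markdown to PDF via WeasyPrint."""
    return [
        "pandoc", str(markdown_path),
        "--from=markdown",
        "--pdf-engine=weasyprint",
        f"--css={css_path}",
        f"--output={pdf_path}",
    ]


def export_pdf(markdown_path: Path, pdf_path: Path) -> None:
    """Write the stylesheet next to the target PDF and run Pandoc."""
    css_path = pdf_path.with_suffix(".css")
    css_path.write_text(DEFAULT_CSS, encoding="utf-8")
    subprocess.run(build_pandoc_command(markdown_path, pdf_path, css_path), check=True)
```

Because the PDF is rendered from text rather than from page images, the output stays selectable and searchable.
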
### Requirement: Document Translation (Reserved Architecture)

The system SHALL provide architecture and UI placeholders for future document translation features.

#### Scenario: Translation option visibility (UI placeholder)

- **WHEN** user views OCR result page
- **THEN** the system displays a "Translate Document" button (disabled or labeled "Coming Soon")
- **AND** shows target language selection dropdown (disabled)
- **AND** provides tooltip: "Translation feature will be available in future release"

#### Scenario: Translation API endpoint (reserved)

- **WHEN** backend API is queried for translation endpoints
- **THEN** the system provides `/api/v1/translate/document` endpoint specification
- **AND** returns "Not Implemented" (501) status when called
- **AND** documents expected request/response format for future implementation

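A framework-agnostic sketch of the reserved endpoint's behaviour follows. The handler name and the response body fields are hypothetical, since the spec fixes only the path and the 501 status; a real backend would register the handler with whatever web framework it uses.

```python
import json
from http import HTTPStatus

TRANSLATE_ENDPOINT = "/api/v1/translate/document"


def handle_translate_document(request_body: dict) -> tuple[int, str]:
    """Placeholder handler: always answers 501 until translation is implemented."""
    payload = {
        "error": HTTPStatus.NOT_IMPLEMENTED.phrase,  # "Not Implemented"
        "message": "Translation feature will be available in future release",
        "endpoint": TRANSLATE_ENDPOINT,
    }
    # The request body is accepted but ignored for now.
    return HTTPStatus.NOT_IMPLEMENTED.value, json.dumps(payload)
```
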
#### Scenario: Translation configuration storage (database schema)

- **WHEN** database schema is created
- **THEN** the system includes `translation_configs` table
- **AND** defines columns: id, user_id, source_lang, target_lang, engine_type, engine_config, created_at
- **AND** table remains empty until translation feature is implemented

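The reserved table can be sketched with SQLite from the Python standard library. The column names come from the scenario above; the column types, constraints, and the choice of SQLite are assumptions, since the spec does not name a database engine.

```python
import sqlite3

# Column types and constraints are assumptions; the spec lists only the names.
CREATE_TRANSLATION_CONFIGS = """
CREATE TABLE IF NOT EXISTS translation_configs (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id       INTEGER NOT NULL,
    source_lang   TEXT NOT NULL,
    target_lang   TEXT NOT NULL,
    engine_type   TEXT NOT NULL,
    engine_config TEXT,               -- e.g. serialized JSON
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the reserved table; no rows are inserted."""
    conn.execute(CREATE_TRANSLATION_CONFIGS)
    conn.commit()
```
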