# OCR Processing - Delta Spec

## ADDED Requirements

### Requirement: REQ-OCR-PRESETS - Document Type Presets

The system MUST provide predefined OCR processing configurations for common document types.

Available presets:
- `text_heavy`: Optimized for text-heavy documents (reports, articles)
- `datasheet`: Optimized for technical datasheets
- `table_heavy`: Optimized for documents with many tables
- `form`: Optimized for forms and applications
- `mixed`: Balanced configuration for mixed content
- `custom`: User-defined configuration

#### Scenario: User selects datasheet preset
- Given a user uploading a technical datasheet
- When they select the "datasheet" preset
- Then the system applies conservative table parsing mode
- And disables wireless table detection
- And sets layout threshold to 0.65

#### Scenario: User selects text_heavy preset
- Given a user uploading a text-heavy report
- When they select the "text_heavy" preset
- Then the system disables table recognition
- And focuses on text extraction

### Requirement: REQ-OCR-PARAMS - Advanced Parameter Configuration

The system MUST allow advanced users to configure individual PP-Structure parameters.
Configurable parameters include:
- Table parsing mode (full/conservative/classification_only/disabled)
- Table layout threshold (0.0-1.0)
- Wired/wireless table detection toggles
- Layout detection model selection
- Preprocessing options (orientation, unwarping, textline)
- Recognition module toggles (chart, formula, seal)

#### Scenario: User adjusts table layout threshold
- Given a user experiencing table over-detection
- When they increase table_layout_threshold to 0.7
- Then fewer regions are classified as tables
- And text regions are preserved correctly

#### Scenario: User disables wireless table detection
- Given a user processing a datasheet with cell explosion
- When they disable enable_wireless_table
- Then only bordered tables are detected
- And structured text is not split into cells

### Requirement: REQ-OCR-API - OCR Configuration API

The task creation API MUST accept OCR configuration parameters.

API accepts:
- `ocr_preset`: Preset name to apply
- `ocr_config`: Custom configuration object (overrides preset)

#### Scenario: Create task with preset
- Given an API request with ocr_preset="datasheet"
- When the task is created
- Then the datasheet preset configuration is applied
- And the task processes with conservative table parsing

#### Scenario: Create task with custom config
- Given an API request with ocr_config containing custom values
- When the task is created
- Then the custom configuration overrides defaults
- And the task uses the specified parameters

## MODIFIED Requirements

### Requirement: REQ-OCR-DEFAULTS - Default Processing Configuration

The system default configuration MUST be conservative to prevent over-detection.
Default values:
- `table_parsing_mode`: "conservative"
- `table_layout_threshold`: 0.65
- `enable_wireless_table`: false
- `use_doc_unwarping`: false

Patch behaviors MUST be disabled by default:
- `cell_validation_enabled`: false
- `gap_filling_enabled`: false
- `table_content_rebuilder_enabled`: false

#### Scenario: New task uses conservative defaults
- Given a task created without specifying OCR configuration
- When the task is processed
- Then conservative table parsing is used
- And wireless table detection is disabled
- And no post-processing patches are applied
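
#### Illustrative sketch: resolving the effective configuration

The precedence implied by REQ-OCR-DEFAULTS, REQ-OCR-PRESETS, and REQ-OCR-API (defaults first, then the preset, then `ocr_config`) could be sketched as below. This is a minimal illustration, not the actual implementation: `OcrConfig`, `PRESET_OVERRIDES`, and `resolve_config` are hypothetical names. Only the two presets whose behavior the scenarios describe are filled in, and mapping `text_heavy` to `table_parsing_mode="disabled"` is an assumption about how "disables table recognition" would be expressed. Rejecting unknown keys and out-of-range thresholds is likewise an assumed validation policy.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class OcrConfig:
    """Hypothetical container; defaults mirror REQ-OCR-DEFAULTS."""
    table_parsing_mode: str = "conservative"
    table_layout_threshold: float = 0.65
    enable_wireless_table: bool = False
    use_doc_unwarping: bool = False
    # Patch behaviors are disabled by default.
    cell_validation_enabled: bool = False
    gap_filling_enabled: bool = False
    table_content_rebuilder_enabled: bool = False


# Allowed values taken from the REQ-OCR-PARAMS parameter list.
TABLE_PARSING_MODES = {"full", "conservative", "classification_only", "disabled"}

# Only presets with behavior stated in the scenarios are sketched here.
PRESET_OVERRIDES = {
    "datasheet": {
        "table_parsing_mode": "conservative",
        "enable_wireless_table": False,
        "table_layout_threshold": 0.65,
    },
    # Assumption: "disables table recognition" maps to the "disabled" mode.
    "text_heavy": {"table_parsing_mode": "disabled"},
}


def resolve_config(
    ocr_preset: Optional[str] = None,
    ocr_config: Optional[Mapping[str, Any]] = None,
) -> OcrConfig:
    """Apply defaults, then the preset, then custom overrides."""
    config = OcrConfig()
    if ocr_preset is not None:
        if ocr_preset not in PRESET_OVERRIDES:
            raise ValueError(f"preset not covered by this sketch: {ocr_preset!r}")
        config = replace(config, **PRESET_OVERRIDES[ocr_preset])
    if ocr_config:
        known = {f.name for f in fields(OcrConfig)}
        unknown = set(ocr_config) - known
        if unknown:
            raise ValueError(f"unknown OCR parameters: {sorted(unknown)}")
        # ocr_config overrides the preset, per REQ-OCR-API.
        config = replace(config, **ocr_config)
    if config.table_parsing_mode not in TABLE_PARSING_MODES:
        raise ValueError(f"invalid table_parsing_mode: {config.table_parsing_mode!r}")
    if not 0.0 <= config.table_layout_threshold <= 1.0:
        raise ValueError("table_layout_threshold must be within 0.0-1.0")
    return config
```

For example, `resolve_config(ocr_preset="datasheet", ocr_config={"table_layout_threshold": 0.7})` keeps the preset's conservative table parsing while raising the threshold, matching the "User adjusts table layout threshold" scenario. Calling `resolve_config()` with no arguments returns the conservative defaults with all patch behaviors off.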