OCR/openspec/changes/archive/2025-12-10-add-ocr-processing-presets/specs/ocr-processing/spec.md
2025-12-11 11:55:39 +08:00

OCR Processing - Delta Spec

ADDED Requirements

Requirement: REQ-OCR-PRESETS - Document Type Presets

The system MUST provide predefined OCR processing configurations for common document types.

Available presets:

  • text_heavy: Optimized for text-heavy documents (reports, articles)
  • datasheet: Optimized for technical datasheets
  • table_heavy: Optimized for documents with many tables
  • form: Optimized for forms and applications
  • mixed: Balanced configuration for mixed content
  • custom: User-defined configuration

Scenario: User selects datasheet preset

  • Given a user uploading a technical datasheet
  • When they select the "datasheet" preset
  • Then the system applies conservative table parsing mode
  • And disables wireless table detection
  • And sets table_layout_threshold to 0.65

Scenario: User selects text_heavy preset

  • Given a user uploading a text-heavy report
  • When they select the "text_heavy" preset
  • Then the system disables table recognition
  • And focuses on text extraction
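
The preset scenarios above could be backed by a simple lookup table. The following is a minimal sketch, not the project's actual implementation: `OCRPreset`, `PRESET_OVERRIDES`, and `preset_overrides` are hypothetical names, only the values stated in the two scenarios are filled in, and mapping "disables table recognition" to `table_parsing_mode="disabled"` is an assumption.

```python
from enum import Enum


class OCRPreset(str, Enum):
    """Preset names listed in REQ-OCR-PRESETS."""
    TEXT_HEAVY = "text_heavy"
    DATASHEET = "datasheet"
    TABLE_HEAVY = "table_heavy"
    FORM = "form"
    MIXED = "mixed"
    CUSTOM = "custom"


# Hypothetical table: only values the scenarios state are filled in.
# The spec does not give values for the remaining presets, so they are omitted.
PRESET_OVERRIDES = {
    OCRPreset.DATASHEET: {
        "table_parsing_mode": "conservative",
        "enable_wireless_table": False,
        "table_layout_threshold": 0.65,
    },
    # Assumption: "disables table recognition" maps to the "disabled" mode.
    OCRPreset.TEXT_HEAVY: {"table_parsing_mode": "disabled"},
}


def preset_overrides(name: str) -> dict:
    """Return a copy of the overrides for a preset name.

    Raises ValueError if the name is not one of the listed presets.
    """
    preset = OCRPreset(name)
    return dict(PRESET_OVERRIDES.get(preset, {}))
```

Returning a copy keeps callers from mutating the shared table when they layer their own values on top.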

Requirement: REQ-OCR-PARAMS - Advanced Parameter Configuration

The system MUST allow advanced users to configure individual PP-Structure parameters.

Configurable parameters include:

  • Table parsing mode (full/conservative/classification_only/disabled)
  • Table layout threshold (0.0-1.0)
  • Wired/wireless table detection toggles
  • Layout detection model selection
  • Preprocessing options (document orientation, unwarping, textline orientation)
  • Recognition module toggles (chart, formula, seal)
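
One way to enforce the allowed values listed above is to validate them when the configuration object is built. This is a sketch under assumptions: `TableParams` and `enable_wired_table` are hypothetical names (the spec only names `enable_wireless_table`), the wired default of `True` is assumed, and the remaining defaults follow REQ-OCR-DEFAULTS.

```python
from dataclasses import dataclass
from enum import Enum


class TableParsingMode(str, Enum):
    """The four table parsing modes named in REQ-OCR-PARAMS."""
    FULL = "full"
    CONSERVATIVE = "conservative"
    CLASSIFICATION_ONLY = "classification_only"
    DISABLED = "disabled"


@dataclass
class TableParams:
    table_parsing_mode: TableParsingMode = TableParsingMode.CONSERVATIVE
    table_layout_threshold: float = 0.65
    enable_wired_table: bool = True      # assumption: not stated in the spec
    enable_wireless_table: bool = False

    def __post_init__(self) -> None:
        # Accept plain strings such as "full"; unknown modes raise ValueError.
        self.table_parsing_mode = TableParsingMode(self.table_parsing_mode)
        # The spec limits the threshold to the closed range 0.0-1.0.
        if not 0.0 <= self.table_layout_threshold <= 1.0:
            raise ValueError(
                "table_layout_threshold must be between 0.0 and 1.0, "
                f"got {self.table_layout_threshold}"
            )
```

Rejecting bad values at construction time means a task never starts processing with an out-of-range threshold or a misspelled mode.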

Scenario: User adjusts table layout threshold

  • Given a user experiencing table over-detection
  • When they increase table_layout_threshold to 0.7
  • Then fewer regions are classified as tables
  • And text regions are preserved correctly
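
To illustrate why raising the threshold reduces over-detection, here is a toy sketch. It assumes the threshold is compared against a per-region confidence score and that table candidates below it fall back to text; the spec does not describe the actual mechanism, and `apply_table_threshold` is a hypothetical helper.

```python
def apply_table_threshold(regions, table_layout_threshold):
    """Relabel low-confidence table regions as text.

    `regions` is a list of (label, score) pairs. This models the threshold
    as a simple score cutoff, which is an assumption for illustration.
    """
    result = []
    for label, score in regions:
        if label == "table" and score < table_layout_threshold:
            label = "text"  # keep the region, but do not parse it as a table
        result.append((label, score))
    return result


detected = [("table", 0.90), ("table", 0.68), ("text", 0.80)]
at_default = apply_table_threshold(detected, 0.65)
at_raised = apply_table_threshold(detected, 0.70)
```

With these sample scores, the region scored 0.68 is kept as a table at 0.65 but is treated as text at 0.7, so no region is dropped and only the classification changes.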

Scenario: User disables wireless table detection

  • Given a user processing a datasheet with cell explosion
  • When they disable enable_wireless_table
  • Then only bordered tables are detected
  • And structured text is not split into cells

Requirement: REQ-OCR-API - OCR Configuration API

The task creation API MUST accept OCR configuration parameters.

API accepts:

  • ocr_preset: Preset name to apply
  • ocr_config: Custom configuration object (overrides preset)

Scenario: Create task with preset

  • Given an API request with ocr_preset="datasheet"
  • When the task is created
  • Then the datasheet preset configuration is applied
  • And the task processes with conservative table parsing

Scenario: Create task with custom config

  • Given an API request with ocr_config containing custom values
  • When the task is created
  • Then the custom configuration overrides defaults
  • And the task uses the specified parameters
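
The precedence implied by this requirement (defaults, then `ocr_preset`, then `ocr_config`) can be sketched as a layered merge. `resolve_ocr_config`, `DEFAULTS`, and `PRESETS` are hypothetical names, the `text_heavy` entry assumes that disabling table recognition means `table_parsing_mode="disabled"`, and rejecting unknown keys is a design choice of this sketch rather than something the spec requires.

```python
from typing import Optional

# Subset of the default values from REQ-OCR-DEFAULTS.
DEFAULTS = {
    "table_parsing_mode": "conservative",
    "table_layout_threshold": 0.65,
    "enable_wireless_table": False,
}

# Hypothetical preset table; only one entry is shown for brevity.
PRESETS = {
    "text_heavy": {"table_parsing_mode": "disabled"},  # assumed mapping
}


def resolve_ocr_config(
    ocr_preset: Optional[str] = None,
    ocr_config: Optional[dict] = None,
) -> dict:
    """Merge defaults, preset values, and custom values, in that order."""
    resolved = dict(DEFAULTS)
    if ocr_preset is not None:
        if ocr_preset not in PRESETS:
            raise ValueError(f"unknown ocr_preset: {ocr_preset!r}")
        resolved.update(PRESETS[ocr_preset])
    if ocr_config:
        unknown = set(ocr_config) - set(DEFAULTS)
        if unknown:
            raise ValueError(f"unknown ocr_config keys: {sorted(unknown)}")
        resolved.update(ocr_config)  # custom values override the preset
    return resolved
```

For example, `resolve_ocr_config("text_heavy", {"table_layout_threshold": 0.7})` keeps the preset's parsing mode while taking the threshold from the custom configuration.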

MODIFIED Requirements

Requirement: REQ-OCR-DEFAULTS - Default Processing Configuration

The system default configuration MUST be conservative to prevent over-detection.

Default values:

  • table_parsing_mode: "conservative"
  • table_layout_threshold: 0.65
  • enable_wireless_table: false
  • use_doc_unwarping: false

Patch behaviors MUST be disabled by default:

  • cell_validation_enabled: false
  • gap_filling_enabled: false
  • table_content_rebuilder_enabled: false

Scenario: New task uses conservative defaults

  • Given a task created without specifying OCR configuration
  • When the task is processed
  • Then conservative table parsing is used
  • And wireless table detection is disabled
  • And no post-processing patches are applied
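
The default values in this requirement can be captured in one immutable object, which also makes the "no post-processing patches" condition easy to check. The field names and values come from the lists above; `OCRDefaults`, `PATCH_FLAGS`, and `enabled_patches` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OCRDefaults:
    # Default values from REQ-OCR-DEFAULTS
    table_parsing_mode: str = "conservative"
    table_layout_threshold: float = 0.65
    enable_wireless_table: bool = False
    use_doc_unwarping: bool = False
    # Patch behaviors, disabled by default
    cell_validation_enabled: bool = False
    gap_filling_enabled: bool = False
    table_content_rebuilder_enabled: bool = False


PATCH_FLAGS = (
    "cell_validation_enabled",
    "gap_filling_enabled",
    "table_content_rebuilder_enabled",
)


def enabled_patches(config: OCRDefaults) -> list:
    """Return the names of patch behaviors that are switched on."""
    return [name for name in PATCH_FLAGS if getattr(config, name)]
```

A task created without any OCR configuration would use `OCRDefaults()`, for which `enabled_patches` returns an empty list.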