OCR/openspec/changes/archive/2025-12-11-simplify-frontend-ocr-config/specs/ocr-processing/spec.md
egg 53bfa88773 docs: archive simplify-frontend-ocr-config proposal
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:17:07 +08:00


ocr-processing Specification Delta

REMOVED Requirements

Requirement: OCR Preset Selection

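Reason: The OCR track now runs in simple OCR mode, so the frontend no longer needs to supply complex preset configurations. The backend applies default parameters uniformly. Migration: Remove the frontend OCRPresetSelector component and its related type definitions. The backend automatically applies its recommended default configuration.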

Requirement: Table Detection Configuration

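Reason: Table detection settings (the toggles for bordered/borderless tables and for region detection) no longer need to be controlled by the frontend. The backend applies its default table detection strategy uniformly. Migration: Remove the frontend TableDetectionSelector component and the TableDetectionConfig type. The backend uses its built-in defaults.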

Requirement: OCR Advanced Parameters

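Reason: Advanced OCR parameters (such as table_parsing_mode, layout_threshold, and enable_chart_recognition) no longer need to be configured by the frontend. Migration: Remove the frontend OCRConfig type and the related UI. The backend always uses the default parameters of simple OCR mode.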

MODIFIED Requirements

Requirement: Layout Model Selection

The system SHALL allow users to select a layout detection model optimized for their document type, providing a simple choice between pre-configured models instead of manual parameter tuning.

Scenario: User selects Chinese document model

  • GIVEN a user is processing Chinese business documents (forms, contracts, invoices)
  • WHEN the user selects "Chinese Document Model" (PP-DocLayout-S)
  • THEN the OCR engine SHALL use the PP-DocLayout-S layout detection model
  • AND the model SHALL be optimized for 23 Chinese document element types
  • AND table and form detection accuracy SHALL be improved over the default model

Scenario: User selects standard model for English documents

  • GIVEN a user is processing English academic papers or reports
  • WHEN the user selects "Standard Model" (PubLayNet-based)
  • THEN the OCR engine SHALL use the default PubLayNet-based layout detection model
  • AND the model SHALL be optimized for English document layouts

Scenario: User selects CDLA model for specialized Chinese layout

  • GIVEN a user is processing Chinese documents with complex layouts
  • WHEN the user selects "CDLA Model"
  • THEN the OCR engine SHALL use the picodet_lcnet_x1_0_fgd_layout_cdla model
  • AND the model SHALL provide specialized Chinese document layout analysis
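The three scenarios above amount to a lookup from a short model key to a layout detection model. Below is a minimal Python sketch, assuming the keys `chinese`, `default`, and `cdla` that this spec uses for the selector. The names `LAYOUT_MODELS` and `resolve_layout_model_name` are hypothetical, as is the convention that `None` means "keep the engine's default (PubLayNet-based) model".

```python
from typing import Optional

# Hypothetical registry mapping the API's layout_model key to a model name.
# None is an assumed convention for "do not override; keep the engine's
# default PubLayNet-based layout model".
LAYOUT_MODELS: dict[str, Optional[str]] = {
    "chinese": "PP-DocLayout-S",  # 23 Chinese document element types
    "default": None,  # standard PubLayNet-based model
    "cdla": "picodet_lcnet_x1_0_fgd_layout_cdla",  # complex Chinese layouts
}


def resolve_layout_model_name(layout_model: str) -> Optional[str]:
    """Return the model name to configure, or None for the engine default.

    Raises KeyError for unknown keys; request validation is expected to
    reject those before this point.
    """
    return LAYOUT_MODELS[layout_model]
```

For example, `resolve_layout_model_name("cdla")` returns `"picodet_lcnet_x1_0_fgd_layout_cdla"`.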

Scenario: Layout model is sent via API request

  • GIVEN a frontend application with model selection UI
  • WHEN the user starts task processing with a selected model
  • THEN the frontend SHALL send the model choice in the request body:
    POST /api/v2/tasks/{task_id}/start
    {
      "use_dual_track": true,
      "force_track": "ocr",
      "language": "ch",
      "layout_model": "chinese"
    }
    
  • AND the backend SHALL configure PP-StructureV3 with the corresponding model
  • AND the frontend SHALL NOT send ocr_preset, ocr_config, or table_detection parameters
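A client helper that builds exactly this request might look like the following. The spec does not show the frontend code, so this Python version and the name `build_start_request` are illustrative assumptions. The endpoint path and field values come from the scenario. Only the four listed fields are sent, and `ocr_preset`, `ocr_config`, and `table_detection` are never added.

```python
import json


def build_start_request(task_id: str, layout_model: str = "chinese",
                        language: str = "ch") -> tuple[str, str]:
    """Return (path, JSON body) for starting a task on the OCR track.

    Hypothetical helper: only the four fields from the spec are sent;
    ocr_preset, ocr_config and table_detection are deliberately omitted.
    """
    path = f"/api/v2/tasks/{task_id}/start"
    body = {
        "use_dual_track": True,
        "force_track": "ocr",
        "language": language,
        "layout_model": layout_model,
    }
    return path, json.dumps(body)
```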

Scenario: Default model when not specified

  • GIVEN an API request without layout_model parameter
  • WHEN the task is started
  • THEN the system SHALL use "chinese" (PP-DocLayout-S) as the default model
  • AND processing SHALL work correctly without requiring model selection

Scenario: Invalid model name is rejected

  • GIVEN a request with an invalid layout_model value
  • WHEN the user sends layout_model: "invalid_model"
  • THEN the API SHALL return a 422 Validation Error
  • AND provide a clear error message listing valid model options
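The default-model and invalid-model scenarios can be combined into a single parsing step. This stdlib-only Python sketch rests on several assumptions. The names `parse_layout_model` and `LayoutModelValidationError` are hypothetical, and the API layer is assumed to translate the exception into an HTTP 422 response. If the backend is FastAPI with a Pydantic request model (not stated in this spec), declaring the field as `Literal["chinese", "default", "cdla"]` with a default of `"chinese"` would produce the 422 for unknown values without a hand-written check.

```python
from typing import Optional

VALID_LAYOUT_MODELS = ("chinese", "default", "cdla")
DEFAULT_LAYOUT_MODEL = "chinese"  # PP-DocLayout-S


class LayoutModelValidationError(ValueError):
    """Hypothetical error type that the API layer would map to HTTP 422."""

    status_code = 422


def parse_layout_model(value: Optional[str]) -> str:
    """Apply the default when absent and reject unknown model names."""
    if value is None:
        return DEFAULT_LAYOUT_MODEL
    if value not in VALID_LAYOUT_MODELS:
        # The message lists every valid option, as the scenario requires.
        raise LayoutModelValidationError(
            f"Invalid layout_model {value!r}; "
            f"valid options are: {', '.join(VALID_LAYOUT_MODELS)}"
        )
    return value
```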

Requirement: Layout Model Selection UI

The frontend SHALL provide a simple, user-friendly interface for selecting layout detection models with clear descriptions of each option.

Scenario: Model options are displayed with descriptions

  • GIVEN the model selection UI is displayed
  • WHEN the user views the available options
  • THEN the UI SHALL show the following options:
    • "Chinese Document Model (Recommended)" - for Chinese forms, contracts, invoices
    • "Standard Model" - for English academic papers, reports
    • "CDLA Model" - for specialized Chinese layout analysis
  • AND each option SHALL have a brief description of its use case

Scenario: Chinese model is selected by default

  • GIVEN the user opens the task processing interface
  • WHEN the model selection is displayed
  • THEN "Chinese Document Model" SHALL be pre-selected as the default
  • AND the user MAY change the selection before starting processing
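A data-driven option list keeps the labels, descriptions, and default selection in one place. It is written in Python purely for illustration. The frontend component itself is not part of this spec, and `LayoutModelOption`, `LAYOUT_MODEL_OPTIONS`, and `DEFAULT_SELECTION` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutModelOption:
    value: str        # value sent as layout_model
    label: str        # text shown in the selector
    description: str  # one-line use case shown under the label


LAYOUT_MODEL_OPTIONS = (
    LayoutModelOption("chinese", "Chinese Document Model (Recommended)",
                      "For Chinese forms, contracts, and invoices"),
    LayoutModelOption("default", "Standard Model",
                      "For English academic papers and reports"),
    LayoutModelOption("cdla", "CDLA Model",
                      "For specialized Chinese layout analysis"),
)

# The Chinese model is pre-selected; the user may change it before starting.
DEFAULT_SELECTION = LAYOUT_MODEL_OPTIONS[0].value
```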

Scenario: Model selection is visible only for OCR track

  • GIVEN a document processing interface
  • WHEN the user selects processing track
  • THEN layout model selection SHALL be shown ONLY when OCR track is selected or auto-detected
  • AND SHALL be hidden for Direct track (which does not use PP-StructureV3)

Scenario: Simplified configuration options

  • GIVEN the OCR track processing interface
  • WHEN the user configures processing options
  • THEN the UI SHALL only show:
    • Layout model selection (chinese/default/cdla)
    • Image preprocessing settings (auto/manual/disabled)
  • AND SHALL NOT show:
    • OCR preset selection
    • Table detection configuration
    • Advanced OCR parameters
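The visibility and simplification rules reduce to a small predicate plus a fixed list of remaining controls. The following is a minimal Python sketch. The control identifiers and the `"ocr"`/`"direct"` track strings are assumptions (only `"ocr"` appears as a `force_track` value in this spec). Here, `effective_track` means the track the user chose or the one that auto-detection selected.

```python
# Controls shown on the OCR-track processing screen after this change
# (hypothetical identifiers).
OCR_TRACK_CONTROLS = ("layout_model", "preprocessing_mode")
# Controls removed by this change; they must not be rendered.
REMOVED_CONTROLS = frozenset(
    {"ocr_preset", "table_detection", "ocr_advanced_params"}
)


def show_layout_model_selector(effective_track: str) -> bool:
    """Show the selector only when the OCR track is in effect.

    effective_track is the track the user selected or that auto-detection
    chose; "ocr" and "direct" are assumed string values.
    """
    # The Direct track does not use PP-StructureV3, so a layout model is moot.
    return effective_track == "ocr"
```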

Requirement: Simplified Processing Options API

The backend API SHALL accept a simplified ProcessingOptions schema without complex OCR configuration parameters.

Scenario: API accepts minimal configuration

  • GIVEN a start task API request
  • WHEN the request body contains:
    {
      "use_dual_track": true,
      "force_track": "ocr",
      "language": "ch",
      "layout_model": "chinese",
      "preprocessing_mode": "auto"
    }
    
  • THEN the API SHALL accept the request
  • AND process the task using backend default values for all other parameters

Scenario: Legacy parameters are ignored

  • GIVEN a start task API request with legacy parameters
  • WHEN the request contains ocr_preset, ocr_config, or table_detection
  • THEN the API SHALL ignore these parameters
  • AND use backend default values instead
  • AND SHALL NOT return an error, preserving backward compatibility
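The simplified schema and its tolerance for legacy fields can be sketched as a Python dataclass. The field names come from the minimal request body in this requirement. Apart from `layout_model="chinese"`, which this spec mandates, the default values are assumptions standing in for the real backend defaults. `from_request` keeps only known fields, so legacy keys are dropped without error. It drops any unknown key, not just the three legacy ones, which is broader than the spec strictly requires.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ProcessingOptions:
    """Simplified start-task options; defaults are assumed backend defaults."""

    use_dual_track: bool = True
    force_track: Optional[str] = None
    language: str = "ch"
    layout_model: str = "chinese"  # PP-DocLayout-S, the spec's default
    preprocessing_mode: str = "auto"  # auto / manual / disabled

    @classmethod
    def from_request(cls, body: Mapping[str, Any]) -> "ProcessingOptions":
        """Build options from a JSON body, silently dropping unknown keys.

        Legacy keys such as ocr_preset, ocr_config and table_detection are
        therefore ignored instead of causing an error.
        """
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in body.items() if k in known})
```

If the backend uses Pydantic (an assumption), a `BaseModel` request schema behaves the same way by default, because unknown fields are ignored unless the model's `extra` setting is changed.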