OCR/openspec/changes/add-layout-preprocessing/specs/ocr-processing/spec.md
egg 06a5973f2e proposal: add hybrid control mode with auto-detection and preview
Updates add-layout-preprocessing proposal:
- Auto mode: analyze image quality, auto-select parameters
- Manual mode: user override with specific settings
- Preview API: compare original vs preprocessed before processing
- Frontend UI: mode selection, manual controls, preview button

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 14:31:09 +08:00


ADDED Requirements

Requirement: Layout Detection Image Preprocessing

The system SHALL provide optional image preprocessing to enhance layout detection accuracy for documents with faint lines, low contrast, or poor scan quality.

Scenario: Preprocessing improves table detection

  • GIVEN a document with faint table borders that PP-Structure fails to detect
  • WHEN layout preprocessing is enabled
  • THEN the system SHALL preprocess the image before layout detection
  • AND contrast enhancement SHALL make faint lines more visible
  • AND PP-Structure SHALL receive the preprocessed image for layout detection

Scenario: Image element extraction uses original quality

  • GIVEN an image element detected by PP-Structure from preprocessed input
  • WHEN the system extracts the image element
  • THEN the system SHALL crop from the ORIGINAL image, not the preprocessed version
  • AND the extracted image SHALL maintain original quality and colors
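
The crop-from-original rule can be sketched as below. This is a minimal illustration, not the project's implementation: it assumes preprocessing preserves image dimensions, so a bounding box detected on the preprocessed image maps directly onto the original. The `crop_region` helper, the `(x0, y0, x1, y1)` box layout, and the nested-list image representation are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]            # hypothetical stand-in: rows of grayscale pixels
BBox = Tuple[int, int, int, int]   # assumed layout: (x0, y0, x1, y1), x1/y1 exclusive


def crop_region(image: Image, bbox: BBox) -> Image:
    """Return the pixels inside bbox, clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = bbox
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


original = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
# Same dimensions as the original, but with values altered by preprocessing.
preprocessed = [[0, 0, 255], [0, 255, 255], [255, 255, 255]]

# Assume layout detection ran on `preprocessed` and reported this box.
detected_bbox = (1, 0, 3, 2)

# The crop is taken from `original`, so the pixel values are untouched.
element = crop_region(original, detected_bbox)
```

If a preprocessing step ever resized the image, the box would need to be scaled back to original coordinates before cropping.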

Scenario: CLAHE contrast enhancement

  • WHEN layout_preprocessing_contrast is set to "clahe"
  • THEN the system SHALL apply Contrast Limited Adaptive Histogram Equalization
  • AND the enhancement SHALL NOT over-saturate already bright regions

Scenario: Sharpening enhances faint lines

  • WHEN layout_preprocessing_sharpen is enabled
  • THEN the system SHALL apply unsharp masking to enhance edges
  • AND faint table borders SHALL become more detectable
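
To make the unsharp-masking step concrete, here is a dependency-free sketch. It swaps in a 3x3 box blur where production code would more typically use a Gaussian blur, and the `amount` parameter and function names are assumptions, not part of the spec.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale values


def box_blur(image: Image) -> List[List[float]]:
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(image), len(image[0])
    blurred = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def unsharp_mask(image: Image, amount: float = 1.0) -> Image:
    """Add the difference between the image and its blur back onto the image."""
    blurred = box_blur(image)
    return [
        [max(0, min(255, round(p + amount * (p - b)))) for p, b in zip(row, blur_row)]
        for row, blur_row in zip(image, blurred)
    ]


# A faint vertical line (110) on a slightly darker background (100).
faint_line = [[100, 110, 100] for _ in range(3)]
sharpened = unsharp_mask(faint_line)
```

In this example the line pixels get brighter and their neighbours darker, which widens the gap that layout detection has to pick up. Uniform regions are left unchanged because the image and its blur are identical there.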

Scenario: Optional binarization for extreme cases

  • WHEN layout_preprocessing_binarize is enabled
  • THEN the system SHALL apply adaptive thresholding
  • AND binarization SHALL be used only for documents with very poor contrast
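
As a rough illustration of why adaptive thresholding suits low-contrast pages, the sketch below compares each pixel with the mean of its local neighbourhood instead of a single global cutoff. The `radius` and `offset` parameters and their defaults are assumptions chosen for the example; the spec does not define them.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale values


def adaptive_threshold(image: Image, radius: int = 1, offset: float = 5.0) -> Image:
    """Mark a pixel white (255) if it exceeds its local mean minus `offset`, else black (0)."""
    h, w = len(image), len(image[0])
    result = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(255 if image[y][x] > local_mean - offset else 0)
        result.append(row)
    return result


# A dim page (background 100) with a faint darker line (80).
dim_page = [[100, 100, 80, 100, 100] for _ in range(5)]
binary = adaptive_threshold(dim_page)
```

A fixed global threshold of 128 would turn this entire page black, whereas the local comparison keeps the background white and isolates the line.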

Requirement: Preprocessing Hybrid Control Mode

The system SHALL support three preprocessing modes: automatic, manual, and disabled, with automatic as the default.

Scenario: Auto mode analyzes image quality

  • GIVEN preprocessing mode is set to "auto"
  • WHEN processing begins for a page
  • THEN the system SHALL analyze image quality metrics (contrast, edge strength)
  • AND the system SHALL automatically determine preprocessing parameters from those metrics
  • AND the system SHALL apply the recommended settings without user intervention

Scenario: Auto mode detects low contrast

  • GIVEN preprocessing mode is "auto"
  • WHEN image contrast (standard deviation) is below 40
  • THEN the system SHALL automatically enable CLAHE contrast enhancement

Scenario: Auto mode detects faint edges

  • GIVEN preprocessing mode is "auto"
  • WHEN image edge strength (Sobel gradient mean) is below 15
  • THEN the system SHALL automatically enable sharpening
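
The two auto-mode thresholds above can be sketched as a small decision function. The thresholds (40 and 15) come from the scenarios; everything else is an assumption: the spec does not say whether the standard deviation is the population or sample form, or how the Sobel gradient mean is normalised, so this sketch uses the population standard deviation and the unnormalised 3x3 Sobel magnitude averaged over interior pixels. Function names and the result keys are hypothetical.

```python
import math
from typing import Dict, List

Image = List[List[int]]  # rows of 8-bit grayscale values

CONTRAST_THRESHOLD = 40.0  # from the spec: std dev below 40 enables CLAHE
EDGE_THRESHOLD = 15.0      # from the spec: Sobel mean below 15 enables sharpening


def contrast(image: Image) -> float:
    """Population standard deviation of all pixel values (assumed definition)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


def edge_strength(image: Image) -> float:
    """Mean 3x3 Sobel gradient magnitude over interior pixels (assumed definition)."""
    h, w = len(image), len(image[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            total += math.hypot(gx, gy)
            count += 1
    return total / count if count else 0.0


def auto_config(image: Image) -> Dict[str, object]:
    """Pick preprocessing settings from the two quality metrics."""
    c = contrast(image)
    e = edge_strength(image)
    return {
        "contrast": "clahe" if c < CONTRAST_THRESHOLD else "none",
        "sharpen": e < EDGE_THRESHOLD,
        "metrics": {"contrast": c, "edge_strength": e},
    }


flat_page = [[128] * 4 for _ in range(4)]           # no contrast, no edges
crisp_page = [[0, 0, 255, 255] for _ in range(4)]   # strong vertical edge
flat_result = auto_config(flat_page)
crisp_result = auto_config(crisp_page)
```

Returning the metrics alongside the chosen settings keeps the decision inspectable, which matches the preview requirement to report `contrast` and `edge_strength`.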

Scenario: Manual mode uses user-specified settings

  • GIVEN preprocessing mode is set to "manual"
  • WHEN processing begins
  • THEN the system SHALL use the user-provided preprocessing configuration
  • AND ignore automatic quality analysis

Scenario: Disabled mode skips preprocessing

  • GIVEN preprocessing mode is set to "disabled"
  • WHEN processing begins
  • THEN the system SHALL skip all preprocessing
  • AND PP-Structure SHALL receive the original image directly
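
One way to express the three modes is a single resolver that returns either a configuration or nothing. This is a sketch under assumptions: `PreprocessConfig`, `resolve_config`, and the `analyze` callback are hypothetical names, and it reads "ignore automatic quality analysis" as "do not run it", which is why the analysis is passed in lazily.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PreprocessConfig:
    contrast: str = "none"   # assumed values: "none", "histogram", or "clahe"
    sharpen: bool = False
    binarize: bool = False


def resolve_config(
    analyze: Callable[[], PreprocessConfig],
    mode: str = "auto",                       # auto is the default mode per the spec
    manual: Optional[PreprocessConfig] = None,
) -> Optional[PreprocessConfig]:
    """Return the config to apply, or None when preprocessing is skipped."""
    if mode == "disabled":
        return None                           # layout detection gets the original image
    if mode == "manual":
        if manual is None:
            raise ValueError("manual mode requires a user-provided configuration")
        return manual                         # quality analysis is never invoked
    if mode == "auto":
        return analyze()
    raise ValueError(f"unknown preprocessing mode: {mode!r}")


analysis_calls = []


def fake_analysis() -> PreprocessConfig:
    """Stand-in for the real image-quality analysis."""
    analysis_calls.append("called")
    return PreprocessConfig(contrast="clahe")


user_settings = PreprocessConfig(sharpen=True)
disabled_result = resolve_config(fake_analysis, mode="disabled")
manual_result = resolve_config(fake_analysis, mode="manual", manual=user_settings)
calls_before_auto = len(analysis_calls)
auto_result = resolve_config(fake_analysis)
```

Returning `None` for the disabled mode lets the caller pass the original image straight through without a separate flag.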

Requirement: Preprocessing Preview API

The system SHALL provide a preview endpoint that allows users to compare original and preprocessed images before processing.

Scenario: Preview returns comparison images

  • GIVEN a task with an uploaded document
  • WHEN the user requests a preprocessing preview for a specific page
  • THEN the system SHALL return URLs or data for both the original and preprocessed images
  • AND the user SHALL be able to visually compare the two images

Scenario: Preview shows auto-detected settings

  • GIVEN preview is requested with mode "auto"
  • WHEN the system analyzes the page
  • THEN the response SHALL include the auto-detected preprocessing configuration
  • AND include quality metrics (contrast, edge_strength)

Scenario: Preview accepts manual configuration

  • GIVEN preview is requested with mode "manual"
  • WHEN the user provides specific preprocessing settings
  • THEN the system SHALL apply those settings to generate the preview
  • AND return the preprocessed result for user verification
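
The shape of a preview response might look like the sketch below. The spec only requires that the response carry both images, the applied configuration, and (in auto mode) the `contrast` and `edge_strength` metrics; every field name, the `build_preview` helper, and the placeholder image references here are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class PreviewResult:
    original_image: str        # URL or encoded data for the untouched page
    preprocessed_image: str    # URL or encoded data after preprocessing
    mode: str                  # "auto" or "manual"
    config: Dict[str, Any]     # settings actually applied to the preview
    quality_metrics: Optional[Dict[str, float]] = None


def build_preview(
    original_image: str,
    preprocessed_image: str,
    mode: str,
    config: Dict[str, Any],
    quality_metrics: Optional[Dict[str, float]] = None,
) -> Dict[str, Any]:
    """Assemble a preview payload, enforcing that auto mode reports its metrics."""
    if mode not in ("auto", "manual"):
        raise ValueError(f"preview is not available for mode {mode!r}")
    if mode == "auto" and quality_metrics is None:
        raise ValueError("auto-mode previews must include quality metrics")
    result = PreviewResult(original_image, preprocessed_image, mode, config, quality_metrics)
    return asdict(result)


auto_preview = build_preview(
    "original-page-1.png",       # placeholder reference, not a real endpoint
    "preprocessed-page-1.png",   # placeholder reference, not a real endpoint
    "auto",
    {"contrast": "clahe", "sharpen": False, "binarize": False},
    {"contrast": 32.5, "edge_strength": 20.1},
)
manual_preview = build_preview(
    "original-page-1.png",
    "preprocessed-page-1.png",
    "manual",
    {"contrast": "none", "sharpen": True, "binarize": False},
)
```

Keeping `config` in both responses means the frontend can display what was applied regardless of whether the user or the analyzer chose it.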

Requirement: Preprocessing Track Isolation

The layout preprocessing feature SHALL only affect layout detection input without impacting other processing components.

Scenario: Raw OCR is unaffected

  • GIVEN layout preprocessing is enabled
  • WHEN Raw OCR processing runs
  • THEN Raw OCR SHALL use the original image
  • AND text detection quality SHALL NOT be affected by preprocessing

Scenario: Preprocessed image is temporary

  • GIVEN an image is preprocessed for layout detection
  • WHEN layout detection completes
  • THEN the preprocessed image SHALL NOT be persisted to storage
  • AND only the original image and element crops SHALL be saved

Requirement: Preprocessing Frontend UI

The frontend SHALL provide a user interface for configuring and previewing preprocessing settings.

Scenario: Mode selection is available

  • GIVEN the user is configuring OCR track processing
  • WHEN the preprocessing settings panel is displayed
  • THEN the user SHALL be able to select mode: Auto (default), Manual, or Disabled
  • AND Auto mode SHALL be pre-selected

Scenario: Manual mode shows configuration options

  • GIVEN the user selects Manual mode
  • WHEN the settings panel updates
  • THEN the user SHALL see options for:
    • Contrast enhancement (None / Histogram / CLAHE)
    • Sharpen toggle
    • Binarize toggle

Scenario: Preview button triggers comparison view

  • GIVEN preprocessing settings are configured
  • WHEN the user clicks the Preview button
  • THEN the system SHALL display a side-by-side comparison of the original and preprocessed images
  • AND show the detected quality metrics