OCR/openspec/changes/archive/2025-11-27-add-ocr-track-gap-filling/specs/ocr-processing/spec.md
egg 59206a6ab8 feat: simplify layout model selection and archive proposals
Changes:
- Replace PP-Structure 7-slider parameter UI with simple 3-option layout model selector
- Add layout model mapping: chinese (PP-DocLayout-S), default (PubLayNet), cdla
- Add LayoutModelSelector component and zh-TW translations
- Fix "default" model behavior with sentinel value for PubLayNet
- Add gap filling service for OCR track coverage improvement
- Add PP-Structure debug utilities
- Archive completed/incomplete proposals:
  - add-ocr-track-gap-filling (complete)
  - fix-ocr-track-table-rendering (incomplete)
  - simplify-ppstructure-model-selection (22/25 tasks)
- Add new layout model tests, archive old PP-Structure param tests
- Update OpenSpec ocr-processing spec with layout model requirements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:27:00 +08:00

ADDED Requirements

Requirement: OCR Track Gap Filling with Raw OCR Regions

The system SHALL detect and fill gaps in PP-StructureV3 output by supplementing with Raw OCR text regions when significant content loss is detected.

Scenario: Gap filling activates when coverage is low

  • GIVEN an OCR track processing task
  • WHEN PP-StructureV3 outputs elements that cover less than 70% of Raw OCR text regions
  • THEN the system SHALL activate gap filling
  • AND identify Raw OCR regions not covered by any PP-StructureV3 element
  • AND supplement these regions as TEXT elements in the output
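
The activation rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes coverage is measured as a count of covered Raw OCR regions (the requirement does not say whether the 70% is by count or by area), and the names `coverage_ratio` and `should_fill_gaps` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Assumed bbox convention: (x0, y0, x1, y1) in a shared coordinate space.
BBox = Tuple[float, float, float, float]


def coverage_ratio(
    raw_regions: Sequence[BBox],
    is_covered: Callable[[BBox], bool],
) -> float:
    """Fraction of Raw OCR regions covered by some PP-StructureV3 element."""
    if not raw_regions:
        # Assumption: with no Raw OCR regions there is nothing to lose.
        return 1.0
    covered = sum(1 for region in raw_regions if is_covered(region))
    return covered / len(raw_regions)


def should_fill_gaps(ratio: float, threshold: float = 0.7) -> bool:
    """Gap filling activates only when coverage is strictly below the threshold."""
    return ratio < threshold
```

With four Raw OCR regions of which two are covered, the ratio is 0.5 and gap filling activates; at exactly 0.7 it does not, because the scenario says "less than 70%".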

Scenario: Coverage is determined by center-point or IoU

  • GIVEN a Raw OCR text region with a bounding box
  • WHEN checking if the region is covered by PP-StructureV3
  • THEN the region SHALL be considered covered if its center point falls inside any PP-StructureV3 element bbox
  • OR if its IoU with any PP-StructureV3 element exceeds the 0.15 threshold
  • AND regions not meeting either criterion SHALL be marked as uncovered
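
A small sketch of the two coverage criteria, assuming axis-aligned boxes in `(x0, y0, x1, y1)` form. The helper names (`center_inside`, `iou`, `is_covered`) are illustrative and not taken from the codebase; treating a center point on the element's edge as "inside" is also an assumption.

```python
from typing import Sequence, Tuple

BBox = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


def center_inside(region: BBox, element: BBox) -> bool:
    """True if the region's center point lies within the element bbox."""
    cx = (region[0] + region[2]) / 2
    cy = (region[1] + region[3]) / 2
    return element[0] <= cx <= element[2] and element[1] <= cy <= element[3]


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def is_covered(
    region: BBox,
    elements: Sequence[BBox],
    iou_threshold: float = 0.15,
) -> bool:
    """A region is covered if either criterion holds for any element."""
    return any(
        center_inside(region, e) or iou(region, e) > iou_threshold
        for e in elements
    )
```

The center-point test catches small text lines sitting inside a large layout block, where IoU is tiny; the IoU test catches regions that straddle an element's edge.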

Scenario: Only TEXT elements are supplemented

  • GIVEN uncovered Raw OCR regions identified for supplementation
  • WHEN PP-StructureV3 has detected TABLE, IMAGE, FIGURE, FLOWCHART, HEADER, or FOOTER elements
  • THEN the system SHALL NOT supplement regions that overlap with these structural elements
  • AND only supplement regions as TEXT type to preserve structural integrity
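
One way to express the structural-element guard is shown below. It assumes "overlap" means any positive intersection area, which the requirement does not define, and uses a hypothetical `Element` stand-in rather than the project's real element class.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)

# Element types named in the scenario above.
STRUCTURAL_TYPES = {"TABLE", "IMAGE", "FIGURE", "FLOWCHART", "HEADER", "FOOTER"}


@dataclass
class Element:
    """Hypothetical stand-in for a PP-StructureV3 output element."""
    type: str
    bbox: BBox


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes share a region of positive area."""
    return (
        min(a[2], b[2]) > max(a[0], b[0])
        and min(a[3], b[3]) > max(a[1], b[1])
    )


def drop_structural_overlaps(
    regions: Sequence[BBox],
    elements: Sequence[Element],
) -> List[BBox]:
    """Keep only uncovered regions that touch no structural element."""
    structural = [e.bbox for e in elements if e.type in STRUCTURAL_TYPES]
    return [
        r for r in regions
        if not any(intersects(r, s) for s in structural)
    ]
```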

Scenario: Supplemented regions meet confidence threshold

  • GIVEN Raw OCR regions to be supplemented
  • WHEN a region has confidence score below 0.3
  • THEN the system SHALL skip that region
  • AND only supplement regions with confidence >= 0.3

Scenario: Deduplication prevents repeated text

  • GIVEN a Raw OCR region being considered for supplementation
  • WHEN the region has IoU > 0.5 with any existing PP-StructureV3 TEXT element
  • THEN the system SHALL skip that region to prevent duplicate text
  • AND the original PP-StructureV3 element SHALL be preserved
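
The confidence and deduplication rules are both skip conditions, so they can be applied in one pass. The sketch below assumes a hypothetical `RawRegion` record and the default thresholds stated in the scenarios (0.3 and 0.5).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


@dataclass
class RawRegion:
    """Hypothetical Raw OCR result: recognized text, box, and score."""
    text: str
    bbox: BBox
    confidence: float


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def select_for_supplement(
    regions: Sequence[RawRegion],
    text_bboxes: Sequence[BBox],
    min_confidence: float = 0.3,
    dedup_iou: float = 0.5,
) -> List[RawRegion]:
    """Drop low-confidence regions and near-duplicates of existing TEXT elements."""
    kept = []
    for region in regions:
        if region.confidence < min_confidence:
            continue  # below the confidence threshold
        if any(iou(region.bbox, t) > dedup_iou for t in text_bboxes):
            continue  # duplicates text PP-StructureV3 already produced
        kept.append(region)
    return kept
```

The existing PP-StructureV3 elements are never modified here; only candidate regions are filtered, which keeps the original element preserved as the scenario requires.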

Scenario: Reading order is recalculated after gap filling

  • GIVEN supplemented elements have been added to the page
  • WHEN assembling the final element list
  • THEN the system SHALL recalculate reading order for the entire page
  • AND sort elements by y0 coordinate (top to bottom) then x0 (left to right)
  • AND ensure logical document flow is maintained
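
The sort described above reduces to a two-part key. In this sketch elements are plain dicts with a `bbox` entry in `(x0, y0, x1, y1)` form and a `reading_order` index; both the dict shape and the field name are assumptions made for illustration.

```python
from typing import Dict, List


def recalculate_reading_order(elements: List[Dict]) -> List[Dict]:
    """Sort by y0 (top to bottom), then x0 (left to right), and renumber."""
    ordered = sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))
    for index, element in enumerate(ordered):
        # Assumed field name for the element's position in the page flow.
        element["reading_order"] = index
    return ordered
```

A strict `(y0, x0)` sort treats elements whose tops differ by even one pixel as being on different lines, so an implementation may need some tolerance on `y0` for multi-column or slightly skewed pages.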

Scenario: Coordinate alignment with ocr_dimensions

  • GIVEN Raw OCR processing may involve image resizing
  • WHEN comparing Raw OCR bbox with PP-StructureV3 bbox
  • THEN the system SHALL use ocr_dimensions to normalize coordinates
  • AND ensure both sources reference the same coordinate space
  • AND prevent coverage misdetection due to scale differences
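
A sketch of the normalization step, under stated assumptions: the structure of `ocr_dimensions` is not given in this spec, so the code treats it as a dict holding the `width` and `height` of the image Raw OCR actually ran on, and rescales Raw OCR boxes into the pixel space used by PP-StructureV3 (`target_size`, a hypothetical parameter).

```python
from typing import Dict, Tuple

BBox = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


def scale_bbox(
    bbox: BBox,
    ocr_dimensions: Dict[str, float],
    target_size: Tuple[float, float],
) -> BBox:
    """Map a Raw OCR bbox into the target (width, height) coordinate space."""
    sx = target_size[0] / ocr_dimensions["width"]
    sy = target_size[1] / ocr_dimensions["height"]
    x0, y0, x1, y1 = bbox
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)
```

For example, a box found on a 1000x500 resized image maps to twice its coordinates on the 2000x1000 original, after which center-point and IoU checks compare like with like.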

Scenario: Supplemented elements have complete metadata

  • GIVEN a Raw OCR region being added as a supplemented element
  • WHEN creating the DocumentElement
  • THEN the element SHALL include page_number
  • AND include confidence score from Raw OCR
  • AND include original bbox coordinates
  • AND optionally include a source indicator for debugging
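
The metadata requirements can be illustrated with a small record type. `SupplementedElement` below is a hypothetical stand-in, not the project's `DocumentElement`, whose real fields are not shown in this spec; the `"raw_ocr"` source tag is likewise an invented value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


@dataclass
class SupplementedElement:
    """Hypothetical stand-in carrying the metadata listed in the scenario."""
    type: str
    text: str
    bbox: BBox               # original Raw OCR bbox coordinates
    page_number: int
    confidence: float        # score reported by Raw OCR
    source: Optional[str] = None  # optional debugging indicator


def to_supplemented_element(
    text: str,
    bbox: BBox,
    confidence: float,
    page_number: int,
    tag_source: bool = False,
) -> SupplementedElement:
    """Build a TEXT element from one Raw OCR region."""
    return SupplementedElement(
        type="TEXT",
        text=text,
        bbox=bbox,
        page_number=page_number,
        confidence=confidence,
        source="raw_ocr" if tag_source else None,
    )
```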

Requirement: Gap Filling Track Isolation

The gap filling feature SHALL only apply to OCR track processing and SHALL NOT affect Direct or Hybrid track outputs.

Scenario: Gap filling only activates for OCR track

  • GIVEN a document processing task
  • WHEN the processing track is OCR
  • THEN the system SHALL evaluate and apply gap filling as needed
  • AND produce enhanced output with supplemented content

Scenario: Direct track is unaffected

  • GIVEN a document processing task with Direct track
  • WHEN the task is processed
  • THEN the system SHALL NOT invoke any gap filling logic
  • AND produce output identical to current Direct track behavior

Scenario: Hybrid track is unaffected

  • GIVEN a document processing task with Hybrid track
  • WHEN the task is processed
  • THEN the system SHALL NOT invoke gap filling logic
  • AND use existing Hybrid track processing pipeline
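
Track isolation amounts to a single guard in front of the gap filling step. The sketch below uses a hypothetical `Track` enum and passes the fill step in as a callable, so the Direct and Hybrid paths never reach gap filling code.

```python
from enum import Enum
from typing import Callable, List


class Track(Enum):
    """Processing tracks named in this spec; the enum itself is illustrative."""
    OCR = "ocr"
    DIRECT = "direct"
    HYBRID = "hybrid"


def apply_gap_filling(
    track: Track,
    elements: List[str],
    fill: Callable[[List[str]], List[str]],
) -> List[str]:
    """Invoke the fill step only for the OCR track; pass others through untouched."""
    if track is not Track.OCR:
        return elements
    return fill(elements)
```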

Requirement: Gap Filling Configuration

The system SHALL provide configurable parameters for gap filling behavior.

Scenario: Gap filling can be disabled via configuration

  • GIVEN gap_filling_enabled is set to false in configuration
  • WHEN OCR track processing runs
  • THEN the system SHALL skip all gap filling logic
  • AND output only PP-StructureV3 results as before

Scenario: Coverage threshold is configurable

  • GIVEN gap_filling_coverage_threshold is set to 0.8
  • WHEN PP-StructureV3 coverage is 75%
  • THEN the system SHALL activate gap filling
  • AND supplement uncovered regions

Scenario: IoU thresholds are configurable

  • GIVEN custom IoU thresholds configured:
    • gap_filling_iou_threshold: 0.2
    • gap_filling_dedup_iou_threshold: 0.6
  • WHEN evaluating coverage and deduplication
  • THEN the system SHALL use the configured values
  • AND apply them consistently throughout the gap filling process

Scenario: Confidence threshold is configurable

  • GIVEN gap_filling_confidence_threshold is set to 0.5
  • WHEN supplementing Raw OCR regions
  • THEN the system SHALL only include regions with confidence >= 0.5
  • AND filter out lower confidence regions
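
The configuration scenarios can be gathered into one settings object. The field names below are the parameter names used in this spec and the numeric defaults are the thresholds it states (0.7, 0.15, 0.5, 0.3); the `GapFillingConfig` class, the `gap_filling_active` helper, and the default of `True` for `gap_filling_enabled` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapFillingConfig:
    """Hypothetical container for the gap filling parameters."""
    gap_filling_enabled: bool = True               # assumed default
    gap_filling_coverage_threshold: float = 0.7
    gap_filling_iou_threshold: float = 0.15
    gap_filling_dedup_iou_threshold: float = 0.5
    gap_filling_confidence_threshold: float = 0.3


def gap_filling_active(config: GapFillingConfig, coverage: float) -> bool:
    """Gap filling runs only when enabled and coverage is below the threshold."""
    return (
        config.gap_filling_enabled
        and coverage < config.gap_filling_coverage_threshold
    )
```

With `gap_filling_coverage_threshold` raised to 0.8, a page at 75% coverage activates gap filling, while the same page under the 0.7 default does not.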