OCR/openspec/changes/archive/2025-12-11-improve-ocr-track-algorithm/tasks.md
egg cfe65158a3 feat: enable document orientation detection for scanned PDFs
- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:13:46 +08:00

1. Algorithm Changes (gap_filling_service.py)

1.1 IoA Implementation

  • 1.1.1 Add _calculate_ioa() method alongside existing _calculate_iou()
  • 1.1.2 Modify _is_region_covered() to use IoA instead of IoU
  • 1.1.3 Update deduplication logic to use IoA
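
A minimal sketch of the difference between the two metrics, assuming axis-aligned boxes stored as `(x0, y0, x1, y1)` tuples and assuming IoA divides the intersection by the area of the inner (OCR) box. The function names without the leading underscore and the `BBox` alias are illustrative, not the actual signatures in gap_filling_service.py.

```python
from typing import Tuple

# Assumed box layout: (x0, y0, x1, y1) with x1 >= x0 and y1 >= y0.
BBox = Tuple[float, float, float, float]


def area(box: BBox) -> float:
    """Area of a box; inverted or degenerate boxes count as 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a: BBox, b: BBox) -> float:
    """Area of the overlap between two boxes (0 if they are disjoint)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def calculate_iou(a: BBox, b: BBox) -> float:
    """Intersection over Union: symmetric, penalizes size mismatch."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def calculate_ioa(inner: BBox, outer: BBox) -> float:
    """Intersection over Area of `inner`: how much of `inner` lies in `outer`."""
    inner_area = area(inner)
    return intersection(inner, outer) / inner_area if inner_area > 0 else 0.0


# A small text box fully inside a large table box.
small_text = (10, 10, 20, 20)
large_table = (0, 0, 100, 100)
iou_value = calculate_iou(small_text, large_table)
ioa_value = calculate_ioa(small_text, large_table)
```

Here the small box is entirely contained in the large one, so IoA reports full coverage while IoU stays close to zero because the union is dominated by the large box. That asymmetry is the usual motivation for preferring IoA in coverage checks.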

1.2 Dynamic Threshold Strategy

  • 1.2.1 Add element-type-specific thresholds as class constants
  • 1.2.2 Modify _is_region_covered() to accept element type parameter
  • 1.2.3 Apply different thresholds based on element type (TEXT: 0.6, TABLE: 0.1, FIGURE: 0.8)
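
The threshold lookup could be sketched as follows. The `ElementType` enum, the fallback default, and the use of `>=` for the comparison are assumptions for illustration; only the three threshold values come from the task list.

```python
from enum import Enum


class ElementType(Enum):
    """Hypothetical element types; the real service may use other names."""
    TEXT = "text"
    TABLE = "table"
    FIGURE = "figure"
    OTHER = "other"


# Threshold values taken from task 1.2.3.
IOA_THRESHOLDS = {
    ElementType.TEXT: 0.6,
    ElementType.TABLE: 0.1,
    ElementType.FIGURE: 0.8,
}

# Assumption: unknown element types fall back to the TEXT threshold.
DEFAULT_IOA_THRESHOLD = 0.6


def is_region_covered(ioa: float, element_type: ElementType) -> bool:
    """Return True when the OCR region counts as covered by the element."""
    threshold = IOA_THRESHOLDS.get(element_type, DEFAULT_IOA_THRESHOLD)
    return ioa >= threshold
```

With these values, an OCR region that overlaps a table by only 30% is already treated as covered, while the same overlap with a text block or a figure is not.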

1.3 Boundary Shrinking

  • 1.3.1 Add optional shrink_pixels parameter to coverage detection
  • 1.3.2 Implement bbox shrinking logic (move each edge inward by 1-2 px)
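
One possible shape for the shrinking step, assuming `(x0, y0, x1, y1)` boxes. The function name `shrink_bbox` and the clamping behavior for very small boxes are assumptions, not part of the task list.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) layout


def shrink_bbox(box: BBox, shrink_pixels: float = 1) -> BBox:
    """Move every edge of `box` inward by `shrink_pixels`.

    The shrink is clamped to half the box size on each axis so that a
    tiny box collapses to a line or point instead of inverting.
    """
    x0, y0, x1, y1 = box
    dx = max(0, min(shrink_pixels, (x1 - x0) / 2))
    dy = max(0, min(shrink_pixels, (y1 - y0) / 2))
    return (x0 + dx, y0 + dy, x1 - dx, y1 - dy)
```

Shrinking before the coverage check keeps boxes that merely touch or overlap by a pixel at their borders from registering as intersecting.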

2. OCR Data Source Changes

2.1 Extract overall_ocr_res from PP-StructureV3

  • 2.1.1 Modify pp_structure_enhanced.py to extract overall_ocr_res from result
  • 2.1.2 Convert dt_polys + rec_texts + rec_scores to TextRegion format
  • 2.1.3 Store extracted OCR in result dict for gap filling
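
The conversion in 2.1.2 might look roughly like this. It assumes `overall_ocr_res` behaves like a dict whose `dt_polys`, `rec_texts`, and `rec_scores` entries are index-aligned, and it uses a hypothetical `TextRegion` dataclass; the project's real `TextRegion` may carry different fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class TextRegion:
    """Hypothetical stand-in for the project's TextRegion type."""
    text: str
    confidence: float
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def polygon_to_bbox(poly: Sequence[Sequence[float]]) -> Tuple[float, float, float, float]:
    """Axis-aligned bounding box of a polygon given as [[x, y], ...]."""
    xs = [float(point[0]) for point in poly]
    ys = [float(point[1]) for point in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def convert_overall_ocr(overall_ocr_res: Dict[str, Any]) -> List[TextRegion]:
    """Zip polygons, texts, and scores into TextRegion objects.

    Assumes the three lists are index-aligned; zip() stops at the
    shortest one if they are not.
    """
    polys = overall_ocr_res.get("dt_polys", [])
    texts = overall_ocr_res.get("rec_texts", [])
    scores = overall_ocr_res.get("rec_scores", [])
    return [
        TextRegion(text=text, confidence=float(score), bbox=polygon_to_bbox(poly))
        for poly, text, score in zip(polys, texts, scores)
    ]


# Made-up sample input for illustration only.
sample = {
    "dt_polys": [[[10, 20], [110, 20], [110, 40], [10, 40]]],
    "rec_texts": ["Hello"],
    "rec_scores": [0.98],
}
regions = convert_overall_ocr(sample)
```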

2.2 Update Processing Orchestrator

  • 2.2.1 Add option to use overall_ocr_res as OCR source
  • 2.2.2 Skip separate Raw OCR inference when using PP-StructureV3's OCR
  • 2.2.3 Maintain backward compatibility with explicit Raw OCR mode
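
A rough sketch of the source selection in the orchestrator. The function name, the `overall_ocr_regions` result key, and the fallback to Raw OCR when no extracted regions are present are all assumptions; the Raw OCR step is passed in as a callable so the sketch stays independent of the real inference code.

```python
from typing import Any, Callable, Dict, List


def select_ocr_regions(
    structure_result: Dict[str, Any],
    run_raw_ocr: Callable[[], List[Any]],
    use_overall_ocr: bool = True,
) -> List[Any]:
    """Pick the OCR regions that gap filling should use.

    When `use_overall_ocr` is set and the structure result already carries
    extracted regions, the separate Raw OCR inference is skipped.
    Otherwise (explicit Raw OCR mode, or nothing was extracted) the
    Raw OCR callable runs as before.
    """
    if use_overall_ocr:
        regions = structure_result.get("overall_ocr_regions")  # hypothetical key
        if regions is not None:
            return regions
    return run_raw_ocr()
```

Passing `use_overall_ocr=False` keeps the previous behavior, which is one way to satisfy the backward-compatibility item.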

3. Configuration Updates

3.1 Add Settings (config.py)

  • 3.1.1 Add gap_filling_ioa_threshold_text: float = 0.6
  • 3.1.2 Add gap_filling_ioa_threshold_table: float = 0.1
  • 3.1.3 Add gap_filling_ioa_threshold_figure: float = 0.8
  • 3.1.4 Add gap_filling_use_overall_ocr: bool = True
  • 3.1.5 Add gap_filling_shrink_pixels: int = 1
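
For reference, the five settings could be grouped like this. The sketch uses a plain standard-library dataclass; the actual config.py may be built on a different settings mechanism, and the `GapFillingSettings` class and `ioa_threshold_for()` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapFillingSettings:
    """Field names and defaults mirror tasks 3.1.1 to 3.1.5."""
    gap_filling_ioa_threshold_text: float = 0.6
    gap_filling_ioa_threshold_table: float = 0.1
    gap_filling_ioa_threshold_figure: float = 0.8
    gap_filling_use_overall_ocr: bool = True
    gap_filling_shrink_pixels: int = 1

    def ioa_threshold_for(self, element_type: str) -> float:
        """Hypothetical helper: look up the threshold by element type name.

        Unknown types fall back to the text threshold (an assumption).
        """
        thresholds = {
            "text": self.gap_filling_ioa_threshold_text,
            "table": self.gap_filling_ioa_threshold_table,
            "figure": self.gap_filling_ioa_threshold_figure,
        }
        return thresholds.get(element_type.lower(), self.gap_filling_ioa_threshold_text)


settings = GapFillingSettings()
```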

4. Testing

4.1 Unit Tests

  • 4.1.1 Test IoA calculation with known values
  • 4.1.2 Test dynamic threshold selection by element type
  • 4.1.3 Test boundary shrinking edge cases
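
The "known values" cases in 4.1.1 could be pinned down with pytest-style functions along these lines. The `ioa()` function here is a compact stand-in for `_calculate_ioa()`, assuming `(x0, y0, x1, y1)` boxes and the inner-box area as the denominator; real tests would import the service method instead.

```python
def ioa(inner, outer):
    """Stand-in for _calculate_ioa(): intersection / area of `inner`."""
    overlap_w = min(inner[2], outer[2]) - max(inner[0], outer[0])
    overlap_h = min(inner[3], outer[3]) - max(inner[1], outer[1])
    inner_area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    if overlap_w <= 0 or overlap_h <= 0 or inner_area <= 0:
        return 0.0
    return (overlap_w * overlap_h) / inner_area


def test_fully_contained_box_has_ioa_one():
    assert ioa((10, 10, 20, 20), (0, 0, 100, 100)) == 1.0


def test_half_overlap_has_ioa_one_half():
    # The inner box is 10x10; the right half (5x10) lies inside the outer box.
    assert ioa((0, 0, 10, 10), (5, 0, 15, 10)) == 0.5


def test_disjoint_boxes_have_ioa_zero():
    assert ioa((0, 0, 10, 10), (20, 20, 30, 30)) == 0.0


def test_zero_area_box_does_not_divide_by_zero():
    assert ioa((5, 5, 5, 5), (0, 0, 10, 10)) == 0.0
```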

4.2 Integration Tests

  • 4.2.1 Test with scan.pdf (current problematic file)
  • 4.2.2 Compare results: old IoU vs new IoA approach
  • 4.2.3 Verify no duplicate text rendering in output PDF
  • 4.2.4 Verify table content is not duplicated outside table bounds

5. Documentation

  • 5.1 Update spec documentation with new algorithm
  • 5.2 Add inline code comments explaining IoA vs IoU