feat: enable document orientation detection for scanned PDFs

- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
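The orientation fix in this commit comes down to one rule: when the classifier reports a quarter turn, the output page must be laid out with width and height exchanged. The following is a minimal Python sketch of that rule, assuming the detected angle has already been read from `doc_preprocessor_res.angle` as an integer number of degrees and that only 0, 90, 180 and 270 are reported. `output_page_size` and `SUPPORTED_ANGLES` are hypothetical names, not code from this commit.

```python
from typing import Tuple

# Angles the orientation classifier is assumed to report, in degrees.
SUPPORTED_ANGLES = (0, 90, 180, 270)


def output_page_size(width: float, height: float, angle: int) -> Tuple[float, float]:
    """Return the (width, height) the output PDF page should use.

    `angle` is the detected rotation (e.g. the value taken from
    doc_preprocessor_res.angle; how it is extracted is assumed here).
    A quarter turn (90 or 270 degrees) turns a portrait page into a
    landscape one, so width and height are swapped; 0 and 180 keep them.
    """
    normalized = angle % 360  # also maps -90 to 270
    if normalized not in SUPPORTED_ANGLES:
        raise ValueError(f"unsupported rotation angle: {angle}")
    if normalized in (90, 270):
        return height, width
    return width, height


if __name__ == "__main__":
    # Roughly A4 portrait (595 x 842 pt), scanned sideways.
    print(output_page_size(595, 842, 90))   # (842, 595)
    print(output_page_size(595, 842, 180))  # (595, 842)
```

A 180° rotation flips content upside down but leaves the page's aspect ratio unchanged, which is why only the 90°/270° cases need the swap.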
2025-12-11 17:13:46 +08:00
parent 57070af307
commit cfe65158a3
58 changed files with 1271 additions and 3048 deletions
--- a/openspec/changes/archive/2025-12-11-improve-ocr-track-algorithm/tasks.md
+++ b/openspec/changes/archive/2025-12-11-improve-ocr-track-algorithm/tasks.md
@@ -0,0 +1,54 @@
+## 1. Algorithm Changes (gap_filling_service.py)
+
+### 1.1 IoA Implementation
+- [x] 1.1.1 Add `_calculate_ioa()` method alongside existing `_calculate_iou()`
+- [x] 1.1.2 Modify `_is_region_covered()` to use IoA instead of IoU
+- [x] 1.1.3 Update deduplication logic to use IoA
+
+### 1.2 Dynamic Threshold Strategy
+- [x] 1.2.1 Add element-type-specific thresholds as class constants
+- [x] 1.2.2 Modify `_is_region_covered()` to accept element type parameter
+- [x] 1.2.3 Apply different thresholds based on element type (TEXT: 0.6, TABLE: 0.1, FIGURE: 0.8)
+
+### 1.3 Boundary Shrinking
+- [x] 1.3.1 Add optional `shrink_pixels` parameter to coverage detection
+- [x] 1.3.2 Implement bbox shrinking logic (inward 1-2 px)
+
+## 2. OCR Data Source Changes
+
+### 2.1 Extract overall_ocr_res from PP-StructureV3
+- [x] 2.1.1 Modify `pp_structure_enhanced.py` to extract `overall_ocr_res` from result
+- [x] 2.1.2 Convert `dt_polys` + `rec_texts` + `rec_scores` to TextRegion format
+- [x] 2.1.3 Store extracted OCR in result dict for gap filling
+
+### 2.2 Update Processing Orchestrator
+- [x] 2.2.1 Add option to use `overall_ocr_res` as OCR source
+- [x] 2.2.2 Skip separate Raw OCR inference when using PP-StructureV3's OCR
+- [x] 2.2.3 Maintain backward compatibility with explicit Raw OCR mode
+
+## 3. Configuration Updates
+
+### 3.1 Add Settings (config.py)
+- [x] 3.1.1 Add `gap_filling_ioa_threshold_text: float = 0.6`
+- [x] 3.1.2 Add `gap_filling_ioa_threshold_table: float = 0.1`
+- [x] 3.1.3 Add `gap_filling_ioa_threshold_figure: float = 0.8`
+- [x] 3.1.4 Add `gap_filling_use_overall_ocr: bool = True`
+- [x] 3.1.5 Add `gap_filling_shrink_pixels: int = 1`
+
+## 4. Testing
+
+### 4.1 Unit Tests
+- [ ] 4.1.1 Test IoA calculation with known values
+- [ ] 4.1.2 Test dynamic threshold selection by element type
+- [ ] 4.1.3 Test boundary shrinking edge cases
+
+### 4.2 Integration Tests
+- [ ] 4.2.1 Test with scan.pdf (current problematic file)
+- [ ] 4.2.2 Compare results: old IoU vs new IoA approach
+- [ ] 4.2.3 Verify no duplicate text rendering in output PDF
+- [ ] 4.2.4 Verify table content is not duplicated outside table bounds
+
+## 5. Documentation
+
+- [x] 5.1 Update spec documentation with new algorithm
+- [x] 5.2 Add inline code comments explaining IoA vs IoU
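Tasks 1.1 to 1.3 above combine three ideas: intersection-over-area (IoA) in place of IoU, per-element-type thresholds, and shrinking boxes inward before the overlap test. The following is a minimal Python sketch of how they might fit together. It assumes axis-aligned `(x0, y0, x1, y1)` boxes, takes IoA as the intersection divided by the OCR region's own area, and shrinks the OCR box rather than the layout box. `LayoutElement`, `shrink`, `calculate_ioa`, `is_region_covered` and the fallback threshold are hypothetical, not the actual `gap_filling_service.py` API. The threshold values are the defaults listed in tasks 1.2.3 and 3.1.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

# Per-type IoA thresholds, using the defaults from tasks 1.2.3 / 3.1.1-3.1.3.
IOA_THRESHOLDS = {"TEXT": 0.6, "TABLE": 0.1, "FIGURE": 0.8}
DEFAULT_THRESHOLD = 0.6  # hypothetical fallback for other element types


@dataclass
class LayoutElement:  # hypothetical stand-in for the service's element type
    element_type: str
    bbox: BBox


def _area(b: BBox) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def _intersection(a: BBox, b: BBox) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def calculate_iou(a: BBox, b: BBox) -> float:
    """Intersection over union: symmetric, shrinks as either box grows."""
    inter = _intersection(a, b)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def calculate_ioa(candidate: BBox, reference: BBox) -> float:
    """Fraction of `candidate`'s own area that lies inside `reference`."""
    area = _area(candidate)
    return _intersection(candidate, reference) / area if area > 0 else 0.0


def shrink(b: BBox, pixels: int) -> BBox:
    """Move every edge inward by `pixels`, collapsing to the center at most."""
    x0, y0, x1, y1 = b
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return (min(x0 + pixels, cx), min(y0 + pixels, cy),
            max(x1 - pixels, cx), max(y1 - pixels, cy))


def is_region_covered(ocr_bbox: BBox, elements: Iterable[LayoutElement],
                      shrink_pixels: int = 1) -> bool:
    """True if some layout element already covers this OCR region."""
    candidate = shrink(ocr_bbox, shrink_pixels)
    for el in elements:
        threshold = IOA_THRESHOLDS.get(el.element_type, DEFAULT_THRESHOLD)
        if calculate_ioa(candidate, el.bbox) >= threshold:
            return True
    return False


if __name__ == "__main__":
    block = (0, 0, 100, 100)
    line = (10, 10, 60, 20)  # a short OCR line fully inside the block
    print(calculate_iou(line, block))  # 0.05: IoU calls it "not covered"
    print(calculate_ioa(line, block))  # 1.0: IoA sees it is fully inside
```

The contrast in the usage lines is the motivation for the switch: IoU divides by the union, which a large layout block dominates, so a short OCR line sitting entirely inside a paragraph scores near zero and would be rendered a second time. IoA asks only how much of the OCR line is already accounted for. The low TABLE threshold means even a small overlap with a table suppresses the OCR line (task 4.2.4), while the high FIGURE threshold lets text that merely brushes a figure still be filled in.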