feat: enable document orientation detection for scanned PDFs

- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content
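A minimal sketch of the dimension swap. It assumes `doc_preprocessor_res.angle` reports the detected rotation in degrees as a multiple of 90. `oriented_page_size` is a hypothetical helper name, not the code in this commit.

```python
def oriented_page_size(width: float, height: float, angle: int) -> tuple[float, float]:
    """Return the (width, height) to use for the output page.

    Assumption: `angle` is the rotation in degrees taken from
    doc_preprocessor_res.angle and is a multiple of 90.
    """
    normalized = angle % 360  # also maps -90 to 270 and 450 to 90
    if normalized not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation angle: {angle}")
    # A quarter turn (90 or 270 degrees) exchanges the page axes.
    if normalized in (90, 270):
        return height, width
    # 0 and 180 degrees keep the original page box dimensions.
    return width, height
```

A portrait A4 box of 595 x 842 pt with a detected 90° rotation becomes 842 x 595 pt. A 180° rotation leaves the size unchanged because only the content is flipped.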

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
egg
2025-12-11 17:13:46 +08:00
parent 57070af307
commit cfe65158a3
58 changed files with 1271 additions and 3048 deletions


@@ -0,0 +1,54 @@
## 1. Algorithm Changes (gap_filling_service.py)
### 1.1 IoA Implementation
- [x] 1.1.1 Add `_calculate_ioa()` method alongside the existing `_calculate_iou()`
- [x] 1.1.2 Modify `_is_region_covered()` to use IoA instead of IoU
- [x] 1.1.3 Update deduplication logic to use IoA
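The difference between the two metrics can be sketched as follows. This is not the code in `gap_filling_service.py`. It assumes axis-aligned `(x0, y0, x1, y1)` boxes and normalizes IoA by the area of the candidate (OCR) box. The function names mirror the task list without the leading underscore and are illustrative only.

```python
# Axis-aligned boxes as (x0, y0, x1, y1) in page pixels.
Box = tuple[float, float, float, float]


def _area(box: Box) -> float:
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def _intersection_area(a: Box, b: Box) -> float:
    # Negative extents mean the boxes do not overlap on that axis.
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def calculate_iou(a: Box, b: Box) -> float:
    """Intersection over Union: symmetric, so a size mismatch lowers the score."""
    inter = _intersection_area(a, b)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def calculate_ioa(candidate: Box, reference: Box) -> float:
    """Intersection over Area: the fraction of `candidate` lying inside `reference`."""
    area = _area(candidate)
    return _intersection_area(candidate, reference) / area if area > 0 else 0.0
```

Consider a small OCR line `(10, 10, 20, 20)` that sits entirely inside a large layout block `(0, 0, 100, 100)`. IoU is only 0.01 because the union is dominated by the block. IoA is 1.0. This is why IoU tends to report "not covered" for small text inside large regions. For deduplication, the same `calculate_ioa` could compare each new OCR box against the boxes already accepted.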
### 1.2 Dynamic Threshold Strategy
- [x] 1.2.1 Add element-type-specific thresholds as class constants
- [x] 1.2.2 Modify `_is_region_covered()` to accept element type parameter
- [x] 1.2.3 Apply different thresholds based on element type (TEXT: 0.6, TABLE: 0.1, FIGURE: 0.8)
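Per-type threshold selection might look like the sketch below. The thresholds come from task 1.2.3. Several details are assumptions: the `ElementType` enum, the fallback to the TEXT threshold for unlisted types, and the use of `>=` as the comparison.

```python
from enum import Enum


class ElementType(Enum):
    # Hypothetical subset of layout element types.
    TEXT = "text"
    TABLE = "table"
    FIGURE = "figure"
    TITLE = "title"


# Per-type IoA thresholds from task 1.2.3.
IOA_THRESHOLDS = {
    ElementType.TEXT: 0.6,
    ElementType.TABLE: 0.1,
    ElementType.FIGURE: 0.8,
}
# Assumption: types without an explicit entry fall back to the TEXT threshold.
DEFAULT_IOA_THRESHOLD = IOA_THRESHOLDS[ElementType.TEXT]


def ioa_threshold_for(element_type: ElementType) -> float:
    return IOA_THRESHOLDS.get(element_type, DEFAULT_IOA_THRESHOLD)


def is_region_covered(ioa: float, element_type: ElementType) -> bool:
    """Treat an OCR region as already covered when its IoA with a layout
    element of the given type reaches that type's threshold."""
    return ioa >= ioa_threshold_for(element_type)
```

A lower threshold suppresses OCR more aggressively. With TABLE at 0.1, an OCR box that is only 15% inside a table already counts as covered. This presumably serves the goal in 4.2.4 of keeping table text from being re-rendered outside the table.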
### 1.3 Boundary Shrinking
- [x] 1.3.1 Add optional `shrink_pixels` parameter to coverage detection
- [x] 1.3.2 Implement bbox shrinking logic (shrink inward by 1-2 px)
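Inward shrinking can be sketched as follows. `shrink_bbox` is a hypothetical name. The rule for tiny boxes is also an assumption: an axis no wider than `2 * shrink_pixels` is left unchanged so that the box never inverts.

```python
def shrink_bbox(
    box: tuple[float, float, float, float], shrink_pixels: int = 1
) -> tuple[float, float, float, float]:
    """Move each edge of (x0, y0, x1, y1) inward by `shrink_pixels`."""
    if shrink_pixels <= 0:
        return box
    x0, y0, x1, y1 = box
    # Assumption: skip an axis that is too small to shrink without inverting.
    if x1 - x0 > 2 * shrink_pixels:
        x0, x1 = x0 + shrink_pixels, x1 - shrink_pixels
    if y1 - y0 > 2 * shrink_pixels:
        y0, y1 = y0 + shrink_pixels, y1 - shrink_pixels
    return (x0, y0, x1, y1)
```

Boxes `(0, 0, 10, 10)` and `(9, 0, 20, 10)` share a 1 px column. After shrinking both by 1 px they become `(1, 1, 9, 9)` and `(10, 1, 19, 9)`, which no longer overlap. Touching neighbours therefore stop registering as partial coverage.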
## 2. OCR Data Source Changes
### 2.1 Extract overall_ocr_res from PP-StructureV3
- [x] 2.1.1 Modify `pp_structure_enhanced.py` to extract `overall_ocr_res` from result
- [x] 2.1.2 Convert `dt_polys` + `rec_texts` + `rec_scores` to `TextRegion` format
- [x] 2.1.3 Store extracted OCR in result dict for gap filling
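The conversion in 2.1.2 could be sketched like this. `TextRegion` here is a hypothetical stand-in for the project's type, and `overall_ocr_res` is treated as a plain dict. The sketch assumes `dt_polys`, `rec_texts`, and `rec_scores` are index-aligned, and it raises if they are not. If recognition results are filtered by score, `rec_polys` may turn out to be the field aligned with `rec_texts`, and that is worth verifying against the actual PP-StructureV3 output.

```python
from dataclasses import dataclass, field


@dataclass
class TextRegion:
    """Hypothetical stand-in for the project's TextRegion type."""
    text: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1)
    confidence: float
    polygon: list[tuple[float, float]] = field(default_factory=list)


def overall_ocr_to_regions(overall_ocr_res: dict, min_score: float = 0.0) -> list[TextRegion]:
    polys = overall_ocr_res.get("dt_polys", [])
    texts = overall_ocr_res.get("rec_texts", [])
    scores = overall_ocr_res.get("rec_scores", [])
    # Assumption: the three sequences describe the same lines in the same order.
    if not (len(polys) == len(texts) == len(scores)):
        raise ValueError("dt_polys, rec_texts and rec_scores differ in length")
    regions = []
    for poly, text, score in zip(polys, texts, scores):
        points = [(float(x), float(y)) for x, y in poly]
        # Skip empty polygons, blank text and low-confidence lines.
        if not points or not text.strip() or float(score) < min_score:
            continue
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        regions.append(
            TextRegion(
                text=text,
                bbox=(min(xs), min(ys), max(xs), max(ys)),  # axis-aligned hull
                confidence=float(score),
                polygon=points,
            )
        )
    return regions
```

The polygon is kept alongside the axis-aligned bbox so that the IoA checks can use the simpler rectangle while rendering can still use the original quadrilateral.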
### 2.2 Update Processing Orchestrator
- [x] 2.2.1 Add an option to use `overall_ocr_res` as the OCR source
- [x] 2.2.2 Skip separate Raw OCR inference when using PP-StructureV3's OCR
- [x] 2.2.3 Maintain backward compatibility with explicit Raw OCR mode
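The orchestrator's source selection can be sketched as a small function. The result key `"overall_ocr_regions"`, the function name, and the `run_raw_ocr` callback are all hypothetical. The callback stands in for the separate raw OCR inference, which is skipped whenever PP-StructureV3's OCR is used.

```python
from typing import Any, Callable


def select_gap_filling_ocr(
    structure_result: dict[str, Any],
    use_overall_ocr: bool,
    run_raw_ocr: Callable[[], list],
) -> tuple[list, str]:
    """Return (ocr_regions, source_label) for gap filling."""
    # Hypothetical key under which task 2.1.3 stores the converted regions.
    regions = structure_result.get("overall_ocr_regions")
    if use_overall_ocr and regions is not None:
        # Reuse PP-StructureV3's OCR; the raw OCR callback is never invoked.
        return regions, "overall_ocr_res"
    # Explicit raw OCR mode, or no overall OCR available: keep the old path.
    return run_raw_ocr(), "raw_ocr"
```

In this design an empty list means "PP-StructureV3 found no text" and does not trigger a fallback. Only a missing key or `use_overall_ocr=False` runs raw OCR.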
## 3. Configuration Updates
### 3.1 Add Settings (config.py)
- [x] 3.1.1 Add `gap_filling_ioa_threshold_text: float = 0.6`
- [x] 3.1.2 Add `gap_filling_ioa_threshold_table: float = 0.1`
- [x] 3.1.3 Add `gap_filling_ioa_threshold_figure: float = 0.8`
- [x] 3.1.4 Add `gap_filling_use_overall_ocr: bool = True`
- [x] 3.1.5 Add `gap_filling_shrink_pixels: int = 1`
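The five settings could be grouped and validated as shown below. This uses a plain frozen dataclass to stay self-contained. The project's `config.py` may instead use a settings library such as pydantic-settings, and the range checks and the `ioa_thresholds()` helper are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapFillingSettings:
    gap_filling_ioa_threshold_text: float = 0.6
    gap_filling_ioa_threshold_table: float = 0.1
    gap_filling_ioa_threshold_figure: float = 0.8
    gap_filling_use_overall_ocr: bool = True
    gap_filling_shrink_pixels: int = 1

    def __post_init__(self) -> None:
        # Assumption: IoA thresholds are ratios and must lie in [0, 1].
        for name in (
            "gap_filling_ioa_threshold_text",
            "gap_filling_ioa_threshold_table",
            "gap_filling_ioa_threshold_figure",
        ):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        if self.gap_filling_shrink_pixels < 0:
            raise ValueError("gap_filling_shrink_pixels must be non-negative")

    def ioa_thresholds(self) -> dict[str, float]:
        """Map element type names to their IoA thresholds."""
        return {
            "text": self.gap_filling_ioa_threshold_text,
            "table": self.gap_filling_ioa_threshold_table,
            "figure": self.gap_filling_ioa_threshold_figure,
        }
```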
## 4. Testing
### 4.1 Unit Tests
- [ ] 4.1.1 Test IoA calculation with known values
- [ ] 4.1.2 Test dynamic threshold selection by element type
- [ ] 4.1.3 Test boundary shrinking edge cases
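One candidate "known values" case for 4.1.1 is worked out below. It assumes IoA is normalized by the OCR box $B_o$ and compared against a layout element box $B_e$, with boxes given as $(x_0, y_0, x_1, y_1)$.

```latex
\begin{aligned}
\mathrm{IoA}(B_o, B_e) &= \frac{|B_o \cap B_e|}{|B_o|},
\qquad
\mathrm{IoU}(B_o, B_e) = \frac{|B_o \cap B_e|}{|B_o| + |B_e| - |B_o \cap B_e|} \\[4pt]
B_o &= (0, 0, 10, 10), \qquad B_e = (5, 0, 25, 20) \\
|B_o| &= 10 \cdot 10 = 100, \qquad |B_e| = 20 \cdot 20 = 400, \qquad |B_o \cap B_e| = (10 - 5)(10 - 0) = 50 \\
\mathrm{IoA} &= \tfrac{50}{100} = 0.5, \qquad \mathrm{IoU} = \tfrac{50}{100 + 400 - 50} = \tfrac{50}{450} \approx 0.111
\end{aligned}
```

With the TEXT threshold of 0.6, $\mathrm{IoA} = 0.5$ falls below the threshold, so this OCR region would not count as covered and would remain a gap-filling candidate. A useful second case is a box lying fully inside $B_e$, which should give $\mathrm{IoA} = 1$ regardless of the size of $B_e$.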
### 4.2 Integration Tests
- [ ] 4.2.1 Test with scan.pdf (the file that currently exhibits the problem)
- [ ] 4.2.2 Compare results: old IoU vs new IoA approach
- [ ] 4.2.3 Verify no duplicate text rendering in output PDF
- [ ] 4.2.4 Verify table content is not duplicated outside table bounds
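For 4.2.3, a duplicate check over text spans extracted from the output PDF might look like the sketch below. A text extractor such as PyMuPDF's `Page.get_text("words")` could supply the spans. `find_duplicate_spans`, the whitespace and case normalization, and the 0.5 overlap threshold (measured against the smaller box) are all hypothetical choices.

```python
def _area(b: tuple[float, float, float, float]) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def _intersection(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def _normalize(text: str) -> str:
    # Collapse whitespace and ignore case when comparing rendered strings.
    return " ".join(text.split()).casefold()


def find_duplicate_spans(
    spans: list[tuple[str, tuple[float, float, float, float]]],
    overlap_threshold: float = 0.5,
) -> list[tuple[int, int]]:
    """Return index pairs of spans with the same text that mostly overlap."""
    duplicates = []
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            (text_i, box_i), (text_j, box_j) = spans[i], spans[j]
            if _normalize(text_i) != _normalize(text_j):
                continue
            smaller = min(_area(box_i), _area(box_j))
            if smaller > 0 and _intersection(box_i, box_j) / smaller >= overlap_threshold:
                duplicates.append((i, j))
    return duplicates
```

The pairwise loop is quadratic, which should be acceptable for per-page checks in a test. An empty result for every page of scan.pdf would be one concrete pass criterion.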
## 5. Documentation
- [x] 5.1 Update spec documentation with new algorithm
- [x] 5.2 Add inline code comments explaining IoA vs IoU