feat: enable document orientation detection for scanned PDFs
- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
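
For reviewers, the width/height swap named in the commit message can be sketched as below. This is a minimal illustration, not the code from this commit: the function name `oriented_page_size` is hypothetical, and it assumes the detected angle (the message cites `doc_preprocessor_res.angle`) is one of 0, 90, 180, or 270 degrees.

```python
from typing import Tuple


def oriented_page_size(width: float, height: float, angle: int) -> Tuple[float, float]:
    """Return the page size to use once the detected rotation is applied.

    Hypothetical helper: `angle` is assumed to be 0, 90, 180, or 270 degrees.
    """
    normalized = angle % 360
    if normalized not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation angle: {angle}")
    # A quarter turn exchanges width and height; 0 and 180 leave them as-is.
    if normalized in (90, 270):
        return height, width
    return width, height
```

With an A4 portrait scan of 595 x 842 points and a detected angle of 90, this yields a landscape page of 842 x 595.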
@@ -0,0 +1,42 @@
# Simple Text Positioning from Raw OCR

## Summary

Simplify OCR track PDF generation by rendering raw OCR text at correct positions without complex table structure reconstruction.

## Problem

Current OCR track processing has multiple failure points:
1. PP-Structure table structure recognition fails for borderless tables
2. Multi-column layouts get merged incorrectly into single tables
3. Table HTML reconstruction produces wrong cell positions
4. Complex column correction algorithms still can't fix fundamental structure errors

Meanwhile, raw OCR (`raw_ocr_regions.json`) correctly identifies all text with accurate bounding boxes.

## Solution

Replace complex table reconstruction with simple text positioning:
1. Read raw OCR regions directly
2. Position text at bbox coordinates
3. Calculate text rotation from bbox quadrilateral shape
4. Estimate font size from bbox height
5. Skip table HTML parsing entirely for OCR track

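Steps 2 to 4 above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function name `text_placement` and the `height_to_font` factor are hypothetical, and each region is assumed to be a quadrilateral of four `(x, y)` corners ordered top-left, top-right, bottom-right, bottom-left in image coordinates (y grows downward).

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def text_placement(quad: Sequence[Point], height_to_font: float = 0.75) -> Dict[str, object]:
    """Derive anchor, rotation, and font size from one OCR quadrilateral.

    Assumed corner order: top-left, top-right, bottom-right, bottom-left.
    `height_to_font` is a hypothetical tuning factor, not a measured value.
    """
    if len(quad) != 4:
        raise ValueError("expected four corner points")
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad

    # Rotation: direction of the top edge (top-left -> top-right).
    # y is negated so a positive angle means counter-clockwise on the page.
    rotation_deg = math.degrees(math.atan2(-(y1 - y0), x1 - x0))

    # Line height: mean length of the two side edges. Unlike the height of
    # the axis-aligned bounding box, this is not inflated by rotation.
    left = math.hypot(x3 - x0, y3 - y0)
    right = math.hypot(x2 - x1, y2 - y1)
    line_height = (left + right) / 2.0

    return {
        "anchor": (x3, y3),  # bottom-left corner, where the text line starts
        "rotation_deg": rotation_deg,
        "font_size": line_height * height_to_font,
    }
```

The anchor is still in image coordinates. A PDF library with a bottom-left origin would need the y value flipped against the page height before drawing.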
## Benefits

- **Reliability**: Raw OCR text positions are accurate
- **Simplicity**: Eliminates complex table parsing logic
- **Performance**: Faster processing without structure analysis
- **Consistency**: Predictable output regardless of table type

## Trade-offs

- No table borders in output
- No cell structure (colspan, rowspan)
- Visual layout approximation rather than semantic structure

## Scope

- OCR track PDF generation only
- Direct track remains unchanged (uses native PDF text extraction)