feat: enable document orientation detection for scanned PDFs

- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:13:46 +08:00
parent 57070af307
commit cfe65158a3
58 changed files with 1271 additions and 3048 deletions
--- a/openspec/changes/archive/2025-12-11-simple-text-positioning/proposal.md
+++ b/openspec/changes/archive/2025-12-11-simple-text-positioning/proposal.md
@@ -0,0 +1,42 @@
+# Simple Text Positioning from Raw OCR
+
+## Summary
+
+Simplify OCR track PDF generation by rendering raw OCR text at correct positions without complex table structure reconstruction.
+
+## Problem
+
+Current OCR track processing has multiple failure points:
+1. PP-Structure table structure recognition fails for borderless tables
+2. Multi-column layouts get merged incorrectly into single tables
+3. Table HTML reconstruction produces wrong cell positions
+4. Complex column correction algorithms still can't fix fundamental structure errors
+
+Meanwhile, raw OCR (`raw_ocr_regions.json`) correctly identifies all text with accurate bounding boxes.
+
+## Solution
+
+Replace complex table reconstruction with simple text positioning:
+1. Read raw OCR regions directly
+2. Position text at bbox coordinates
+3. Calculate text rotation from bbox quadrilateral shape
+4. Estimate font size from bbox height
+5. Skip table HTML parsing entirely for OCR track
+
+## Benefits
+
+- **Reliability**: Raw OCR text positions are accurate
+- **Simplicity**: Eliminates complex table parsing logic
+- **Performance**: Faster processing without structure analysis
+- **Consistency**: Predictable output regardless of table type
+
+## Trade-offs
+
+- No table borders in output
+- No cell structure (colspan, rowspan)
+- Visual layout approximation rather than semantic structure
+
+## Scope
+
+- OCR track PDF generation only
+- Direct track remains unchanged (uses native PDF text extraction)
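
Reviewer note: steps 2-4 of the proposal's Solution (position text at the bbox, derive rotation from the quadrilateral, estimate font size from bbox height) can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this commit: the region layout (a `text` string plus a `bbox` of four `[x, y]` points ordered top-left, top-right, bottom-right, bottom-left), the function names, and the `0.8` height-to-font-size factor are all hypothetical, since the proposal does not show the schema of `raw_ocr_regions.json`.

```python
import math

# Assumed region shape (hypothetical; the real raw_ocr_regions.json schema
# is not shown in the proposal):
#   {"text": "...", "bbox": [[x, y], [x, y], [x, y], [x, y]]}
# with points ordered top-left, top-right, bottom-right, bottom-left.


def text_rotation_deg(quad):
    """Angle of the top edge (top-left -> top-right) in degrees.

    Computed in image coordinates (y grows downward), so a positive
    angle is a clockwise rotation on screen. A renderer whose y axis
    points up would need the sign flipped.
    """
    (x0, y0), (x1, y1) = quad[0], quad[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def estimate_font_size(quad, scale=0.8):
    """Font size from the left edge length (top-left -> bottom-left).

    Using the edge length rather than max(y) - min(y) keeps the estimate
    stable for rotated text. The scale factor is an assumed heuristic.
    """
    (x0, y0), (x3, y3) = quad[0], quad[3]
    return math.hypot(x3 - x0, y3 - y0) * scale


def place_region(region):
    """Turn one raw OCR region into a draw instruction for the renderer."""
    quad = region["bbox"]
    anchor_x, anchor_y = quad[3]  # bottom-left corner as the text origin
    return {
        "text": region["text"],
        "x": anchor_x,
        "y": anchor_y,
        "rotation": text_rotation_deg(quad),
        "font_size": estimate_font_size(quad),
    }
```

A renderer would call `place_region` once per OCR region and draw each returned string at `(x, y)` with the given rotation and size, which is why no table structure is needed for this path.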