# Simple Text Positioning from Raw OCR

## Summary

Simplify OCR track PDF generation by rendering raw OCR text at correct positions without complex table structure reconstruction.

## Problem

Current OCR track processing has multiple failure points:

1. PP-Structure table structure recognition fails for borderless tables
2. Multi-column layouts get merged incorrectly into single tables
3. Table HTML reconstruction produces wrong cell positions
4. Complex column correction algorithms still can't fix fundamental structure errors

Meanwhile, raw OCR (`raw_ocr_regions.json`) correctly identifies all text with accurate bounding boxes.

## Solution

Replace complex table reconstruction with simple text positioning:

1. Read raw OCR regions directly
2. Position text at bbox coordinates
3. Calculate text rotation from bbox quadrilateral shape
4. Estimate font size from bbox height
5. Skip table HTML parsing entirely for OCR track
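
The geometry in steps 2 to 4 can be sketched as follows. This is a minimal illustration, not the project's implementation. It assumes each region is a dict with a `text` string and a `bbox` of four `[x, y]` corners ordered top-left, top-right, bottom-right, bottom-left in image coordinates (y pointing down). The names `PlacedText`, `place_region`, and the `height_to_font` ratio are hypothetical, and the actual schema of `raw_ocr_regions.json` may differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class PlacedText:
    text: str
    x: float          # anchor point: first corner of the quadrilateral
    y: float
    angle_deg: float  # rotation of the text baseline, in image coordinates
    font_size: float


def rotation_deg(quad: Sequence[Point]) -> float:
    """Angle of the top edge (corner 0 -> corner 1) relative to the x axis."""
    (x0, y0), (x1, y1) = quad[0], quad[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def text_height(quad: Sequence[Point]) -> float:
    """Mean length of the left (0 -> 3) and right (1 -> 2) edges."""
    left = math.dist(quad[0], quad[3])
    right = math.dist(quad[1], quad[2])
    return (left + right) / 2


def place_region(region: Dict, height_to_font: float = 0.75) -> PlacedText:
    """Turn one raw OCR region into a positioned text item.

    height_to_font is an assumed tuning constant: the bbox height includes
    ascenders, descenders, and padding, so the font size is taken as a
    fraction of it.
    """
    quad: List[Point] = [(float(x), float(y)) for x, y in region["bbox"]]
    return PlacedText(
        text=region["text"],
        x=quad[0][0],
        y=quad[0][1],
        angle_deg=rotation_deg(quad),
        font_size=text_height(quad) * height_to_font,
    )


if __name__ == "__main__":
    sample = {"text": "Total", "bbox": [[0, 0], [100, 0], [100, 20], [0, 20]]}
    print(place_region(sample))
```

Measuring height along the side edges rather than as `max(y) - min(y)` keeps the font size estimate stable for rotated text, where the axis-aligned extent grows with the angle. Because the angle is computed with y pointing down, a renderer that uses a y-up coordinate system (as PDF does) would need to flip the y coordinate and negate the angle.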

## Benefits

- **Reliability**: Raw OCR text positions are accurate
- **Simplicity**: Eliminates complex table parsing logic
- **Performance**: Faster processing without structure analysis
- **Consistency**: Predictable output regardless of table type

## Trade-offs

- No table borders in output
- No cell structure (colspan, rowspan)
- Visual layout approximation rather than semantic structure

## Scope

- OCR track PDF generation only
- Direct track remains unchanged (uses native PDF text extraction)