egg 940a406dce chore: backup before code cleanup

Backup commit before executing remove-unused-code proposal.
This includes all pending changes and new features.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-11 11:55:39 +08:00

Simple Text Positioning from Raw OCR

Summary

Simplify OCR track PDF generation by rendering raw OCR text at correct positions without complex table structure reconstruction.

Problem

Current OCR track processing has multiple failure points:

PP-Structure table structure recognition fails for borderless tables
Multi-column layouts get merged incorrectly into single tables
Table HTML reconstruction produces wrong cell positions
Complex column correction algorithms still can't fix fundamental structure errors

Meanwhile, the raw OCR output (raw_ocr_regions.json) already identifies the text correctly, with accurate bounding boxes.

Solution

Replace complex table reconstruction with simple text positioning:

Read raw OCR regions directly
Position text at bbox coordinates
Calculate text rotation from bbox quadrilateral shape
Estimate font size from bbox height
Skip table HTML parsing entirely for OCR track
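
The positioning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation. It assumes each region is a dict with a "text" string and a "bbox" of four (x, y) points ordered top-left, top-right, bottom-right, bottom-left in image coordinates (y pointing down); the actual schema of raw_ocr_regions.json may differ. The names PlacedText, place_region, and place_regions, and the 0.75 height-to-font-size factor, are hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class PlacedText:
    text: str
    x: float          # anchor x: bottom-left corner of the quad, image coords
    y: float          # anchor y: bottom-left corner of the quad, image coords
    angle_deg: float  # rotation of the top edge, image coords (y down)
    font_size: float  # estimated from the bbox height


def place_region(text, quad, height_to_font=0.75):
    """Derive anchor, rotation, and font size from one quadrilateral.

    quad is assumed to be ordered top-left, top-right, bottom-right,
    bottom-left. height_to_font is an assumed tuning factor, not a
    measured value.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    # Rotation: direction of the top edge (top-left -> top-right).
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    # Height: mean length of the left and right edges, so the estimate
    # stays valid for rotated boxes.
    left = math.hypot(x3 - x0, y3 - y0)
    right = math.hypot(x2 - x1, y2 - y1)
    height = (left + right) / 2
    return PlacedText(text, x3, y3, angle, height * height_to_font)


def place_regions(regions):
    """Map raw OCR regions (assumed keys: 'text', 'bbox') to placements."""
    return [place_region(r["text"], r["bbox"]) for r in regions if r.get("text")]
```

Because the angle is measured in image coordinates, a PDF canvas whose y axis points up would need the y coordinate flipped against the page height and the sign of the angle negated before drawing.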

Benefits

Reliability: Raw OCR text positions are accurate
Simplicity: Eliminates complex table parsing logic
Performance: Faster processing without structure analysis
Consistency: Predictable output regardless of table type

Trade-offs

No table borders in output
No cell structure (colspan, rowspan)
Visual layout approximation rather than semantic structure

Scope

OCR track PDF generation only
Direct track remains unchanged (uses native PDF text extraction)
