OCR/openspec/changes/simple-text-positioning/proposal.md
egg 940a406dce chore: backup before code cleanup
Backup commit before executing remove-unused-code proposal.
This includes all pending changes and new features.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 11:55:39 +08:00


Simple Text Positioning from Raw OCR

Summary

Simplify OCR track PDF generation by rendering raw OCR text at its detected positions, without complex table structure reconstruction.

Problem

Current OCR track processing has multiple failure points:

  1. PP-Structure table structure recognition fails for borderless tables
  2. Multi-column layouts get merged incorrectly into single tables
  3. Table HTML reconstruction produces wrong cell positions
  4. Even complex column-correction algorithms cannot fix fundamental structure errors

Meanwhile, raw OCR (raw_ocr_regions.json) correctly identifies all text with accurate bounding boxes.

Solution

Replace complex table reconstruction with simple text positioning:

  1. Read raw OCR regions directly
  2. Position text at bbox coordinates
  3. Calculate text rotation from bbox quadrilateral shape
  4. Estimate font size from bbox height
  5. Skip table HTML parsing entirely for OCR track
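The five steps above can be sketched in Python, the language of the PP-Structure stack. This is a minimal sketch under stated assumptions, not the project's implementation.

It assumes the following about each entry in raw_ocr_regions.json:
- It has a `text` string and a `bbox` made of four `[x, y]` points in image pixels.
- The points are ordered top-left, top-right, bottom-right, bottom-left, which is the usual PaddleOCR order.

The names `TextPlacement`, `load_regions`, `to_pdf_space`, `rotation_deg`, `box_height`, `layout_regions` and the `font_scale` parameter are hypothetical. The sketch first maps pixels to PDF points with a y-axis flip. It then takes the rotation from the quadrilateral's top edge and the font size from the mean of its left and right edge lengths.

```python
import json
import math
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class TextPlacement:
    """One OCR region resolved to PDF drawing parameters (hypothetical type)."""
    text: str
    x: float          # origin in PDF points: bottom-left corner of the box
    y: float
    angle_deg: float  # counter-clockwise rotation, PDF (y-up) convention
    font_size: float  # points


def load_regions(path: str) -> list:
    """Step 1: read raw OCR regions (schema assumed, see lead-in)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_pdf_space(quad: Sequence[Sequence[float]], image_size, page_size) -> List[Point]:
    """Map image pixels (y down, origin top-left) to PDF points (y up, origin bottom-left)."""
    img_w, img_h = image_size
    page_w, page_h = page_size
    sx, sy = page_w / img_w, page_h / img_h
    return [(x * sx, page_h - y * sy) for x, y in quad]


def rotation_deg(quad: Sequence[Point]) -> float:
    """Step 3: angle of the top edge (top-left -> top-right) in PDF space."""
    (x0, y0), (x1, y1) = quad[0], quad[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def box_height(quad: Sequence[Point]) -> float:
    """Step 4 helper: mean of the left and right edge lengths."""
    tl, tr, br, bl = quad
    return (math.dist(tl, bl) + math.dist(tr, br)) / 2.0


def layout_regions(regions: Iterable[dict], image_size, page_size,
                   font_scale: float = 1.0) -> List[TextPlacement]:
    """Steps 2-4 for every region; no table HTML is consulted (step 5)."""
    placements = []
    for region in regions:
        text = region.get("text", "").strip()
        if not text:
            continue  # nothing to draw
        quad = to_pdf_space(region["bbox"], image_size, page_size)
        x, y = quad[3]  # bottom-left corner used as the text origin
        placements.append(TextPlacement(
            text=text,
            x=x,
            y=y,
            angle_deg=rotation_deg(quad),
            font_size=box_height(quad) * font_scale,
        ))
    return placements
```

The proposal does not name a PDF library. With ReportLab, for example, each `TextPlacement` could be drawn on a `Canvas` with these calls, in order:
- `saveState()`
- `translate(p.x, p.y)`
- `rotate(p.angle_deg)`
- `setFont(name, p.font_size)`
- `drawString(0, 0, p.text)`
- `restoreState()`

Using the bottom-left corner as the baseline origin is an approximation, so descenders will sit slightly below the detected box. The chosen font must also cover the document's script.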

Benefits

  • Reliability: Raw OCR text positions are accurate
  • Simplicity: Eliminates complex table parsing logic
  • Performance: Faster processing without structure analysis
  • Consistency: Predictable output regardless of table type

Trade-offs

  • No table borders in output
  • No cell structure (colspan, rowspan)
  • Visual layout approximation rather than semantic structure

Scope

  • OCR track PDF generation only
  • Direct track remains unchanged (uses native PDF text extraction)