OCR/openspec/changes/archive/2025-12-11-simple-text-positioning/proposal.md
egg cfe65158a3 feat: enable document orientation detection for scanned PDFs
- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:13:46 +08:00

Simple Text Positioning from Raw OCR

Summary

Simplify OCR track PDF generation by rendering raw OCR text at correct positions without complex table structure reconstruction.

Problem

Current OCR track processing has multiple failure points:

  1. PP-Structure table structure recognition fails for borderless tables
  2. Multi-column layouts get merged incorrectly into single tables
  3. Table HTML reconstruction produces wrong cell positions
  4. Complex column correction algorithms still can't fix fundamental structure errors

Meanwhile, raw OCR (raw_ocr_regions.json) correctly identifies all text with accurate bounding boxes.

Solution

Replace complex table reconstruction with simple text positioning:

  1. Read raw OCR regions directly
  2. Position text at bbox coordinates
  3. Calculate text rotation from bbox quadrilateral shape
  4. Estimate font size from bbox height
  5. Skip table HTML parsing entirely for OCR track
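
Steps 2 through 4 can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation. It assumes each region is a dict with hypothetical `text` and `bbox` keys, that `bbox` is a quadrilateral of four (x, y) points ordered top-left, top-right, bottom-right, bottom-left in image coordinates (y grows downward), and that the output PDF uses a bottom-left origin. The `scale` factor of 0.75 is an illustrative guess for converting box height to font size, not a value taken from the proposal.

```python
import math


def quad_rotation_deg(quad):
    """Angle of the top edge (top-left -> top-right) in degrees.

    Measured in image coordinates, where y grows downward, so a positive
    value means the text runs clockwise on screen.
    """
    (x0, y0), (x1, y1) = quad[0], quad[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def quad_height(quad):
    """Text-line height: mean length of the left and right edges."""
    tl, tr, br, bl = quad
    return (math.dist(tl, bl) + math.dist(tr, br)) / 2


def estimate_font_size(quad, scale=0.75):
    """Font size guessed from box height; `scale` is an assumed factor."""
    return quad_height(quad) * scale


def place_region(region, page_height):
    """Turn one OCR region into text-drawing parameters.

    `region` is assumed to look like {"text": str, "bbox": [4 points]}.
    The anchor is the bottom-left corner of the quad, flipped into a
    bottom-left-origin coordinate system. Rotation is negated because the
    y axis is flipped (counter-clockwise becomes positive).
    """
    quad = region["bbox"]
    bl_x, bl_y = quad[3]
    return {
        "text": region["text"],
        "x": bl_x,
        "y": page_height - bl_y,
        "rotation": -quad_rotation_deg(quad),
        "font_size": estimate_font_size(quad),
    }
```

The resulting parameters could then be passed to whatever PDF drawing layer the OCR track uses: translate to (`x`, `y`), rotate by `rotation`, and draw `text` at `font_size`. Because no table structure is involved, each region is handled independently.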

Benefits

  • Reliability: Raw OCR text positions are accurate
  • Simplicity: Eliminates complex table parsing logic
  • Performance: Faster processing without structure analysis
  • Consistency: Predictable output regardless of table type

Trade-offs

  • No table borders in output
  • No cell structure (colspan, rowspan)
  • Visual layout approximation rather than semantic structure

Scope

  • OCR track PDF generation only
  • Direct track remains unchanged (uses native PDF text extraction)