OCR/openspec/changes/archive/2025-12-03-fix-pdf-table-rendering/proposal.md
egg 1b5c7f39a8 fix: improve PDF layout generation for Direct track
Key fixes:
- Skip large vector_graphics charts (>50% page coverage) that cover text
- Fix font fallback to use NotoSansSC for CJK support instead of Helvetica
- Improve translated table rendering with dynamic font sizing
- Add merged cell (row_span/col_span) support for reflow tables
- Skip text elements inside table bboxes to avoid duplication

Archive openspec proposal: fix-pdf-table-rendering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 14:55:00 +08:00


Change: Fix PDF Table Rendering Issues

Why

OCR track PDF exports have significant table rendering problems:

  1. Reflow PDF (both translated and untranslated): Tables are misaligned because merged cells (row_span/col_span) are not supported
  2. Translated Layout PDF: Table borders disappear and text overlaps because the renderer does not use the accurate cell_boxes positions from the metadata

What Changes

  • Translated Layout PDF: Adopt a layered rendering approach (borders and text drawn in separate passes) using cell_boxes from metadata
  • Reflow PDF Tables: Fix cell extraction and add basic merged cell support
  • Ensure embedded images in tables are rendered correctly in all PDF formats
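
The layered approach for the Translated Layout PDF could be sketched roughly as follows. This is only an illustration, not the actual `_draw_translated_table()` code: it assumes `cell_boxes` is a list of `[x0, y0, x1, y1]` boxes with a top-left origin, and the names `Rect`, `to_pdf_rect`, and `plan_table_layers` are hypothetical. The sketch plans a border layer and a text layer from the same boxes, so a drawing backend can stroke every border first and place text afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Rectangle in PDF space (origin at the bottom-left of the page)."""
    x: float
    y: float
    width: float
    height: float


def to_pdf_rect(box, page_height, scale=1.0):
    """Convert an assumed top-left-origin [x0, y0, x1, y1] box to a PDF rect."""
    x0, y0, x1, y1 = (v * scale for v in box)
    # Flip the y axis: the bottom edge of the box becomes the rect's y.
    return Rect(x=x0, y=page_height - y1, width=x1 - x0, height=y1 - y0)


def plan_table_layers(cell_boxes, cell_texts, page_height, scale=1.0, padding=2.0):
    """Return (border_layer, text_layer) as lists of simple draw operations."""
    rects = [to_pdf_rect(box, page_height, scale) for box in cell_boxes]

    # Layer 1: one border rectangle per cell, independent of any text.
    border_layer = [("rect", r) for r in rects]

    # Layer 2: text anchored at the padded top-left inner corner of each cell.
    text_layer = []
    for r, text in zip(rects, cell_texts):
        if text:  # empty cells still get a border but no text op
            text_layer.append(("text", r.x + padding, r.y + r.height - padding, text))

    return border_layer, text_layer
```

Keeping the two layers separate means an empty or overflowing cell cannot suppress its own border, which is the failure described under "Why".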

Impact

  • Affected specs: result-export
  • Affected code:
    • backend/app/services/pdf_generator_service.py
      • _draw_translated_table() - needs a complete rewrite
      • _create_reflow_table() - needs merged cell support
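
As a minimal sketch of what merged cell support in `_create_reflow_table()` might involve, the helper below expands a list of cells into a rectangular grid plus span commands. It assumes each cell is a dict with hypothetical `row`, `col`, and `text` keys alongside the `row_span`/`col_span` fields named in this proposal. The `("SPAN", (col, row), (col, row))` tuples follow the command shape that ReportLab's `TableStyle` accepts; whether the service renders with ReportLab is an assumption here.

```python
def build_span_grid(cells, n_rows, n_cols):
    """Expand cells into a full grid and a list of SPAN commands.

    The helper name and the "row"/"col"/"text" keys are illustrative
    assumptions, not the service's actual schema.
    """
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    spans = []

    for cell in cells:
        row, col = cell["row"], cell["col"]
        row_span = max(1, cell.get("row_span", 1))
        col_span = max(1, cell.get("col_span", 1))

        # Clamp the span so a bad value cannot run past the table edge.
        end_row = min(row + row_span, n_rows) - 1
        end_col = min(col + col_span, n_cols) - 1

        # Only the anchor (top-left) position carries the text; the
        # positions covered by the span stay as empty placeholders.
        grid[row][col] = cell.get("text", "")

        if end_row > row or end_col > col:
            # Span commands address cells as (column, row).
            spans.append(("SPAN", (col, row), (end_col, end_row)))

    return grid, spans
```

Filling covered positions with empty placeholders keeps every row the same length, so column alignment no longer shifts when a cell spans several columns.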