OCR/openspec/changes/archive/2025-12-03-fix-pdf-table-rendering/specs/result-export/spec.md
egg 1b5c7f39a8 fix: improve PDF layout generation for Direct track
Key fixes:
- Skip large vector_graphics charts (>50% page coverage) that cover text
- Fix font fallback to use NotoSansSC for CJK support instead of Helvetica
- Improve translated table rendering with dynamic font sizing
- Add merged cell (row_span/col_span) support for reflow tables
- Skip text elements inside table bboxes to avoid duplication

Archive openspec proposal: fix-pdf-table-rendering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 14:55:00 +08:00

MODIFIED Requirements

Requirement: Translated Layout PDF Generation

The system SHALL generate layout-preserving PDFs with translated content that maintain accurate table structure.

Scenario: Table with accurate borders

  • GIVEN an OCR result with tables containing cell_boxes metadata
  • WHEN generating translated layout PDF
  • THEN table cell borders SHALL be drawn at positions matching cell_boxes
  • AND translated text SHALL be rendered within each cell's bounding box
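
As an illustration of the border-placement scenario above, here is a minimal sketch of mapping one cell_boxes entry to a PDF rectangle. It assumes each box is `[x0, y0, x1, y1]` in image coordinates with a top-left origin, and that the PDF canvas uses a bottom-left origin; the function name `cell_box_to_pdf_rect` and the `scale` parameter are hypothetical and not taken from the spec.

```python
def cell_box_to_pdf_rect(box, page_height, scale=1.0):
    """Convert a top-left-origin box to (x, y, width, height) in PDF space.

    Assumption: box is [x0, y0, x1, y1] with y growing downwards, and the
    PDF page has its origin at the bottom-left with y growing upwards.
    """
    x0, y0, x1, y1 = box
    width = (x1 - x0) * scale
    height = (y1 - y0) * scale
    x = x0 * scale
    # The bottom edge of the cell (y1) becomes the rectangle's lower y.
    y = page_height - y1 * scale
    return x, y, width, height


if __name__ == "__main__":
    print(cell_box_to_pdf_rect([10, 20, 110, 50], page_height=800))
```

The returned tuple can be passed to whatever rectangle-drawing call the PDF backend provides, and the same rectangle bounds the translated text for that cell.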

Scenario: Text overflow handling

  • GIVEN translated text that is longer than the original text
  • WHEN the text exceeds the cell's bounding box
  • THEN the system SHALL reduce the font size, down to a minimum of 8pt, until the content fits
  • OR truncate the text with an ellipsis if it still does not fit at the minimum font size
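
The overflow rule above can be sketched as a shrink-then-truncate loop. This is only an illustration under stated assumptions: text width is estimated as `len(text) * font_size * char_width_ratio` instead of using real font metrics, the text is treated as a single line, and `fit_text`, `start_size`, `step`, and `char_width_ratio` are hypothetical names. Only the 8pt minimum comes from the spec.

```python
ELLIPSIS = "\u2026"


def fit_text(text, box_width, start_size=12.0, min_size=8.0, step=0.5,
             char_width_ratio=0.5):
    """Return (text, font_size) so the text fits box_width on one line."""

    def width(s, size):
        # Rough estimate; a real renderer would query the font metrics.
        return len(s) * size * char_width_ratio

    size = start_size
    # Step 1: shrink the font until the text fits or the minimum is reached.
    while size > min_size and width(text, size) > box_width:
        size = max(min_size, size - step)
    if width(text, size) <= box_width:
        return text, size

    # Step 2: still too wide at the minimum size, so truncate with an ellipsis.
    max_chars = int(box_width // (size * char_width_ratio))
    if max_chars <= 0:
        return "", size
    return text[:max_chars - 1].rstrip() + ELLIPSIS, size


if __name__ == "__main__":
    print(fit_text("a" * 20, box_width=100))
    print(fit_text("a" * 40, box_width=100))
```

Shrinking before truncating keeps the full translation visible whenever possible; truncation is the last resort.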

Scenario: Embedded images in tables

  • GIVEN a table with embedded_images in metadata
  • WHEN generating translated layout PDF
  • THEN images SHALL be rendered at their original positions within the table

Requirement: Reflow PDF Table Rendering

The system SHALL generate reflow PDFs with properly structured tables including merged cell support.

Scenario: Basic table rendering

  • GIVEN an OCR result with table cells containing row, col, content
  • WHEN generating reflow PDF
  • THEN cells SHALL be grouped by row and column indices
  • AND table SHALL render with visible borders
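
A minimal sketch of the grouping step, assuming each cell is a dict with 0-based `row` and `col` keys plus `content` (the key names follow the scenario; the function name `build_grid` and the 0-based indexing are assumptions):

```python
def build_grid(cells):
    """Arrange OCR table cells into a rectangular list-of-rows grid."""
    if not cells:
        return []
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    # Positions with no cell stay as empty strings so every row has n_cols items.
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["col"]] = c.get("content", "")
    return grid


if __name__ == "__main__":
    cells = [
        {"row": 0, "col": 0, "content": "Name"},
        {"row": 0, "col": 1, "content": "Qty"},
        {"row": 1, "col": 0, "content": "Bolt"},
    ]
    print(build_grid(cells))
```

A rectangular grid like this is the shape most table renderers expect; the visible borders would then be applied through the renderer's own styling.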

Scenario: Merged cells support

  • GIVEN table cells with row_span or col_span greater than 1
  • WHEN generating reflow PDF
  • THEN the system SHALL span each such cell across row_span rows and col_span columns
  • AND merged cells SHALL display content without duplication
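
One way to express the merged-cell scenario is to derive span commands from the cell metadata. The sketch below assumes 0-based `row` and `col` keys and emits tuples shaped like ReportLab's `TableStyle` `SPAN` command, `("SPAN", (start_col, start_row), (end_col, end_row))`. Whether the project renders with ReportLab is an assumption, and `span_commands` is a hypothetical name.

```python
def span_commands(cells):
    """Build span commands for every cell that covers more than one slot."""
    commands = []
    for c in cells:
        row_span = c.get("row_span", 1)
        col_span = c.get("col_span", 1)
        if row_span > 1 or col_span > 1:
            start = (c["col"], c["row"])
            end = (c["col"] + col_span - 1, c["row"] + row_span - 1)
            commands.append(("SPAN", start, end))
    return commands


if __name__ == "__main__":
    cells = [
        {"row": 0, "col": 0, "content": "Header", "col_span": 2},
        {"row": 1, "col": 0, "content": "Side", "row_span": 2},
        {"row": 1, "col": 1, "content": "x"},
    ]
    print(span_commands(cells))
```

To avoid duplicated content, the text is placed only in the top-left (anchor) position of each span, and the covered positions are left empty.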

Scenario: Column width calculation

  • GIVEN a table with cell_boxes metadata
  • WHEN generating reflow PDF
  • THEN column widths SHOULD be proportional to original cell widths
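
The proportional-width rule can be sketched as follows, assuming one row of unmerged cell_boxes in `[x0, y0, x1, y1]` form is available to measure the original widths. The function name `proportional_col_widths` and the `available_width` parameter are hypothetical.

```python
def proportional_col_widths(row_boxes, available_width):
    """Scale original column widths so they sum to available_width."""
    if not row_boxes:
        return []
    original = [max(x1 - x0, 0.0) for x0, _y0, x1, _y1 in row_boxes]
    total = sum(original)
    if total <= 0:
        # Degenerate boxes: fall back to equal-width columns.
        return [available_width / len(row_boxes)] * len(row_boxes)
    return [available_width * w / total for w in original]


if __name__ == "__main__":
    boxes = [[0, 0, 100, 20], [100, 0, 300, 20]]
    print(proportional_col_widths(boxes, available_width=450))
```

Because the scenario uses SHOULD rather than SHALL, the equal-width fallback for missing or degenerate boxes stays within the requirement.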