OCR/openspec/changes/fix-ocr-track-table-rendering/specs/pdf-generation/spec.md
egg c65df754cf wip: add TableData.from_dict() for OCR track table parsing (incomplete)
Add TableData.from_dict() and TableCell.from_dict() methods to convert
JSON table dicts to proper TableData objects during UnifiedDocument parsing.

Modified _json_to_document_element() to detect TABLE elements with dict
content containing 'cells' key and convert to TableData.

Note: This fix ensures table elements have a proper to_html() method available,
but the rendered output still needs investigation: tables may still render
incorrectly in OCR track PDFs.

Files changed:
- unified_document.py: Add from_dict() class methods
- pdf_generator_service.py: Convert table dicts during JSON parsing
- Add fix-ocr-track-table-rendering proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 19:16:51 +08:00

# PDF Generation - OCR Track Table Rendering Fix
## MODIFIED Requirements
### Requirement: OCR Track Table Content Conversion
The PDF generator MUST properly convert table content from JSON dict format to renderable structure when processing OCR track results.
#### Scenario: Table dict with cells array converts to proper HTML
Given an OCR track JSON with table element containing rows, cols, and cells array
When the PDF generator processes this element
Then the table content MUST be converted to a TableData object
And TableData.to_html() MUST produce valid HTML with proper tr/td structure
And the generated PDF table MUST have cells positioned in correct grid locations
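
The conversion described in this scenario can be sketched as follows. This is a minimal illustration, not the project's actual `unified_document.py`: the cell field names (`row`, `col`, `content`, `row_span`, `col_span`) are assumptions, since the spec only names `rows`, `cols`, and `cells`.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Any, Dict, List


@dataclass
class TableCell:
    # Field names are assumed for illustration; the real schema may differ.
    row: int
    col: int
    content: str = ""
    row_span: int = 1
    col_span: int = 1

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TableCell":
        return cls(
            row=int(data.get("row", 0)),
            col=int(data.get("col", 0)),
            content=str(data.get("content", "")),
            row_span=int(data.get("row_span", 1)),
            col_span=int(data.get("col_span", 1)),
        )


@dataclass
class TableData:
    rows: int
    cols: int
    cells: List[TableCell] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TableData":
        cells = [TableCell.from_dict(c) for c in data.get("cells", [])]
        return cls(rows=int(data.get("rows", 0)),
                   cols=int(data.get("cols", 0)),
                   cells=cells)

    def to_html(self) -> str:
        # Group cells by their starting row so each <tr> holds only the
        # cells that begin in that row; spanned positions emit no <td>.
        by_row: Dict[int, List[TableCell]] = {}
        for cell in self.cells:
            by_row.setdefault(cell.row, []).append(cell)

        parts = ["<table>"]
        for r in range(self.rows):
            parts.append("<tr>")
            for cell in sorted(by_row.get(r, []), key=lambda c: c.col):
                attrs = ""
                if cell.row_span > 1:
                    attrs += f' rowspan="{cell.row_span}"'
                if cell.col_span > 1:
                    attrs += f' colspan="{cell.col_span}"'
                parts.append(f"<td{attrs}>{escape(cell.content)}</td>")
            parts.append("</tr>")
        parts.append("</table>")
        return "".join(parts)
```

Escaping cell content keeps OCR text containing `&` or `<` from corrupting the generated markup.
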
#### Scenario: Table with rowspan/colspan renders correctly
Given a table element with cells having rowspan > 1 or colspan > 1
When the PDF generator renders the table
Then merged cells MUST span the correct number of rows/columns
And content MUST appear in the merged cell position
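
One way to check that merged cells cover the right grid positions is to build an occupancy grid before rendering. The sketch below is hypothetical (the function name and the `row_span`/`col_span` keys are assumptions, not the project's API); it maps every grid slot to the index of the cell that owns it and rejects overlapping or out-of-range spans.

```python
from typing import Any, Dict, List, Optional


def build_occupancy_grid(
    rows: int, cols: int, cells: List[Dict[str, Any]]
) -> List[List[Optional[int]]]:
    """Return a rows x cols grid where each slot holds the owning cell index."""
    grid: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    for index, cell in enumerate(cells):
        r0, c0 = cell["row"], cell["col"]
        r1 = r0 + cell.get("row_span", 1)  # exclusive end row
        c1 = c0 + cell.get("col_span", 1)  # exclusive end column
        if r0 < 0 or c0 < 0 or r1 > rows or c1 > cols:
            raise ValueError(f"cell {index} extends outside the {rows}x{cols} grid")
        for r in range(r0, r1):
            for c in range(c0, c1):
                if grid[r][c] is not None:
                    raise ValueError(
                        f"cell {index} overlaps cell {grid[r][c]} at ({r}, {c})"
                    )
                grid[r][c] = index
    return grid
```

Slots left as `None` after the pass indicate positions that no cell covers, which is often a sign of a missing or mis-detected cell in the OCR output.
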
### Requirement: Table Visual Fidelity
The PDF generator MUST render OCR track tables with visual structure matching the original document.
#### Scenario: Table renders with grid lines
Given an OCR track table element
When rendered to PDF
Then the table MUST have visible grid lines/borders
And cell boundaries MUST be clearly defined
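
As a library-independent sketch of how cell borders could be derived, the function below computes one rectangle per cell from column widths and row heights, so a merged cell gets a single outer border rather than interior lines. The function name and cell keys are assumptions, and the coordinates use a top-left origin with y increasing downward; a real PDF backend typically needs the y axis flipped.

```python
from itertools import accumulate
from typing import Any, Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def cell_rectangles(
    cells: List[Dict[str, Any]],
    col_widths: List[float],
    row_heights: List[float],
) -> List[Rect]:
    """Compute the border rectangle of each cell, honouring spans."""
    # Running sums give the x/y position of every grid line.
    xs = [0.0, *accumulate(col_widths)]
    ys = [0.0, *accumulate(row_heights)]
    rects: List[Rect] = []
    for cell in cells:
        r0, c0 = cell["row"], cell["col"]
        r1 = r0 + cell.get("row_span", 1)
        c1 = c0 + cell.get("col_span", 1)
        rects.append((xs[c0], ys[r0], xs[c1] - xs[c0], ys[r1] - ys[r0]))
    return rects
```

Stroking each returned rectangle yields a full grid for unmerged tables and correctly omits the interior lines of merged cells.
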
#### Scenario: Table text alignment preserved
Given an OCR track table with cell content
When rendered to PDF
Then text MUST be positioned within the correct cell boundaries
And text MUST NOT overflow into adjacent cells
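
Keeping text inside a cell usually means wrapping it to the cell width and, if it is still too tall, reducing the font size. The sketch below illustrates that idea under stated assumptions: character width is approximated as a fixed fraction of the font size rather than measured from real font metrics, and every name and default value here is hypothetical.

```python
import textwrap
from typing import List, Tuple


def fit_text_to_cell(
    text: str,
    cell_width: float,
    cell_height: float,
    font_size: float = 10.0,
    padding: float = 2.0,
    avg_char_width_ratio: float = 0.5,  # assumed glyph width / font size
    line_height_ratio: float = 1.2,     # assumed line height / font size
    min_font_size: float = 4.0,
) -> Tuple[float, List[str]]:
    """Return a font size and wrapped lines that fit inside the cell."""
    usable_w = cell_width - 2 * padding
    usable_h = cell_height - 2 * padding
    size = font_size
    lines: List[str] = [""]
    while size >= min_font_size:
        chars_per_line = max(1, int(usable_w // (size * avg_char_width_ratio)))
        lines = textwrap.wrap(text, width=chars_per_line) or [""]
        if len(lines) * size * line_height_ratio <= usable_h:
            return size, lines
        size -= 0.5  # shrink and retry
    # Nothing fits even at the minimum size: truncate instead of overflowing.
    max_lines = max(1, int(usable_h // (min_font_size * line_height_ratio)))
    chars_per_line = max(1, int(usable_w // (min_font_size * avg_char_width_ratio)))
    lines = textwrap.wrap(text, width=chars_per_line) or [""]
    return min_font_size, lines[:max_lines]
```

A production renderer would replace the width estimate with real string-width measurements from the font in use, which matters especially for CJK text.
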
### Requirement: Backward Compatibility with Hybrid Mode
The table rendering fix MUST NOT break hybrid mode processing.
#### Scenario: Hybrid mode tables render correctly
Given a document processed with hybrid mode combining Direct and OCR tracks
When PDF is generated
Then Direct track tables MUST render with existing quality
And OCR track tables MUST render with improved quality
And table positioning and content MUST NOT regress
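
One way to limit regression risk in hybrid mode is to convert only content that matches the OCR track shape (a dict with a `cells` key) and pass everything else through untouched. The helper below is a hypothetical sketch of that guard; the converter is injected as a parameter so the example stays self-contained, and in the real code it would presumably be `TableData.from_dict`.

```python
from typing import Any, Callable, Dict


def normalize_table_content(
    content: Any,
    from_dict: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Convert OCR-style table dicts; leave all other content unchanged."""
    if isinstance(content, dict) and "cells" in content:
        return from_dict(content)
    # Already-structured objects, HTML strings, and dicts without 'cells'
    # are returned as-is, so existing rendering paths are not affected.
    return content
```

Because non-matching content is returned by identity, a regression test can assert that Direct track table objects come back as the very same object.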