Add TableData.from_dict() and TableCell.from_dict() methods to convert JSON table dicts to proper TableData objects during UnifiedDocument parsing. Modified _json_to_document_element() to detect TABLE elements with dict content containing a 'cells' key and convert them to TableData.

Note: This fix ensures table elements have a proper to_html() method available, but the rendered output still needs investigation; tables may still render incorrectly in OCR track PDFs.

Files changed:
- unified_document.py: Add from_dict() class methods
- pdf_generator_service.py: Convert table dicts during JSON parsing
- Add fix-ocr-track-table-rendering proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

53 lines
1.9 KiB
Markdown
# PDF Generation - OCR Track Table Rendering Fix

## MODIFIED Requirements

### Requirement: OCR Track Table Content Conversion

The PDF generator MUST properly convert table content from JSON dict format to a renderable structure when processing OCR track results.

#### Scenario: Table dict with cells array converts to proper HTML

Given an OCR track JSON with a table element containing rows, cols, and a cells array
When the PDF generator processes this element
Then the table content MUST be converted to a TableData object
And TableData.to_html() MUST produce valid HTML with proper tr/td structure
And the generated PDF table MUST have cells positioned in the correct grid locations

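The conversion step in this scenario can be sketched roughly as follows. This is a minimal illustration, not the code in `unified_document.py` or `pdf_generator_service.py`: the field names (`row`, `col`, `content`, `rowspan`, `colspan`), their defaults, the `"table"` type string, and the helper `convert_table_content` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TableCell:
    # Field names are assumed for illustration; the real class may differ.
    row: int
    col: int
    content: str = ""
    rowspan: int = 1
    colspan: int = 1

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TableCell":
        # Missing keys fall back to defaults so partial dicts still parse.
        return cls(
            row=int(data.get("row", 0)),
            col=int(data.get("col", 0)),
            content=str(data.get("content", "")),
            rowspan=int(data.get("rowspan", 1)),
            colspan=int(data.get("colspan", 1)),
        )


@dataclass
class TableData:
    rows: int
    cols: int
    cells: List[TableCell] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TableData":
        return cls(
            rows=int(data.get("rows", 0)),
            cols=int(data.get("cols", 0)),
            cells=[TableCell.from_dict(c) for c in data.get("cells", [])],
        )


def convert_table_content(elem_type: str, content: Any) -> Any:
    """Hypothetical helper: convert TABLE dict content that carries a
    'cells' key into TableData and pass every other value through."""
    if elem_type == "table" and isinstance(content, dict) and "cells" in content:
        return TableData.from_dict(content)
    return content
```

Content that is not a dict with a `cells` key (for example an HTML string from another track) is returned unchanged, which keeps the conversion from touching elements it does not recognize.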
#### Scenario: Table with rowspan/colspan renders correctly

Given a table element with cells having rowspan > 1 or colspan > 1
When the PDF generator renders the table
Then merged cells MUST span the correct number of rows/columns
And content MUST appear in the merged cell position

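One way to check the merged-cell behaviour described above is to emit `rowspan`/`colspan` attributes only on the anchor cell and to verify that every grid slot is covered exactly once. The sketch below is illustrative only: `Cell`, `table_to_html`, and `occupancy` are hypothetical names, and the actual `TableData.to_html()` may be structured differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class Cell:
    # Hypothetical stand-in for the project's table cell type.
    row: int
    col: int
    content: str = ""
    rowspan: int = 1
    colspan: int = 1


def table_to_html(rows: int, cells: List[Cell]) -> str:
    """Render cells as <tr>/<td> markup; span attributes appear only when > 1."""
    by_row = defaultdict(list)
    for cell in cells:
        by_row[cell.row].append(cell)
    parts = ["<table>"]
    for r in range(rows):
        parts.append("<tr>")
        for cell in sorted(by_row[r], key=lambda c: c.col):
            attrs = ""
            if cell.rowspan > 1:
                attrs += f' rowspan="{cell.rowspan}"'
            if cell.colspan > 1:
                attrs += f' colspan="{cell.colspan}"'
            parts.append(f"<td{attrs}>{escape(cell.content)}</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


def occupancy(rows: int, cols: int, cells: List[Cell]) -> List[List[Optional[int]]]:
    """Map each grid slot to the index of the cell covering it.

    Raises ValueError if a cell leaves the grid or two cells overlap.
    Slots that no cell covers stay None.
    """
    grid: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    for idx, cell in enumerate(cells):
        for r in range(cell.row, cell.row + cell.rowspan):
            for c in range(cell.col, cell.col + cell.colspan):
                if not (0 <= r < rows and 0 <= c < cols):
                    raise ValueError(f"cell {idx} extends outside the grid")
                if grid[r][c] is not None:
                    raise ValueError(f"cells {grid[r][c]} and {idx} overlap at ({r}, {c})")
                grid[r][c] = idx
    return grid
```

For a two-row table whose first row is a single header cell with `colspan=2`, `occupancy` reports that header in both slots of row 0, which is the property the "correct number of rows/columns" clause asks for.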
### Requirement: Table Visual Fidelity

The PDF generator MUST render OCR track tables with visual structure matching the original document.

#### Scenario: Table renders with grid lines

Given an OCR track table element
When rendered to PDF
Then the table MUST have visible grid lines/borders
And cell boundaries MUST be clearly defined

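If the PDF backend is ReportLab (an assumption; the requirements do not name the rendering library), grid lines and merged cells are usually expressed as `TableStyle` command tuples. The hypothetical helper below only builds those tuples from `(row, col, rowspan, colspan)` cell records, so it runs without ReportLab installed; the `color` argument is whatever colour value the caller's backend expects, for example `reportlab.lib.colors.black`.

```python
from typing import Any, List, Tuple

# (row, col, rowspan, colspan); this record layout is assumed for the example.
CellSpec = Tuple[int, int, int, int]


def grid_style_commands(cells: List[CellSpec], color: Any, line_width: float = 0.5) -> List[tuple]:
    """Build style commands in the tuple shape ReportLab's TableStyle accepts.

    ReportLab addresses cells as (col, row), so coordinates are swapped here.
    """
    # One GRID command draws inner lines and the outer box for the whole table.
    commands: List[tuple] = [("GRID", (0, 0), (-1, -1), line_width, color)]
    for row, col, rowspan, colspan in cells:
        if rowspan > 1 or colspan > 1:
            # SPAN takes the top-left and bottom-right slots of the merged area.
            commands.append(
                ("SPAN", (col, row), (col + colspan - 1, row + rowspan - 1))
            )
    return commands
```

In real use the returned list would be passed to `TableStyle(...)` and applied to the table flowable; that wiring is omitted because it depends on how `pdf_generator_service.py` builds its tables.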
#### Scenario: Table text alignment preserved

Given an OCR track table with cell content
When rendered to PDF
Then text MUST be positioned within the correct cell boundaries
And text MUST NOT overflow into adjacent cells

### Requirement: Backward Compatibility with Hybrid Mode

The table rendering fix MUST NOT break hybrid mode processing.

#### Scenario: Hybrid mode tables render correctly

Given a document processed with hybrid mode combining Direct and OCR tracks
When the PDF is generated
Then Direct track tables MUST render with existing quality
And OCR track tables MUST render with improved quality
And there MUST be no regression in table positioning or content