# Implementation Tasks

## Phase 1: Core Fix - Table Content Conversion

### 1.1 Add TableData.from_dict() class method
- [ ] In `unified_document.py`, add `from_dict()` method to `TableData` class
- [ ] Handle conversion of cells list (list of dicts) to `TableCell` objects
- [ ] Preserve rows, cols, headers, caption fields

### 1.2 Fix _json_to_document_element for TABLE elements
- [ ] In `pdf_generator_service.py`, modify `_json_to_document_element`
- [ ] When `elem_type == ElementType.TABLE` and content is dict with 'cells', convert to `TableData`
- [ ] Use `TableData.from_dict()` for clean conversion

### 1.3 Verify TableData.to_html() generates correct HTML
- [ ] Test that `to_html()` produces parseable HTML with proper row/cell structure
- [ ] Verify colspan/rowspan attributes are correctly generated
- [ ] Ensure empty cells are properly handled

## Phase 2: OCR Track Rendering Consistency

### 2.1 Review convert_unified_document_to_ocr_data
- [ ] Verify TableData objects are properly converted to HTML
- [ ] Add fallback handling for dict content with 'cells' key
- [ ] Log warning if content cannot be converted to HTML

### 2.2 Review draw_table_region
- [ ] Verify HTMLTableParser correctly parses generated HTML
- [ ] Check that ReportLab Table is positioned at correct bbox
- [ ] Verify font and style application

## Phase 3: Testing and Verification

### 3.1 Test OCR Track
- [ ] Test scan.pdf - verify tables have correct structure
- [ ] Test img1.png, img2.png, img3.png
- [ ] Compare generated PDF with original documents

### 3.2 Test Direct Track (Regression)
- [ ] Test PDF files with Direct track
- [ ] Verify table rendering unchanged

### 3.3 Test Hybrid Mode
- [ ] Test files that trigger hybrid processing
- [ ] Verify mixed Direct + OCR elements render correctly

## Phase 4: Code Quality

### 4.1 Add logging
- [ ] Add debug logging for table content type detection
- [ ] Log conversion steps for troubleshooting

### 4.2 Error handling
- [ ] Handle malformed cell data gracefully
- [ ] Log warnings for unexpected content formats
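
For reference, the conversion described in tasks 1.1, 1.2, 1.3 and 4.2 could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the `TableCell` field names (`row`, `col`, `content`, `row_span`, `col_span`), the stand-in dataclass definitions, and the `coerce_table_content` helper are all assumptions made for this example. Only the `rows`, `cols`, `cells`, `headers` and `caption` keys come from the task list.

```python
import logging
from dataclasses import dataclass, field
from html import escape
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class TableCell:
    # Assumed field names; the real class in unified_document.py may differ.
    row: int
    col: int
    content: str = ""
    row_span: int = 1
    col_span: int = 1


@dataclass
class TableData:
    rows: int
    cols: int
    cells: List[TableCell] = field(default_factory=list)
    headers: Optional[List[str]] = None
    caption: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TableData":
        """Build a TableData from a plain dict, skipping malformed cells."""
        cells: List[TableCell] = []
        for raw in data.get("cells") or []:
            if isinstance(raw, TableCell):
                cells.append(raw)
                continue
            try:
                cells.append(TableCell(
                    row=int(raw["row"]),
                    col=int(raw["col"]),
                    content=str(raw.get("content") or ""),
                    row_span=int(raw.get("row_span", 1)),
                    col_span=int(raw.get("col_span", 1)),
                ))
            except (KeyError, TypeError, ValueError, AttributeError) as exc:
                # Task 4.2: log and continue instead of failing the whole table.
                logger.warning("Skipping malformed table cell %r: %s", raw, exc)
        return cls(
            rows=int(data.get("rows") or 0),
            cols=int(data.get("cols") or 0),
            cells=cells,
            headers=data.get("headers"),
            caption=data.get("caption"),
        )

    def to_html(self) -> str:
        """Render cells as a simple <table>, emitting spans only when > 1."""
        by_row: Dict[int, List[TableCell]] = {}
        for cell in self.cells:
            by_row.setdefault(cell.row, []).append(cell)
        parts = ["<table>"]
        for row_index in sorted(by_row):
            parts.append("<tr>")
            for cell in sorted(by_row[row_index], key=lambda c: c.col):
                attrs = ""
                if cell.col_span > 1:
                    attrs += f' colspan="{cell.col_span}"'
                if cell.row_span > 1:
                    attrs += f' rowspan="{cell.row_span}"'
                parts.append(f"<td{attrs}>{escape(cell.content)}</td>")
            parts.append("</tr>")
        parts.append("</table>")
        return "".join(parts)


def coerce_table_content(content: Any) -> Any:
    """Hypothetical helper for the TABLE branch of _json_to_document_element."""
    if isinstance(content, (TableData, str)):
        return content  # already usable downstream
    if isinstance(content, dict) and "cells" in content:
        return TableData.from_dict(content)
    logger.warning("Unexpected table content type: %s", type(content).__name__)
    return content
```

In this sketch a malformed cell is dropped with a warning rather than raising, so one bad entry does not discard the rest of the table. Whether that is the right trade-off for the OCR track is a design decision for the tasks above.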