wip: add TableData.from_dict() for OCR track table parsing (incomplete)
Add TableData.from_dict() and TableCell.from_dict() methods to convert JSON table dicts to proper TableData objects during UnifiedDocument parsing. Modified _json_to_document_element() to detect TABLE elements with dict content containing 'cells' key and convert to TableData.

Note: This fix ensures table elements have proper to_html() method available but the rendered output still needs investigation - tables may still render incorrectly in OCR track PDFs.

Files changed:
- unified_document.py: Add from_dict() class methods
- pdf_generator_service.py: Convert table dicts during JSON parsing
- Add fix-ocr-track-table-rendering proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -1945,11 +1945,23 @@ class PDFGeneratorService:
             if child:
                 children.append(child)

+        # Process content based on element type
+        content = elem_dict.get('content', '')
+
+        # For TABLE elements, convert dict content to TableData object
+        if elem_type == ElementType.TABLE and isinstance(content, dict) and 'cells' in content:
+            try:
+                content = TableData.from_dict(content)
+                logger.debug(f"Converted table dict to TableData: {content.rows}x{content.cols}, {len(content.cells)} cells")
+            except Exception as e:
+                logger.warning(f"Failed to convert table dict to TableData: {e}")
+                # Keep original dict as fallback
+
         # Create element
         element = DocumentElement(
             element_id=elem_dict.get('element_id', ''),
             type=elem_type,
-            content=elem_dict.get('content', ''),
+            content=content,
             bbox=bbox,
             confidence=elem_dict.get('confidence'),
             style=style,
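
The unified_document.py side of this change is not shown in the hunk above. As a rough sketch only, the from_dict() class methods might look something like the following. The only names confirmed by the diff are TableData, from_dict(), and the rows, cols and cells attributes; the TableCell fields (row, col, content, row_span, col_span) and the fallback that derives rows/cols from the cells are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TableCell:
    # Hypothetical field names; the real TableCell may differ.
    row: int
    col: int
    content: str = ""
    row_span: int = 1
    col_span: int = 1

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TableCell":
        """Build a cell from a JSON dict, defaulting any missing keys."""
        return cls(
            row=int(data.get("row", 0)),
            col=int(data.get("col", 0)),
            content=str(data.get("content", "")),
            row_span=int(data.get("row_span", 1)),
            col_span=int(data.get("col_span", 1)),
        )


@dataclass
class TableData:
    rows: int
    cols: int
    cells: List[TableCell] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TableData":
        """Build a table from a JSON dict that carries a 'cells' list."""
        cells = [TableCell.from_dict(c) for c in data.get("cells", [])]
        # Assumption: when rows/cols are absent, infer them from the cell extents.
        if "rows" in data:
            rows = int(data["rows"])
        else:
            rows = max((c.row + c.row_span for c in cells), default=0)
        if "cols" in data:
            cols = int(data["cols"])
        else:
            cols = max((c.col + c.col_span for c in cells), default=0)
        return cls(rows=rows, cols=cols, cells=cells)


# Usage: the same shape of dict that the parser checks for ('cells' in content).
table = TableData.from_dict({
    "rows": 1,
    "cols": 2,
    "cells": [
        {"row": 0, "col": 0, "content": "A"},
        {"row": 0, "col": 1, "content": "B"},
    ],
})
print(table.rows, table.cols, len(table.cells))
```

With a class method of this shape, the call site in the hunk only needs the isinstance(content, dict) and 'cells' in content guard, and any exception raised on malformed input leaves the original dict in place as the fallback.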