# Design: Fix PDF Table Rendering

## Context

The OCR track produces tables with:

- `cell_boxes`: Accurate pixel coordinates for each cell border
- `cells`: Content with row/col indices and `row_span`/`col_span`
- `embedded_images`: Images within table cells

Current implementations fail to use these correctly:

- **Reflow PDF**: Ignores merged cells and misaligns content
- **Translated Layout PDF**: Creates a new Table object instead of using `cell_boxes`

## Goals / Non-Goals

**Goals:**

- Translated Layout PDF tables match untranslated Layout PDF quality
- Reflow PDF tables are readable and correctly structured
- Embedded images appear in both formats

**Non-Goals:**

- Perfect pixel-level replication of original table styling
- Support for complex nested tables

## Decisions

### Decision 1: Translated Layout PDF uses Layered Rendering

**What**: Draw cell borders using `cell_boxes`, then render translated text in each cell separately

**Why**: This matches the working approach in `_draw_table_with_cell_boxes()` for untranslated PDFs

```python
# Step 1: Draw borders using cell_boxes.
# Assumes each box is (x0, y0, x1, y1), already mapped to PDF coordinates.
for x0, y0, x1, y1 in cell_boxes:
    pdf_canvas.rect(x0, y0, x1 - x0, y1 - y0)

# Step 2: Render translated text for each cell.
# translated_content is the translated text for the current cell.
for cell in cells:
    cell_bbox = find_matching_cell_box(cell, cell_boxes)
    draw_text_in_bbox(translated_content, cell_bbox)
```

### Decision 2: Reflow PDF uses ReportLab SPAN for merged cells

**What**: Apply `('SPAN', (col1, row1), (col2, row2))` style for merged cells

**Why**: ReportLab's Table natively supports merged cells via TableStyle

```python
# Build span commands from cell data.
# ReportLab addresses cells as (col, row); the end coordinate is inclusive.
spans = []
for cell in cells:
    if cell.row_span > 1 or cell.col_span > 1:
        spans.append((
            'SPAN',
            (cell.col, cell.row),
            (cell.col + cell.col_span - 1, cell.row + cell.row_span - 1),
        ))
```

### Decision 3: Column widths from cell_boxes ratio

**What**: Calculate column widths proportionally from `cell_boxes`

**Why**: Preserves the original table structure in reflow mode

## Risks / Trade-offs

| Risk | Mitigation |
|------|------------|
| Text overflow in translated cells | Shrink font (min 8pt) or truncate with ellipsis |
| `cell_boxes` count does not match `cells` count | Fall back to equal-width columns |
| Complex merged cell patterns | Handle simple spans, skip complex patterns |

## Open Questions

- Should reflow PDF preserve exact column width ratios or allow ReportLab auto-sizing?
- How should cells that contain both text and images be handled?
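
To make Decision 3 and its equal-width fallback more concrete, here is a minimal sketch of one way the proportional widths could be derived. It is not the existing implementation. The function name `column_widths_from_boxes`, the `tolerance` parameter, and the `(x0, y0, x1, y1)` box layout are all assumptions made for illustration. The idea is to collect the vertical edges of every box, merge edges that differ only by OCR jitter, and scale the gaps between the surviving boundaries to the available width.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x0, y0, x1, y1) in pixels. The real structure may differ.
Box = Tuple[float, float, float, float]


def column_widths_from_boxes(
    cell_boxes: Sequence[Box],
    n_cols: int,
    total_width: float,
    tolerance: float = 3.0,
) -> List[float]:
    """Return n_cols widths summing to total_width, proportional to cell_boxes."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")

    equal = [total_width / n_cols] * n_cols
    if not cell_boxes:
        return equal

    # Collect every left/right edge, then drop edges that lie within
    # `tolerance` of the previously kept boundary (OCR jitter).
    edges = sorted(x for box in cell_boxes for x in (box[0], box[2]))
    boundaries = [edges[0]]
    for x in edges[1:]:
        if x - boundaries[-1] > tolerance:
            boundaries.append(x)

    # Fallback from the risk table: the boxes do not describe exactly
    # n_cols columns, so use equal-width columns instead.
    if len(boundaries) != n_cols + 1:
        return equal

    span = boundaries[-1] - boundaries[0]
    return [
        total_width * (right - left) / span
        for left, right in zip(boundaries, boundaries[1:])
    ]
```

For example, boxes covering 0 to 100 px and 100 to 400 px, laid out into a width of 200, yield column widths of 50 and 150. The result could then be passed as `colWidths` to ReportLab's `Table`, which would settle the first open question in favour of preserving ratios. Leaving `colWidths` unset keeps ReportLab's auto-sizing.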