## 1. Core Algorithm Implementation

### 1.1 Grid Inference Module
- [x] 1.1.1 Create `CellBoxGridInferrer` class in `pdf_table_renderer.py`
- [x] 1.1.2 Implement `cluster_values()` for Y/X coordinate clustering
- [x] 1.1.3 Implement `infer_grid_from_cellboxes()` main method
- [x] 1.1.4 Add row_heights and col_widths calculation

### 1.2 Content Mapping
- [x] 1.2.1 Implement `extract_cell_contents()` from HTML
- [x] 1.2.2 Implement `map_content_to_grid()` for row-by-row assignment
- [x] 1.2.3 Handle content count mismatch (more/fewer cells)

## 2. PDF Generator Integration

### 2.1 New Rendering Path
- [x] 2.1.1 Add `render_from_cellboxes_grid()` method to TableRenderer
- [x] 2.1.2 Integrate into `draw_table_region()` with cellboxes-first check
- [x] 2.1.3 Maintain fallback to existing HTML-based rendering

### 2.2 Cell Rendering
- [x] 2.2.1 Draw cell borders using cell_boxes coordinates
- [x] 2.2.2 Render text content with proper alignment and padding
- [x] 2.2.3 Handle multi-line text within cells

## 3. Configuration

### 3.1 Settings
- [x] 3.1.1 Add `table_rendering_prefer_cellboxes: bool = True`
- [x] 3.1.2 Add `table_cellboxes_row_threshold: float = 15.0`
- [x] 3.1.3 Add `table_cellboxes_col_threshold: float = 15.0`

## 4. Testing

### 4.1 Unit Tests
- [x] 4.1.1 Test grid inference with various cell_box configurations
- [x] 4.1.2 Test content mapping edge cases
- [x] 4.1.3 Test coordinate clustering accuracy

### 4.2 Integration Tests
- [ ] 4.2.1 Test with `scan.pdf` Table 7 (the problematic case)
- [ ] 4.2.2 Verify no regression on existing table tests
- [ ] 4.2.3 Visual comparison of output PDFs

## 5. Documentation
- [x] 5.1 Update inline code comments
- [x] 5.2 Update spec with new table rendering requirement
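
For reference, the coordinate clustering named in task 1.1.2 could look roughly like the sketch below. It is an illustration only, not the code in `pdf_table_renderer.py`: the signature, the greedy one-dimensional grouping, and the use of the cluster mean as the representative are all assumptions. Only the name `cluster_values` and the `15.0` default threshold (tasks 3.1.2 and 3.1.3) come from the list above.

```python
from typing import List


def cluster_values(values: List[float], threshold: float = 15.0) -> List[float]:
    """Group nearby 1-D coordinates and return one representative per group.

    Hypothetical sketch: values are sorted, and a value joins the current
    cluster when it lies within `threshold` of the previous value;
    otherwise it starts a new cluster. Each cluster is represented by
    the mean of its members.
    """
    if not values:
        return []

    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= threshold:
            clusters[-1].append(v)   # close enough: same row/column edge
        else:
            clusters.append([v])     # gap too large: new row/column edge

    return [sum(c) / len(c) for c in clusters]


# Top edges (y) of six made-up cell boxes: two rows with slight jitter.
row_edges = cluster_values([100.0, 102.0, 98.0, 150.0, 151.0, 149.0])
```

Because each value is compared with its immediate predecessor, a long run of closely spaced coordinates can chain into one cluster wider than the threshold. Comparing against the cluster's first value or running mean would be a stricter alternative if that matters for the test cases in 4.1.3.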