1. Core Algorithm Implementation
1.1 Grid Inference Module
- 1.1.1 Create `CellBoxGridInferrer` class in `pdf_table_renderer.py`
- 1.1.2 Implement `cluster_values()` for Y/X coordinate clustering
- 1.1.3 Implement `infer_grid_from_cellboxes()` main method
- 1.1.4 Add `row_heights` and `col_widths` calculation
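The grid-inference tasks can be sketched as below. The class and method names come from the task list, but the constructor, the method signatures, the `(x0, y0, x1, y1)` box layout, the `InferredGrid` container, and the clustering rule (start a new cluster when the gap between sorted values exceeds the threshold, represent each cluster by its mean) are all assumptions for illustration, not the planned implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed cell_box layout: (x0, y0, x1, y1) in page/image units.
Box = Tuple[float, float, float, float]


@dataclass
class InferredGrid:  # hypothetical result container
    row_edges: List[float]  # clustered horizontal grid lines, top to bottom
    col_edges: List[float]  # clustered vertical grid lines, left to right
    row_heights: List[float]
    col_widths: List[float]

    @property
    def num_rows(self) -> int:
        return max(len(self.row_edges) - 1, 0)

    @property
    def num_cols(self) -> int:
        return max(len(self.col_edges) - 1, 0)


class CellBoxGridInferrer:
    def __init__(self, row_threshold: float = 15.0, col_threshold: float = 15.0) -> None:
        self.row_threshold = row_threshold
        self.col_threshold = col_threshold

    @staticmethod
    def cluster_values(values: Sequence[float], threshold: float) -> List[float]:
        """Sort the values, start a new cluster whenever the gap to the previous
        value exceeds `threshold`, and return each cluster's mean (ascending)."""
        clusters: List[List[float]] = []
        for v in sorted(values):
            if clusters and v - clusters[-1][-1] <= threshold:
                clusters[-1].append(v)
            else:
                clusters.append([v])
        return [sum(c) / len(c) for c in clusters]

    def infer_grid_from_cellboxes(self, boxes: Sequence[Box]) -> InferredGrid:
        # Every box contributes both of its edges on each axis.
        ys = [y for (_, y0, _, y1) in boxes for y in (y0, y1)]
        xs = [x for (x0, _, x1, _) in boxes for x in (x0, x1)]
        row_edges = self.cluster_values(ys, self.row_threshold)
        col_edges = self.cluster_values(xs, self.col_threshold)
        # Heights/widths are the spacing between consecutive clustered edges.
        row_heights = [b - a for a, b in zip(row_edges, row_edges[1:])]
        col_widths = [b - a for a, b in zip(col_edges, col_edges[1:])]
        return InferredGrid(row_edges, col_edges, row_heights, col_widths)
```

For a 2x2 table given as `[(0, 0, 50, 20), (50, 0, 100, 20), (0, 20, 50, 40), (50, 20, 100, 40)]`, this yields row edges `[0, 20, 40]` and column edges `[0, 50, 100]`. Because the clustering chains sorted neighbours, the thresholds must stay below the smallest real row height and column width; otherwise adjacent grid lines merge.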
1.2 Content Mapping
- 1.2.1 Implement `extract_cell_contents()` to extract cell text from the table HTML
- 1.2.2 Implement `map_content_to_grid()` for row-by-row assignment
- 1.2.3 Handle content count mismatches (more or fewer HTML cells than grid cells)
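A minimal sketch of the content-mapping step, using only the standard-library `html.parser`. It assumes the HTML closes its `<td>`/`<th>` tags and ignores `rowspan`/`colspan`. The mismatch policy is an assumption chosen so that no text is silently lost: short rows are padded with empty strings, surplus cells are joined into the last column, and surplus HTML rows are folded into the last grid row. `_CellTextParser` and `_fit_row` are hypothetical helpers.

```python
from html.parser import HTMLParser
from typing import List, Optional, Sequence


class _CellTextParser(HTMLParser):
    """Collects the text of <td>/<th> elements, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate cells that appear outside any <tr>
                self.rows.append([])
            # Collapse internal whitespace so "  a\n b " becomes "a b".
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_cell_contents(html: str) -> List[List[str]]:
    parser = _CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def _fit_row(cells: Sequence[str], num_cols: int) -> List[str]:
    if len(cells) <= num_cols:
        return list(cells) + [""] * (num_cols - len(cells))  # fewer cells: pad
    # More cells than columns: keep num_cols - 1, join the rest into the last column.
    overflow = " ".join(t for t in cells[num_cols - 1:] if t)
    return list(cells[: num_cols - 1]) + [overflow]


def map_content_to_grid(
    html_rows: Sequence[Sequence[str]], num_rows: int, num_cols: int
) -> List[List[str]]:
    if num_rows <= 0 or num_cols <= 0:
        return []
    grid = [
        _fit_row(html_rows[r], num_cols) if r < len(html_rows) else [""] * num_cols
        for r in range(num_rows)
    ]
    # Surplus HTML rows are folded into the last grid row instead of being dropped.
    for extra in html_rows[num_rows:]:
        for c, text in enumerate(_fit_row(extra, num_cols)):
            if text:
                grid[-1][c] = f"{grid[-1][c]}\n{text}" if grid[-1][c] else text
    return grid
```

Joining overflow rather than discarding it keeps the rendered PDF lossless at the cost of occasionally crowding the last cell; dropping or flagging the surplus would be equally valid policies.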
2. PDF Generator Integration
2.1 New Rendering Path
- 2.1.1 Add `render_from_cellboxes_grid()` method to `TableRenderer`
- 2.1.2 Integrate into `draw_table_region()` with a cellboxes-first check
- 2.1.3 Maintain fallback to the existing HTML-based rendering
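The cellboxes-first check with fallback could look roughly like this. It is written as a free function that receives the two renderers as callables, because the task list does not give the real signatures of `render_from_cellboxes_grid()` or the HTML path. The contract that the grid renderer returns `False` before drawing anything when it cannot build a usable grid is an assumption; it keeps the fallback from drawing the table twice.

```python
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout

# Hypothetical callable types standing in for TableRenderer methods.
CellboxRenderer = Callable[[Sequence[Box], str], bool]
HtmlRenderer = Callable[[str], None]


def draw_table_region(
    html: str,
    cell_boxes: Optional[Sequence[Box]],
    prefer_cellboxes: bool,
    render_from_cellboxes_grid: CellboxRenderer,
    render_from_html: HtmlRenderer,
) -> str:
    """Try the cell_boxes grid path first; fall back to HTML rendering.

    Returns "cellboxes" or "html" so callers and tests can see which path ran.
    """
    if prefer_cellboxes and cell_boxes:
        # Assumed contract: returns False *before drawing* when no usable grid
        # can be inferred, so falling back never double-draws the table.
        if render_from_cellboxes_grid(cell_boxes, html):
            return "cellboxes"
    render_from_html(html)
    return "html"
```

Returning the chosen path is a small testability aid for tasks 2.1.3 and 4.2.2; the real method may well return nothing.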
2.2 Cell Rendering
- 2.2.1 Draw cell borders using cell_boxes coordinates
- 2.2.2 Render text content with proper alignment and padding
- 2.2.3 Handle multi-line text within cells
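Text placement inside a cell (padding, alignment, line wrapping) can be sketched as pure geometry, independent of the PDF library. Everything here is an assumption: the function names, a top-left origin with y growing downward, a baseline one `font_size` below the padded top, greedy word wrapping, and dropping lines that would overflow the cell. `approx_width` is a crude placeholder; if the generator uses ReportLab (the task list does not say), `pdfmetrics.stringWidth(text, font_name, font_size)` could be wrapped as `width_fn`, with y coordinates flipped for its bottom-left origin.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y assumed to grow downward
WidthFn = Callable[[str, float], float]


def approx_width(text: str, font_size: float) -> float:
    """Crude stand-in for real font metrics: assumes glyphs average 0.5 em."""
    return len(text) * font_size * 0.5


def wrap_text(text: str, max_width: float, font_size: float,
              width_fn: WidthFn = approx_width) -> List[str]:
    """Greedy word wrap; explicit newlines are kept, over-long words are not split."""
    lines: List[str] = []
    for paragraph in text.split("\n"):
        current = ""
        for word in paragraph.split():
            candidate = f"{current} {word}" if current else word
            if current and width_fn(candidate, font_size) > max_width:
                lines.append(current)
                current = word
            else:
                current = candidate
        lines.append(current)
    return lines


def layout_cell_text(text: str, box: Box, font_size: float = 9.0, padding: float = 2.0,
                     align: str = "left", leading: float = 1.2,
                     width_fn: WidthFn = approx_width) -> List[Tuple[float, float, str]]:
    """Return (x, baseline_y, line) triples for text placed inside one cell box."""
    x0, y0, x1, y1 = box
    inner_w = max(x1 - x0 - 2 * padding, 0.0)
    placed: List[Tuple[float, float, str]] = []
    for i, line in enumerate(wrap_text(text, inner_w, font_size, width_fn)):
        # Baseline approximated as one font_size below the padded top edge.
        baseline = y0 + padding + font_size + i * font_size * leading
        if baseline > y1 - padding:
            break  # this and later lines would overflow the cell; dropped in this sketch
        w = width_fn(line, font_size)
        if align == "center":
            x = x0 + padding + (inner_w - w) / 2
        elif align == "right":
            x = x1 - padding - w
        else:
            x = x0 + padding
        placed.append((x, baseline, line))
    return placed
```

Shrinking the font until the text fits is a common alternative to dropping overflow lines; which behaviour suits Table 7 would need checking against the real output.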
3. Configuration
3.1 Settings
- 3.1.1 Add `table_rendering_prefer_cellboxes: bool = True`
- 3.1.2 Add `table_cellboxes_row_threshold: float = 15.0`
- 3.1.3 Add `table_cellboxes_col_threshold: float = 15.0`
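The three settings, with the names and defaults listed above, could be grouped as follows. The frozen dataclass, the `TableRenderingSettings` class name, and the positivity check are assumptions; the project may instead add these fields to an existing settings object. The thresholds are presumably in the same units as the `cell_boxes` coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRenderingSettings:  # hypothetical container for the new fields
    table_rendering_prefer_cellboxes: bool = True
    table_cellboxes_row_threshold: float = 15.0
    table_cellboxes_col_threshold: float = 15.0

    def __post_init__(self) -> None:
        # A non-positive threshold would put every coordinate in its own cluster.
        for name in ("table_cellboxes_row_threshold", "table_cellboxes_col_threshold"):
            value = getattr(self, name)
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")
```

For example, `TableRenderingSettings(table_cellboxes_row_threshold=8.0)` would suit tables with very short rows, and `table_rendering_prefer_cellboxes=False` would force the existing HTML path.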
4. Testing
4.1 Unit Tests
- 4.1.1 Test grid inference with various cell_box configurations
- 4.1.2 Test content mapping edge cases
- 4.1.3 Test coordinate clustering accuracy
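A clustering-accuracy test in the spirit of 4.1.3 might jitter known grid lines and check that they are recovered. A compact gap-based `cluster_values` is included so the file runs on its own; its signature and the jitter and threshold values are assumptions. The third case documents the chaining limitation, where rows shorter than the threshold collapse into one cluster.

```python
import random
import unittest
from typing import List, Sequence


def cluster_values(values: Sequence[float], threshold: float) -> List[float]:
    # Gap-based clustering: a new cluster starts when the sorted gap exceeds threshold.
    clusters: List[List[float]] = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= threshold:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [sum(c) / len(c) for c in clusters]


class ClusterValuesTest(unittest.TestCase):
    def test_recovers_jittered_edges(self) -> None:
        rng = random.Random(0)  # fixed seed keeps the test deterministic
        true_edges = [0.0, 22.0, 45.0, 70.0]
        # +/-2 jitter keeps within-edge gaps <= 4 and between-edge gaps >= 18.
        samples = [e + rng.uniform(-2.0, 2.0) for e in true_edges for _ in range(6)]
        found = cluster_values(samples, threshold=15.0)
        self.assertEqual(len(found), len(true_edges))
        for got, want in zip(found, true_edges):
            self.assertAlmostEqual(got, want, delta=2.0)

    def test_empty_input(self) -> None:
        self.assertEqual(cluster_values([], 15.0), [])

    def test_rows_closer_than_threshold_merge(self) -> None:
        # Known limitation: grid lines 10 units apart merge under a threshold of 15.
        self.assertEqual(len(cluster_values([0.0, 10.0, 20.0], 15.0)), 1)
```

Run it with `python -m unittest` (or pytest); the content-mapping edge cases in 4.1.2 can follow the same pattern.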
4.2 Integration Tests
- 4.2.1 Test with `scan.pdf` Table 7 (the problematic case)
- 4.2.2 Verify no regression on existing table tests
- 4.2.3 Visual comparison of output PDFs
5. Documentation
- 5.1 Update inline code comments
- 5.2 Update spec with new table rendering requirement