document-processing Specification Delta
MODIFIED Requirements
Requirement: Extract table structure (Modified)
The system SHALL use cell_boxes coordinates as the primary source for table structure when rendering PDFs, with HTML parsing as fallback.
Scenario: Render table using cell_boxes grid
- WHEN rendering a table element to PDF
- AND the table has valid cell_boxes coordinates
- AND table_rendering_prefer_cellboxes is enabled
- THEN the system SHALL infer row/column grid from cell_boxes coordinates
- AND extract text content from HTML in reading order
- AND map content to grid cells by position
- AND render table borders using cell_boxes coordinates
- AND place text content within calculated cell boundaries
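The spec does not pin grid inference to a particular algorithm. The following is a minimal sketch of one common approach. It collects the left/right and top/bottom edges of every cell box and merges edges that lie within a small tolerance into shared grid lines. It then places each box, paired with the HTML text at the same reading-order position, into the grid cell where the box's top-left corner falls. The (x0, y0, x1, y1) box format with a top-left origin, the tolerance of 3 coordinate units, and the names `cluster_edges`, `infer_grid`, `nearest_index`, and `map_content` are assumptions for illustration, not the project's actual code.

```python
from typing import List, Sequence, Tuple

# Assumed box format (hypothetical): (x0, y0, x1, y1) with the origin at the
# top-left, so a smaller y means higher on the page.
Box = Tuple[float, float, float, float]


def cluster_edges(values: Sequence[float], tol: float) -> List[float]:
    """Collapse coordinates within `tol` of the last kept edge into that edge."""
    edges: List[float] = []
    for v in sorted(values):
        if not edges or v - edges[-1] > tol:
            edges.append(v)
    return edges


def infer_grid(cell_boxes: Sequence[Box], tol: float = 3.0) -> Tuple[List[float], List[float]]:
    """Infer (column_edges, row_edges) from the borders of all cell boxes."""
    col_edges = cluster_edges([x for b in cell_boxes for x in (b[0], b[2])], tol)
    row_edges = cluster_edges([y for b in cell_boxes for y in (b[1], b[3])], tol)
    if len(col_edges) < 2 or len(row_edges) < 2:
        # Also covers empty cell_boxes; callers can treat this as "inference failed".
        raise ValueError("cannot infer a grid from cell_boxes")
    return col_edges, row_edges


def nearest_index(edges: Sequence[float], v: float) -> int:
    """Index of the edge closest to v, clamped so it is a valid cell index."""
    i = min(range(len(edges)), key=lambda k: abs(edges[k] - v))
    return min(i, len(edges) - 2)


def map_content(cell_boxes: Sequence[Box], texts: Sequence[str], tol: float = 3.0) -> List[List[str]]:
    """Place HTML texts (already in reading order) into the inferred grid."""
    col_edges, row_edges = infer_grid(cell_boxes, tol)
    grid = [["" for _ in col_edges[:-1]] for _ in row_edges[:-1]]
    # Reading order: top-to-bottom, then left-to-right, by each box's top-left corner.
    positions = sorted(
        (nearest_index(row_edges, b[1]), nearest_index(col_edges, b[0])) for b in cell_boxes
    )
    for (r, c), text in zip(positions, texts):
        grid[r][c] = text  # surplus texts or boxes are simply left unpaired
    return grid
```

For a header cell spanning two columns above two body cells, `map_content([(0, 0, 100, 20), (0, 20, 50, 40), (50, 20, 100, 40)], ["H", "a", "b"])` yields `[["H", ""], ["a", "b"]]`. The spanning cell's text lands in its top-left slot, and its border would still be drawn from its own box coordinates.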
Scenario: Handle cell_boxes grid mismatch gracefully
- WHEN the cell_boxes grid has different dimensions from the grid implied by the HTML colspan/rowspan structure
- THEN the system SHALL use the cell_boxes grid as the authoritative structure
- AND map the available HTML content to cells row by row
- AND leave unmapped cells empty
- AND log a warning if the HTML content count differs significantly from the number of grid cells
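One possible reading of this scenario, sketched below, keeps HTML row *i* in grid row *i* and fills cells left to right. It drops content that does not fit and leaves grid cells with no matching content empty. The spec leaves "differs significantly" undefined, so the 20% `warn_ratio` threshold is a hypothetical placeholder, as is the function name `map_rows_to_grid`.

```python
import logging
from typing import List, Sequence

logger = logging.getLogger(__name__)


def map_rows_to_grid(
    html_rows: Sequence[Sequence[str]],
    n_rows: int,
    n_cols: int,
    warn_ratio: float = 0.2,  # hypothetical threshold for "differs significantly"
) -> List[List[str]]:
    """Fill the cell_boxes grid with HTML cell text, one HTML row per grid row."""
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for r, row in enumerate(html_rows[:n_rows]):
        for c, text in enumerate(row[:n_cols]):
            grid[r][c] = text  # content beyond the grid's bounds is dropped
    # Grid cells that received no HTML content keep their "" placeholder.
    html_count = sum(len(row) for row in html_rows)
    cell_count = n_rows * n_cols
    if cell_count and abs(html_count - cell_count) / cell_count > warn_ratio:
        logger.warning(
            "table content count %d differs from cell_boxes grid size %d",
            html_count, cell_count,
        )
    return grid
```

With `html_rows=[["a", "b"], ["c"]]` and a 2×2 grid, the result is `[["a", "b"], ["c", ""]]`, and a warning is logged because 3 content items against 4 cells exceeds the 20% placeholder threshold.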
Scenario: Fallback to HTML-based rendering
- WHEN cell_boxes is empty or None
- OR table_rendering_prefer_cellboxes is disabled
- OR cell_boxes grid inference fails
- THEN the system SHALL fall back to the existing HTML-based table rendering
- AND build a ReportLab Table from the parsed HTML structure
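The three fallback conditions can be expressed as one dispatch point. The sketch below is a hedged illustration, not the project's code. `render_table`, `render_with_cellboxes`, and `render_with_html` are hypothetical names, `prefer_cellboxes` stands in for the table_rendering_prefer_cellboxes setting, and it assumes a failed grid inference surfaces as a `ValueError`.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Box = Tuple[float, float, float, float]
T = TypeVar("T")  # whatever the renderers produce, e.g. a ReportLab flowable
logger = logging.getLogger(__name__)


def render_table(
    cell_boxes: Optional[Sequence[Box]],
    html: str,
    prefer_cellboxes: bool,  # stands in for table_rendering_prefer_cellboxes
    render_with_cellboxes: Callable[[Sequence[Box], str], T],
    render_with_html: Callable[[str], T],
) -> T:
    """Try the cell_boxes path first; fall back to HTML-based rendering."""
    # A falsy cell_boxes (None or an empty list) goes straight to the HTML path.
    if cell_boxes and prefer_cellboxes:
        try:
            return render_with_cellboxes(cell_boxes, html)
        except ValueError as exc:  # assumed signal for failed grid inference
            logger.warning("cell_boxes grid inference failed, using HTML: %s", exc)
    return render_with_html(html)
```

Passing the two renderers in as callables keeps the dispatch logic testable without ReportLab. In practice, `render_with_html` would wrap the existing ReportLab Table path unchanged.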
Scenario: Maintain backward compatibility
- WHEN processing tables whose cell_boxes grid matches the HTML structure
- THEN the system SHALL produce output identical to the previous behavior
- AND pass all existing table rendering tests
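This compatibility scenario depends on deciding when the cell_boxes grid "matches" the HTML structure. One way to make that concrete is sketched below. It assumes the HTML has already been parsed into rows of (rowspan, colspan) pairs, a hypothetical intermediate form. The sketch expands the spans into an occupancy grid, similar in spirit to how HTML table layout assigns slots, and compares the resulting shape with the cell_boxes grid dimensions. `html_grid_shape` and `grids_match` are illustrative names only.

```python
from typing import Sequence, Set, Tuple

# Hypothetical parsed form: each HTML row is a list of (rowspan, colspan) pairs.
Span = Tuple[int, int]


def html_grid_shape(rows: Sequence[Sequence[Span]]) -> Tuple[int, int]:
    """Return (n_rows, n_cols) of the grid implied by HTML rowspan/colspan."""
    occupied: Set[Tuple[int, int]] = set()
    for r, row in enumerate(rows):
        c = 0
        for rowspan, colspan in row:
            # Skip slots already covered by a rowspan from an earlier row.
            while (r, c) in occupied:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, c + dc))
            c += colspan
    if not occupied:
        return 0, 0
    n_rows = max(row_i for row_i, _ in occupied) + 1
    n_cols = max(col_i for _, col_i in occupied) + 1
    return n_rows, n_cols


def grids_match(rows: Sequence[Sequence[Span]], n_grid_rows: int, n_grid_cols: int) -> bool:
    """True when the HTML structure and the cell_boxes grid have the same shape."""
    return html_grid_shape(rows) == (n_grid_rows, n_grid_cols)
```

This compares shapes only. A stricter check could also require each spanning HTML cell to cover the same grid slots as the corresponding cell box.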