# document-processing Specification Delta

## MODIFIED Requirements

### Requirement: Extract table structure (Modified)

The system SHALL use cell_boxes coordinates as the primary source for table structure when rendering PDFs, with HTML parsing as fallback.

#### Scenario: Render table using cell_boxes grid

- **WHEN** rendering a table element to PDF
- **AND** the table has valid cell_boxes coordinates
- **AND** `table_rendering_prefer_cellboxes` is enabled
- **THEN** the system SHALL infer row/column grid from cell_boxes coordinates
- **AND** extract text content from HTML in reading order
- **AND** map content to grid cells by position
- **AND** render table borders using cell_boxes coordinates
- **AND** place text content within calculated cell boundaries

#### Scenario: Handle cell_boxes grid mismatch gracefully

- **WHEN** cell_boxes grid has different dimensions than HTML colspan/rowspan structure
- **THEN** the system SHALL use cell_boxes grid as authoritative structure
- **AND** map available HTML content to cells row-by-row
- **AND** leave unmapped cells empty
- **AND** log warning if content count differs significantly

#### Scenario: Fallback to HTML-based rendering

- **WHEN** cell_boxes is empty or None
- **OR** `table_rendering_prefer_cellboxes` is disabled
- **OR** cell_boxes grid inference fails
- **THEN** the system SHALL fall back to existing HTML-based table rendering
- **AND** use ReportLab Table with parsed HTML structure

#### Scenario: Maintain backward compatibility

- **WHEN** processing tables where cell_boxes grid matches HTML structure
- **THEN** the system SHALL produce identical output to previous behavior
- **AND** pass all existing table rendering tests
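
The grid inference and row-by-row content mapping described in the scenarios above could be sketched roughly as follows. This is an illustrative sketch, not the project's implementation: it assumes each entry in cell_boxes is an `(x0, y0, x1, y1)` tuple, that near-equal top and left edges belong to the same row or column within a pixel tolerance, and the names `cluster_edges`, `infer_grid`, `map_content`, and the `tol` parameter are hypothetical. Returning `None` from `infer_grid` stands in for the condition that triggers the HTML-based fallback.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed box layout: (x0, y0, x1, y1). The real cell_boxes format may differ.
Box = Tuple[float, float, float, float]
Placed = Tuple[int, int, Box]


def cluster_edges(values: Sequence[float], tol: float) -> List[float]:
    """Merge coordinates that lie within `tol` of their neighbor into one edge."""
    clusters: List[List[float]] = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= tol:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [sum(c) / len(c) for c in clusters]


def nearest_index(edges: Sequence[float], value: float) -> int:
    """Index of the edge closest to `value`."""
    return min(range(len(edges)), key=lambda i: abs(edges[i] - value))


def infer_grid(
    cell_boxes: Optional[Sequence[Box]], tol: float = 5.0
) -> Optional[Tuple[int, int, List[Placed]]]:
    """Return (n_rows, n_cols, placed cells), or None so the caller can fall back."""
    if not cell_boxes:
        return None
    row_edges = cluster_edges([b[1] for b in cell_boxes], tol)  # top edges
    col_edges = cluster_edges([b[0] for b in cell_boxes], tol)  # left edges
    placed = [
        (nearest_index(row_edges, b[1]), nearest_index(col_edges, b[0]), b)
        for b in cell_boxes
    ]
    placed.sort(key=lambda p: (p[0], p[1]))  # reading order: row, then column
    return len(row_edges), len(col_edges), placed


def map_content(placed: Sequence[Placed], texts: Sequence[str]) -> Dict[Tuple[int, int], str]:
    """Assign texts to cells row-by-row; cells without content stay empty."""
    return {
        (r, c): (texts[i] if i < len(texts) else "")
        for i, (r, c, _box) in enumerate(placed)
    }
```

In this sketch the cell_boxes grid is treated as authoritative: when fewer text fragments than cells are available, the trailing cells are left empty, and a caller could compare `len(texts)` with `len(placed)` to decide whether to log a warning.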