chore: backup before code cleanup

Backup commit before executing the remove-unused-code proposal.
This includes all pending changes and new features.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
egg
2025-12-11 11:55:39 +08:00
parent eff9b0bcd5
commit 940a406dce
58 changed files with 8226 additions and 175 deletions


@@ -0,0 +1,36 @@
# document-processing Specification Delta

## MODIFIED Requirements

### Requirement: Extract table structure (Modified)
The system SHALL use cell_boxes coordinates as the primary source for table structure when rendering PDFs, with HTML parsing as the fallback.

#### Scenario: Render table using cell_boxes grid
- **WHEN** rendering a table element to PDF
- **AND** the table has valid cell_boxes coordinates
- **AND** `table_rendering_prefer_cellboxes` is enabled
- **THEN** the system SHALL infer the row/column grid from the cell_boxes coordinates
- **AND** extract text content from the HTML in reading order
- **AND** map content to grid cells by position
- **AND** render table borders using the cell_boxes coordinates
- **AND** place text content within the calculated cell boundaries
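
A minimal sketch of the grid inference and position mapping in this scenario, assuming each cell box is an `(x0, y0, x1, y1)` tuple with a top-left origin and that nearby edges can be merged with a fixed tolerance (3.0 units here, an arbitrary choice). The names `cluster_edges`, `nearest`, `infer_grid`, and `map_texts` are hypothetical, not the project's actual API:

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout: (x0, y0, x1, y1) with y growing downward (top-left origin).
Box = Tuple[float, float, float, float]
Placement = Tuple[int, int, int, int]  # (row, col, rowspan, colspan)


def cluster_edges(values: Sequence[float], tol: float) -> List[float]:
    """Merge sorted coordinates lying within `tol` of each other into one grid line."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]


def nearest(edges: Sequence[float], v: float) -> int:
    """Index of the grid line closest to coordinate `v`."""
    return min(range(len(edges)), key=lambda i: abs(edges[i] - v))


def infer_grid(
    cell_boxes: Sequence[Box], tol: float = 3.0
) -> Tuple[List[float], List[float], List[Placement]]:
    """Infer row/column lines and place every box on the resulting grid.

    Raises ValueError when no usable grid can be inferred; a caller can treat
    that as "grid inference fails" and fall back to HTML rendering.
    """
    if len(cell_boxes) == 0:
        raise ValueError("no cell_boxes to infer a grid from")
    xs = cluster_edges([v for b in cell_boxes for v in (b[0], b[2])], tol)
    ys = cluster_edges([v for b in cell_boxes for v in (b[1], b[3])], tol)
    placements: List[Placement] = []
    for x0, y0, x1, y1 in cell_boxes:
        r0, r1 = nearest(ys, y0), nearest(ys, y1)
        c0, c1 = nearest(xs, x0), nearest(xs, x1)
        if r1 <= r0 or c1 <= c0:  # box collapsed onto a single grid line
            raise ValueError(f"degenerate cell box {(x0, y0, x1, y1)}")
        placements.append((r0, c0, r1 - r0, c1 - c0))
    return ys, xs, placements


def map_texts(
    placements: Sequence[Placement], texts: Sequence[str]
) -> Dict[Tuple[int, int], str]:
    """Assign texts (HTML reading order) to cells sorted top-to-bottom, left-to-right."""
    order = sorted(range(len(placements)), key=lambda i: placements[i][:2])
    return {placements[i][:2]: text for text, i in zip(texts, order)}
```

The grid has `len(ys) - 1` rows and `len(xs) - 1` columns. Borders could then be drawn straight from each box, for example with ReportLab's `Canvas.rect`; ReportLab's origin is the bottom-left corner, so top-left coordinates would need flipping against the page height.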

#### Scenario: Handle cell_boxes grid mismatch gracefully
- **WHEN** the cell_boxes grid has different dimensions from the HTML colspan/rowspan structure
- **THEN** the system SHALL use the cell_boxes grid as the authoritative structure
- **AND** map the available HTML content to cells row by row
- **AND** leave unmapped cells empty
- **AND** log a warning if the content count differs significantly
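
One possible reading of this mismatch handling, sketched with Python's standard `html.parser`: collect cell texts per `<tr>`, fill the cell_boxes grid row by row, and warn when the HTML cell count and the number of cell_boxes diverge. The 20% threshold (standing in for the unquantified "significantly"), the logger name, and all function and class names are hypothetical assumptions:

```python
import logging
from html.parser import HTMLParser
from typing import List, Optional

logger = logging.getLogger("table_rendering")  # hypothetical logger name


class CellTextParser(HTMLParser):
    """Collect the text of every <td>/<th>, grouped by <tr>, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate cells that appear outside any <tr>
                self.rows.append([])
            # Collapse internal whitespace so "Total\n  Sales" becomes "Total Sales".
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_row_texts(html: str) -> List[List[str]]:
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def map_rows_to_grid(
    row_texts: List[List[str]],
    n_rows: int,
    n_cols: int,
    n_boxes: int,
    warn_ratio: float = 0.2,  # hypothetical threshold for "differs significantly"
) -> List[List[str]]:
    """Fill the cell_boxes grid row by row; unmapped cells stay empty."""
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for r, texts in enumerate(row_texts[:n_rows]):
        for c, text in enumerate(texts[:n_cols]):
            grid[r][c] = text  # content beyond the grid's size is dropped
    n_html = sum(len(texts) for texts in row_texts)
    if n_boxes and abs(n_html - n_boxes) / n_boxes > warn_ratio:
        logger.warning("HTML has %d cells but cell_boxes has %d", n_html, n_boxes)
    return grid
```

Because the fill is purely positional, a row that continues a rowspan from above has its texts shifted left; the sketch accepts that rather than reconciling the two structures.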

#### Scenario: Fallback to HTML-based rendering
- **WHEN** cell_boxes is empty or None
- **OR** `table_rendering_prefer_cellboxes` is disabled
- **OR** the cell_boxes grid inference fails
- **THEN** the system SHALL fall back to the existing HTML-based table rendering
- **AND** use a ReportLab Table built from the parsed HTML structure
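
The three fallback triggers can be expressed as a single guard around grid inference. This sketch takes the inference step and both renderers as hypothetical callables, so it stays independent of the real rendering code; `render_table` and its parameters are illustrative names only:

```python
import logging
from typing import Any, Callable, Optional, Sequence

logger = logging.getLogger("table_rendering")  # hypothetical logger name


def render_table(
    cell_boxes: Optional[Sequence[Any]],
    prefer_cellboxes: bool,
    infer_grid: Callable[[Sequence[Any]], Any],
    render_with_grid: Callable[[Any], Any],
    render_with_html: Callable[[], Any],
) -> Any:
    """Try the cell_boxes path first; fall back to HTML-based rendering."""
    # len() instead of truthiness so a NumPy array of boxes would also work.
    has_boxes = cell_boxes is not None and len(cell_boxes) > 0
    if prefer_cellboxes and has_boxes:
        try:
            grid = infer_grid(cell_boxes)
        except Exception:  # any inference failure triggers the fallback
            logger.warning(
                "cell_boxes grid inference failed; using HTML rendering", exc_info=True
            )
        else:
            return render_with_grid(grid)
    # Existing behavior, e.g. a ReportLab Table built from the parsed HTML.
    return render_with_html()
```

Only inference failures are caught; an exception raised while drawing the cell_boxes table still propagates, which keeps rendering bugs visible instead of silently switching paths.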

#### Scenario: Maintain backward compatibility
- **WHEN** processing tables whose cell_boxes grid matches the HTML structure
- **THEN** the system SHALL produce output identical to the previous behavior
- **AND** pass all existing table rendering tests
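
Deciding whether a cell_boxes grid "matches" the HTML structure could be done by expanding `rowspan`/`colspan` into an occupancy grid and comparing its shape with the inferred one. A standard-library sketch, assuming no nested tables; `SpanParser`, `html_grid_shape`, and `grids_match` are hypothetical names:

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple


def _span(value: Optional[str]) -> int:
    """Parse a rowspan/colspan attribute; missing or invalid values count as 1."""
    try:
        return max(1, int(value))
    except (TypeError, ValueError):
        return 1


class SpanParser(HTMLParser):
    """Record (rowspan, colspan) for every cell, grouped by <tr>. Assumes no nested tables."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[Tuple[int, int]]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            self.rows[-1].append((_span(a.get("rowspan")), _span(a.get("colspan"))))


def html_grid_shape(html: str) -> Tuple[int, int]:
    """(rows, cols) of the HTML table after expanding rowspan/colspan."""
    parser = SpanParser()
    parser.feed(html)
    parser.close()
    n_rows = len(parser.rows)
    occupied: Set[Tuple[int, int]] = set()
    for r, cells in enumerate(parser.rows):
        c = 0
        for rowspan, colspan in cells:
            while (r, c) in occupied:  # skip slots taken by rowspans from above
                c += 1
            for dr in range(min(rowspan, n_rows - r)):  # clip spans past the last row
                for dc in range(colspan):
                    occupied.add((r + dr, c + dc))
            c += colspan
    n_cols = max((col for _, col in occupied), default=-1) + 1
    return n_rows, n_cols


def grids_match(html: str, cellbox_shape: Tuple[int, int]) -> bool:
    """True when the cell_boxes grid has the same dimensions as the HTML structure."""
    return html_grid_shape(html) == tuple(cellbox_shape)
```

Here the cell_boxes shape is the number of inferred rows and columns. A regression test for this scenario might take the fixtures where `grids_match` holds, render them through both paths, and compare the results.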