- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
document-processing Specification Delta
MODIFIED Requirements
Requirement: Extract table structure (Modified)
The system SHALL use cell_boxes coordinates as the primary source for table structure when rendering PDFs, with HTML parsing as fallback.
Scenario: Render table using cell_boxes grid
- WHEN rendering a table element to PDF
- AND the table has valid cell_boxes coordinates
- AND table_rendering_prefer_cellboxes is enabled
- THEN the system SHALL infer row/column grid from cell_boxes coordinates
- AND extract text content from HTML in reading order
- AND map content to grid cells by position
- AND render table borders using cell_boxes coordinates
- AND place text content within calculated cell boundaries
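The grid inference step in this scenario could be sketched roughly as follows. This is only an illustration, not the project's implementation: it assumes each cell box is an `(x0, y0, x1, y1)` tuple, and the names `cluster_edges`, `infer_grid`, `cell_index`, and the 5-unit tolerance are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)


def cluster_edges(values: Sequence[float], tol: float) -> List[float]:
    """Merge coordinates lying within `tol` of their neighbour into one grid line."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    # Represent each cluster by its mean coordinate.
    return [sum(g) / len(g) for g in groups]


def infer_grid(cell_boxes: Sequence[Box], tol: float = 5.0):
    """Return (column lines, row lines) inferred from the box edges."""
    xs = cluster_edges([x for b in cell_boxes for x in (b[0], b[2])], tol)
    ys = cluster_edges([y for b in cell_boxes for y in (b[1], b[3])], tol)
    return xs, ys


def cell_index(box: Box, xs: List[float], ys: List[float]) -> Tuple[int, int]:
    """Locate the (row, col) of a box by its centre point."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    col = min(max(bisect_right(xs, cx) - 1, 0), len(xs) - 2)
    row = min(max(bisect_right(ys, cy) - 1, 0), len(ys) - 2)
    return row, col


# A slightly noisy 2x2 table, as OCR output often is.
boxes = [(0, 0, 50, 20), (50, 0, 100, 20), (0, 20, 50, 40), (51, 21, 100, 40)]
xs, ys = infer_grid(boxes)
```

With `n` grid lines on an axis there are `n - 1` rows or columns, so the sample above yields a 2x2 grid. The grid lines double as border coordinates when drawing the table.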
Scenario: Handle cell_boxes grid mismatch gracefully
- WHEN cell_boxes grid has different dimensions than HTML colspan/rowspan structure
- THEN the system SHALL use cell_boxes grid as authoritative structure
- AND map available HTML content to cells row-by-row
- AND leave unmapped cells empty
- AND log a warning if the content count differs significantly
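One way to picture the row-by-row mapping in this scenario is the sketch below. It uses only the standard library; the names `CellTextParser` and `fill_grid` are hypothetical, and the 20% `warn_ratio` is an assumed stand-in for "differs significantly", which the requirement does not quantify.

```python
import logging
from html.parser import HTMLParser
from typing import List, Optional

logger = logging.getLogger(__name__)


class CellTextParser(HTMLParser):
    """Collect the text of each <td>/<th> in document (reading) order."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[str] = []
        self._buf: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._buf is not None:
            self.cells.append("".join(self._buf).strip())
            self._buf = None


def fill_grid(texts: List[str], n_rows: int, n_cols: int,
              warn_ratio: float = 0.2) -> List[List[str]]:
    """Place texts into an n_rows x n_cols grid row by row; extra cells stay empty."""
    capacity = n_rows * n_cols
    # Assumed threshold: warn when the counts differ by more than warn_ratio.
    if capacity and abs(len(texts) - capacity) / capacity > warn_ratio:
        logger.warning("cell_boxes grid has %d cells but HTML has %d",
                       capacity, len(texts))
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for i, text in enumerate(texts[:capacity]):
        grid[i // n_cols][i % n_cols] = text
    return grid


parser = CellTextParser()
parser.feed("<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>")
grid = fill_grid(parser.cells, n_rows=2, n_cols=2)
```

Here the HTML supplies three cells for a four-cell grid, so the last cell is left empty and a warning is logged.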
Scenario: Fallback to HTML-based rendering
- WHEN cell_boxes is empty or None
- OR table_rendering_prefer_cellboxes is disabled
- OR cell_boxes grid inference fails
- THEN the system SHALL fall back to existing HTML-based table rendering
- AND use ReportLab Table with parsed HTML structure
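The three fallback conditions can be summarised as a small decision function. This is a hedged sketch: `choose_table_renderer` and its return labels are hypothetical, and the grid inference routine is passed in as a callable rather than tied to any particular implementation.

```python
from typing import Callable, Optional, Sequence


def choose_table_renderer(cell_boxes: Optional[Sequence],
                          prefer_cellboxes: bool,
                          infer_grid: Callable[[Sequence], object]) -> str:
    """Return "cell_boxes" or "html" according to the fallback scenario."""
    # Setting disabled, or cell_boxes empty/None: use HTML-based rendering.
    if not prefer_cellboxes or not cell_boxes:
        return "html"
    try:
        grid = infer_grid(cell_boxes)
    except Exception:
        # Grid inference failed: fall back rather than abort the render.
        return "html"
    return "cell_boxes" if grid else "html"


def ok_infer(boxes):
    # Stand-in for a real inference routine.
    return ([0, 1], [0, 1])


sample = [(0, 0, 1, 1)]
```

Keeping the decision in one place makes it easier to test each fallback condition separately from the rendering code.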
Scenario: Maintain backward compatibility
- WHEN processing tables where cell_boxes grid matches HTML structure
- THEN the system SHALL produce identical output to previous behavior
- AND pass all existing table rendering tests