- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ADDED Requirements
Requirement: Table Column Alignment Correction
The system SHALL correct table cell column assignments using header-anchor alignment when PP-Structure outputs incorrect column indices.
Scenario: Correct column shift using header anchors
- WHEN processing a table with cell_boxes and HTML content
- THEN the system SHALL extract header row (row_idx=0) column X-coordinate ranges
- AND validate each cell's column assignment against header X-ranges
- AND correct column index if cell X-overlap with assigned column is < 50%
- AND assign cell to column with highest X-overlap
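The header-anchor scenario above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `Cell` dataclass, `x_overlap_ratio`, and `correct_columns` are hypothetical names, and the sketch assumes each cell already carries a row index, a column index, and its left/right X-coordinates. Overlap is measured as the fraction of the cell's own width that falls inside a header column's X-range, which is one possible reading of "X-overlap".

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical cell record; field names are assumptions for this sketch.
    row_idx: int
    col_idx: int
    x0: float
    x1: float
    text: str = ""


def x_overlap_ratio(cell, col_range):
    """Fraction of the cell's width that lies inside col_range (lo, hi)."""
    lo, hi = col_range
    width = cell.x1 - cell.x0
    if width <= 0:
        return 0.0
    inter = min(cell.x1, hi) - max(cell.x0, lo)
    return max(0.0, inter) / width


def correct_columns(cells, min_overlap=0.5):
    """Reassign col_idx in place; return the number of corrected cells."""
    # The header row (row_idx == 0) defines the X-range of each column.
    header = {c.col_idx: (c.x0, c.x1) for c in cells if c.row_idx == 0}
    if not header:
        return 0  # no header anchors: leave the original assignments alone
    corrections = 0
    for cell in cells:
        if cell.row_idx == 0:
            continue
        assigned = header.get(cell.col_idx)
        if assigned is not None and x_overlap_ratio(cell, assigned) >= min_overlap:
            continue  # current assignment is consistent with the header
        # Otherwise pick the header column with the highest X-overlap.
        best_col = max(header, key=lambda col: x_overlap_ratio(cell, header[col]))
        if x_overlap_ratio(cell, header[best_col]) > 0 and best_col != cell.col_idx:
            cell.col_idx = best_col
            corrections += 1
    return corrections
```

A cell that overlaps no header column at all keeps its original index in this sketch; whether that is the desired behavior is not stated in the requirement.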
Scenario: Handle tables without headers
- WHEN processing a table without a clear header row
- THEN the system SHALL skip column correction
- AND use original PP-Structure column assignments
- AND log that header-anchor correction was skipped
Scenario: Log column corrections
- WHEN a cell's column index is corrected
- THEN the system SHALL log original and corrected column indices
- AND include cell content snippet for debugging
- AND record total corrections per table
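One way the logging scenario could look, using Python's standard `logging` module. The logger name, the `log_corrections` helper, the tuple layout, and the 20-character snippet length are all assumptions made for illustration; the requirement does not fix any of them.

```python
import logging

logger = logging.getLogger("table_column_correction")  # hypothetical logger name


def log_corrections(table_id, corrections, snippet_len=20):
    """Log each correction and the per-table total.

    corrections: list of (original_col, corrected_col, cell_text) tuples.
    Returns the total number of corrections for the table.
    """
    for original_col, corrected_col, text in corrections:
        logger.info(
            "table %s: column %d -> %d, content=%r",
            table_id, original_col, corrected_col, text[:snippet_len],
        )
    logger.info("table %s: %d column correction(s)", table_id, len(corrections))
    return len(corrections)
```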
Requirement: Vertical Text Fragment Merging
The system SHALL detect and merge vertically fragmented Chinese text blocks that represent single cells spanning multiple rows.
Scenario: Detect vertical text fragments
- WHEN processing table text regions
- THEN the system SHALL identify narrow text blocks (width/height ratio < 0.3)
- AND restrict candidates to blocks in the leftmost 15% of the table area
- AND group vertically adjacent blocks with X-center deviation < 10px
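The detection steps above might be sketched as follows. `TextBlock`, `is_narrow`, and `find_vertical_groups` are hypothetical names. The sketch assumes that the leftmost-15% rule selects candidates (rather than excluding them) and tests it against each block's X-center. Because the scenario gives no vertical gap threshold, "vertically adjacent" is approximated here as consecutive in top-to-bottom order.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    # Hypothetical text region: axis-aligned box plus recognized text.
    x0: float
    y0: float
    x1: float
    y1: float
    text: str

    @property
    def x_center(self):
        return (self.x0 + self.x1) / 2


def is_narrow(block, max_ratio=0.3):
    """True when width/height is below max_ratio (tall, thin block)."""
    height = block.y1 - block.y0
    return height > 0 and (block.x1 - block.x0) / height < max_ratio


def find_vertical_groups(blocks, table_x0, table_x1,
                         left_fraction=0.15, max_x_dev=10.0):
    """Group narrow blocks near the table's left edge into vertical runs."""
    left_limit = table_x0 + left_fraction * (table_x1 - table_x0)
    candidates = sorted(
        (b for b in blocks if is_narrow(b) and b.x_center <= left_limit),
        key=lambda b: b.y0,
    )
    groups = []
    for block in candidates:
        # Extend the current run when X-centers stay within max_x_dev pixels.
        if groups and abs(groups[-1][-1].x_center - block.x_center) < max_x_dev:
            groups[-1].append(block)
        else:
            groups.append([block])
    # A single block is not a fragment set; keep only runs of two or more.
    return [g for g in groups if len(g) > 1]
```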
Scenario: Merge fragmented vertical text
- WHEN vertical text fragments are detected
- THEN the system SHALL merge adjacent fragments into single text blocks
- AND combine text content preserving reading order
- AND calculate merged bounding box spanning all fragments
- AND treat merged block as single cell for column assignment
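A minimal sketch of the merge step, assuming each fragment is a dict with a `bbox` of `(x0, y0, x1, y1)` and a `text` string (both field names are hypothetical). Reading order is taken to be top to bottom, and fragments are concatenated without a separator, which suits Chinese text but is an assumption rather than something the requirement specifies.

```python
def merge_fragments(fragments):
    """Merge vertical text fragments into one block.

    fragments: list of dicts with 'bbox' (x0, y0, x1, y1) and 'text'.
    Returns a dict with the union bounding box and the joined text.
    """
    # Sort by top edge so the text is combined in top-to-bottom order.
    ordered = sorted(fragments, key=lambda f: f["bbox"][1])
    return {
        "bbox": (
            min(f["bbox"][0] for f in ordered),
            min(f["bbox"][1] for f in ordered),
            max(f["bbox"][2] for f in ordered),
            max(f["bbox"][3] for f in ordered),
        ),
        "text": "".join(f["text"] for f in ordered),
    }
```

The merged block can then go through column assignment like any other single cell.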
Scenario: Preserve non-vertical text
- WHEN text blocks do not meet vertical fragment criteria
- THEN the system SHALL preserve original text block boundaries
- AND process normally without merging
MODIFIED Requirements
Requirement: Extract table structure
The system SHALL extract cell content and boundaries from PP-StructureV3 tables, with post-processing correction for column alignment errors.
Scenario: Extract table structure with correction
- WHEN PP-StructureV3 identifies a table
- THEN the system SHALL extract cell content and boundaries
- AND validate cell_boxes coordinates against page boundaries
- AND apply header-anchor column correction when enabled
- AND merge vertical text fragments when enabled
- AND apply fallback detection for invalid coordinates
- AND preserve table HTML for structure
- AND extract plain text for translation
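The coordinate validation step in this scenario could be sketched as below. It assumes each entry in `cell_boxes` is an `(x0, y0, x1, y1)` box in page coordinates with the origin at the top-left; that layout, and the helper names `is_valid_box` and `split_valid_boxes`, are assumptions for illustration. The fallback detection applied to the invalid boxes is not shown.

```python
def is_valid_box(box, page_width, page_height):
    """True when the box is non-degenerate and lies inside the page."""
    x0, y0, x1, y1 = box
    return 0 <= x0 < x1 <= page_width and 0 <= y0 < y1 <= page_height


def split_valid_boxes(cell_boxes, page_width, page_height):
    """Partition boxes into (valid, invalid) lists, preserving order."""
    valid, invalid = [], []
    for box in cell_boxes:
        if is_valid_box(box, page_width, page_height):
            valid.append(box)
        else:
            invalid.append(box)
    return valid, invalid
```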