chore: backup before code cleanup
Backup commit before executing remove-unused-code proposal.
This includes all pending changes and new features.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,59 @@
## ADDED Requirements

### Requirement: Table Column Alignment Correction
The system SHALL correct table cell column assignments using header-anchor alignment when PP-Structure outputs incorrect column indices.

#### Scenario: Correct column shift using header anchors
- **WHEN** processing a table with cell_boxes and HTML content
- **THEN** the system SHALL extract header row (row_idx=0) column X-coordinate ranges
- **AND** validate each cell's column assignment against header X-ranges
- **AND** correct column index if cell X-overlap with assigned column is < 50%
- **AND** assign cell to column with highest X-overlap

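The header-anchor check in this scenario can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Cell` dataclass, the function names, and the choice to measure overlap as a fraction of the cell's own width are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical cell record; field names are illustrative only.
    row_idx: int
    col_idx: int
    x0: float
    x1: float
    text: str = ""


def x_overlap_ratio(cell, col_range):
    """Fraction of the cell's width that falls inside col_range (lo, hi)."""
    width = cell.x1 - cell.x0
    if width <= 0:
        return 0.0
    lo, hi = col_range
    overlap = min(cell.x1, hi) - max(cell.x0, lo)
    return max(0.0, overlap) / width


def correct_columns(cells, min_overlap=0.5):
    """Reassign col_idx for body cells that overlap their header column
    by less than min_overlap. Returns a list of (original, corrected)."""
    # Header row (row_idx == 0) supplies the X-range anchor for each column.
    header_ranges = {c.col_idx: (c.x0, c.x1) for c in cells if c.row_idx == 0}
    corrections = []
    if not header_ranges:
        return corrections  # no header anchors: leave assignments untouched
    for cell in cells:
        if cell.row_idx == 0:
            continue
        assigned = header_ranges.get(cell.col_idx)
        current = x_overlap_ratio(cell, assigned) if assigned else 0.0
        if current >= min_overlap:
            continue
        # Pick the header column with the highest X-overlap.
        best_col = max(
            header_ranges,
            key=lambda k: x_overlap_ratio(cell, header_ranges[k]),
        )
        best = x_overlap_ratio(cell, header_ranges[best_col])
        if best_col != cell.col_idx and best > current:
            corrections.append((cell.col_idx, best_col))
            cell.col_idx = best_col
    return corrections
```

A cell is only moved when another header column overlaps it strictly better, so a cell that overlaps no header at all keeps its original index.
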
#### Scenario: Handle tables without headers
- **WHEN** processing a table without a clear header row
- **THEN** the system SHALL skip column correction
- **AND** use original PP-Structure column assignments
- **AND** log that header-anchor correction was skipped

#### Scenario: Log column corrections
- **WHEN** a cell's column index is corrected
- **THEN** the system SHALL log original and corrected column indices
- **AND** include cell content snippet for debugging
- **AND** record total corrections per table

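One possible shape for this logging, using only the standard `logging` module. The logger name, the `log_corrections` helper, the tuple layout, and the 20-character snippet length are assumptions for illustration; the spec does not fix any of them.

```python
import logging

# Hypothetical logger name; the real module may use a different one.
logger = logging.getLogger("table_column_correction")


def log_corrections(table_id, corrections, snippet_len=20):
    """Log each correction and the per-table total.

    corrections: list of (original_col, corrected_col, cell_text) tuples.
    Returns the number of corrections recorded for the table.
    """
    for original, corrected, text in corrections:
        snippet = text[:snippet_len]  # short content snippet for debugging
        logger.info(
            "table %s: column %d -> %d for cell %r",
            table_id, original, corrected, snippet,
        )
    logger.info(
        "table %s: %d column correction(s) applied",
        table_id, len(corrections),
    )
    return len(corrections)
```
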
### Requirement: Vertical Text Fragment Merging
The system SHALL detect and merge vertically fragmented Chinese text blocks that represent single cells spanning multiple rows.

#### Scenario: Detect vertical text fragments
- **WHEN** processing table text regions
- **THEN** the system SHALL identify narrow text blocks (width/height ratio < 0.3)
- **AND** filter blocks in leftmost 15% of table area
- **AND** group vertically adjacent blocks with X-center deviation < 10px

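The detection criteria above could be sketched like this. Several details are assumptions, since the spec leaves them open: the `Block` dataclass, reading "leftmost 15% of table area" as "block X-center within the leftmost 15% of the table width", and the `max_gap` threshold used to decide whether two blocks are vertically adjacent.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical text-region record with an axis-aligned bounding box.
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def x_center(block):
    return (block.x0 + block.x1) / 2


def is_vertical_candidate(block, table_x0, table_x1,
                          max_ratio=0.3, left_fraction=0.15):
    """Narrow block (width/height < max_ratio) in the leftmost part of the table."""
    width = block.x1 - block.x0
    height = block.y1 - block.y0
    if width <= 0 or height <= 0:
        return False
    left_limit = table_x0 + left_fraction * (table_x1 - table_x0)
    return width / height < max_ratio and x_center(block) <= left_limit


def group_fragments(blocks, table_x0, table_x1, max_x_dev=10.0, max_gap=15.0):
    """Group candidates top to bottom when X-centers differ by < max_x_dev
    and the vertical gap is at most max_gap (max_gap is an assumed value)."""
    candidates = sorted(
        (b for b in blocks if is_vertical_candidate(b, table_x0, table_x1)),
        key=lambda b: b.y0,
    )
    groups = []
    for block in candidates:
        if groups:
            prev = groups[-1][-1]
            same_column = abs(x_center(block) - x_center(prev)) < max_x_dev
            adjacent = block.y0 - prev.y1 <= max_gap
            if same_column and adjacent:
                groups[-1].append(block)
                continue
        groups.append([block])
    return groups
```

Each block is compared only with the last block of the most recent group, which is sufficient when a table has a single vertical label column on the left.
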
#### Scenario: Merge fragmented vertical text
- **WHEN** vertical text fragments are detected
- **THEN** the system SHALL merge adjacent fragments into single text blocks
- **AND** combine text content preserving reading order
- **AND** calculate merged bounding box spanning all fragments
- **AND** treat merged block as single cell for column assignment

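A minimal sketch of the merge step, assuming each fragment is a dict with a `text` string and a `bbox` tuple `(x0, y0, x1, y1)`, and assuming that reading order for vertical text means top to bottom. Both the data layout and the `merge_fragments` name are hypothetical.

```python
def merge_fragments(fragments):
    """Merge one group of vertical fragments into a single text block.

    fragments: non-empty list of {"text": str, "bbox": (x0, y0, x1, y1)}.
    """
    if not fragments:
        raise ValueError("cannot merge an empty fragment group")
    # Assumed reading order for vertical text: top to bottom by y0.
    ordered = sorted(fragments, key=lambda f: f["bbox"][1])
    text = "".join(f["text"] for f in ordered)
    # The merged bounding box is the union of all fragment boxes.
    x0 = min(f["bbox"][0] for f in ordered)
    y0 = min(f["bbox"][1] for f in ordered)
    x1 = max(f["bbox"][2] for f in ordered)
    y1 = max(f["bbox"][3] for f in ordered)
    return {"text": text, "bbox": (x0, y0, x1, y1)}
```

The returned block can then go through column assignment like any other single cell.
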
#### Scenario: Preserve non-vertical text
- **WHEN** text blocks do not meet vertical fragment criteria
- **THEN** the system SHALL preserve original text block boundaries
- **AND** process normally without merging

## MODIFIED Requirements

### Requirement: Extract table structure
The system SHALL extract cell content and boundaries from PP-StructureV3 tables, with post-processing correction for column alignment errors.

#### Scenario: Extract table structure with correction
- **WHEN** PP-StructureV3 identifies a table
- **THEN** the system SHALL extract cell content and boundaries
- **AND** validate cell_boxes coordinates against page boundaries
- **AND** apply header-anchor column correction when enabled
- **AND** merge vertical text fragments when enabled
- **AND** apply fallback detection for invalid coordinates
- **AND** preserve table HTML for structure
- **AND** extract plain text for translation
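The coordinate validation step in this scenario might look like the sketch below. It assumes each entry in `cell_boxes` is an `(x0, y0, x1, y1)` tuple in page pixel coordinates; the helper names are hypothetical, and what the fallback detection does with the rejected boxes is outside the scope of the example.

```python
def is_valid_box(box, page_width, page_height):
    """True when the box is non-degenerate and lies inside the page."""
    x0, y0, x1, y1 = box
    return 0 <= x0 < x1 <= page_width and 0 <= y0 < y1 <= page_height


def split_valid_boxes(cell_boxes, page_width, page_height):
    """Partition cell_boxes into (valid, invalid) lists.

    Invalid boxes would be handed to whatever fallback detection the
    pipeline provides; that step is not shown here.
    """
    valid, invalid = [], []
    for box in cell_boxes:
        target = valid if is_valid_box(box, page_width, page_height) else invalid
        target.append(box)
    return valid, invalid
```
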