## ADDED Requirements

### Requirement: Table Column Alignment Correction
The system SHALL correct table cell column assignments using header-anchor alignment when PP-Structure outputs incorrect column indices.

#### Scenario: Correct column shift using header anchors
- **WHEN** processing a table with cell_boxes and HTML content
- **THEN** the system SHALL extract header row (row_idx=0) column X-coordinate ranges
- **AND** validate each cell's column assignment against header X-ranges
- **AND** correct column index if cell X-overlap with assigned column is < 50%
- **AND** assign cell to column with highest X-overlap

#### Scenario: Handle tables without headers
- **WHEN** processing a table without a clear header row
- **THEN** the system SHALL skip column correction
- **AND** use original PP-Structure column assignments
- **AND** log that header-anchor correction was skipped

#### Scenario: Log column corrections
- **WHEN** a cell's column index is corrected
- **THEN** the system SHALL log original and corrected column indices
- **AND** include cell content snippet for debugging
- **AND** record total corrections per table

### Requirement: Vertical Text Fragment Merging
The system SHALL detect and merge vertically fragmented Chinese text blocks that represent single cells spanning multiple rows.
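
As a non-normative illustration of this requirement, the following sketch applies the thresholds stated in its scenarios (width/height ratio < 0.3, leftmost 15% of the table, X-center deviation < 10px). The `TextBlock` type, the function names, and the `max_gap` adjacency limit are hypothetical; the spec does not define how close two blocks must be to count as vertically adjacent, and testing the block's X-center against the 15% band is one possible reading of "in leftmost 15% of table area".

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical text region: content plus an axis-aligned bounding box."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def x_center(self) -> float:
        return (self.x0 + self.x1) / 2


def is_vertical_candidate(block, table_x0, table_x1,
                          max_ratio=0.3, left_fraction=0.15):
    """Narrow block (width/height < max_ratio) in the leftmost part of the table."""
    width, height = block.x1 - block.x0, block.y1 - block.y0
    if height <= 0:
        return False
    left_limit = table_x0 + left_fraction * (table_x1 - table_x0)
    # Assumption: the block's X-center decides whether it lies in the left band.
    return width / height < max_ratio and block.x_center <= left_limit


def merge_group(group):
    """Merge fragments (already sorted top to bottom) into one block."""
    return TextBlock(
        text="".join(b.text for b in group),  # top-to-bottom reading order
        x0=min(b.x0 for b in group),
        y0=min(b.y0 for b in group),
        x1=max(b.x1 for b in group),
        y1=max(b.y1 for b in group),
    )


def merge_vertical_fragments(blocks, table_x0, table_x1,
                             max_center_dev=10.0, max_gap=20.0):
    """Return non-candidate blocks unchanged, followed by merged candidates."""
    candidates, others = [], []
    for b in blocks:
        target = candidates if is_vertical_candidate(b, table_x0, table_x1) else others
        target.append(b)
    candidates.sort(key=lambda b: b.y0)

    groups = []
    for b in candidates:
        last = groups[-1][-1] if groups else None
        # max_gap is an assumed limit on the vertical distance between fragments.
        if (last is not None
                and abs(b.x_center - last.x_center) < max_center_dev
                and b.y0 - last.y1 <= max_gap):
            groups[-1].append(b)
        else:
            groups.append([b])
    return others + [merge_group(g) for g in groups]
```

A candidate with no adjacent partner ends up in a group of one, so its text and bounding box pass through unchanged.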
#### Scenario: Detect vertical text fragments
- **WHEN** processing table text regions
- **THEN** the system SHALL identify narrow text blocks (width/height ratio < 0.3)
- **AND** filter blocks in leftmost 15% of table area
- **AND** group vertically adjacent blocks with X-center deviation < 10px

#### Scenario: Merge fragmented vertical text
- **WHEN** vertical text fragments are detected
- **THEN** the system SHALL merge adjacent fragments into single text blocks
- **AND** combine text content preserving reading order
- **AND** calculate merged bounding box spanning all fragments
- **AND** treat merged block as single cell for column assignment

#### Scenario: Preserve non-vertical text
- **WHEN** text blocks do not meet vertical fragment criteria
- **THEN** the system SHALL preserve original text block boundaries
- **AND** process normally without merging

## MODIFIED Requirements

### Requirement: Extract table structure
The system SHALL extract cell content and boundaries from PP-StructureV3 tables, with post-processing correction for column alignment errors.

#### Scenario: Extract table structure with correction
- **WHEN** PP-StructureV3 identifies a table
- **THEN** the system SHALL extract cell content and boundaries
- **AND** validate cell_boxes coordinates against page boundaries
- **AND** apply header-anchor column correction when enabled
- **AND** merge vertical text fragments when enabled
- **AND** apply fallback detection for invalid coordinates
- **AND** preserve table HTML for structure
- **AND** extract plain text for translation
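
The header-anchor column correction referenced in this scenario can be sketched as follows. This is a minimal, non-normative illustration: the `Cell` type and function names are hypothetical, the overlap ratio is assumed to be measured relative to the cell's own width (the spec does not name the denominator), and "no clear header row" is simplified to "no cells with row_idx=0".

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Cell:
    """Hypothetical table cell: text, grid position, and horizontal extent."""
    text: str
    row_idx: int
    col_idx: int
    x0: float
    x1: float


def x_overlap_ratio(cell, col_range):
    """Fraction of the cell's width that falls inside a column's X-range."""
    width = cell.x1 - cell.x0
    if width <= 0:
        return 0.0
    overlap = min(cell.x1, col_range[1]) - max(cell.x0, col_range[0])
    return max(0.0, overlap) / width


def correct_columns(cells, min_overlap=0.5):
    """Reassign col_idx in place using header X-ranges; return correction count."""
    # Header anchors: column index -> (x0, x1) taken from row 0.
    header = {c.col_idx: (c.x0, c.x1) for c in cells if c.row_idx == 0}
    if not header:
        logger.info("header-anchor correction skipped: no header row found")
        return 0

    corrections = 0
    for cell in cells:
        if cell.row_idx == 0:
            continue
        assigned = header.get(cell.col_idx)
        current = x_overlap_ratio(cell, assigned) if assigned else 0.0
        if current >= min_overlap:
            continue  # assignment is consistent with the header anchors
        best_col = max(header, key=lambda col: x_overlap_ratio(cell, header[col]))
        if best_col != cell.col_idx and x_overlap_ratio(cell, header[best_col]) > current:
            logger.info("column corrected %d -> %d for cell %r",
                        cell.col_idx, best_col, cell.text[:20])
            cell.col_idx = best_col
            corrections += 1
    logger.info("total column corrections for table: %d", corrections)
    return corrections
```

Cells that span several header columns would need separate handling, which this sketch does not attempt.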