## ADDED Requirements

### Requirement: Cell Over-Detection Filtering

The system SHALL validate PP-StructureV3 table detections using metric-based heuristics to filter over-detected cells.

#### Scenario: Cell density exceeds threshold

- **GIVEN** a table detected by PP-StructureV3 with cell_boxes
- **WHEN** cell density exceeds 3.0 cells per 10,000 px²
- **THEN** the system SHALL flag the table as over-detected
- **AND** reclassify the table as a TEXT element

#### Scenario: Average cell area below threshold

- **GIVEN** a table detected by PP-StructureV3
- **WHEN** average cell area is less than 3,000 px²
- **THEN** the system SHALL flag the table as over-detected
- **AND** reclassify the table as a TEXT element

#### Scenario: Cell height too small

- **GIVEN** a table with height H and N cells
- **WHEN** (H / N) is less than 10 pixels
- **THEN** the system SHALL flag the table as over-detected
- **AND** reclassify the table as a TEXT element

#### Scenario: Valid tables are preserved

- **GIVEN** a table with normal metrics (density ≤ 3.0 cells per 10,000 px², average cell area ≥ 3,000 px², H / N ≥ 10 px)
- **WHEN** validation is applied
- **THEN** the table SHALL be preserved unchanged
- **AND** all cell_boxes SHALL be retained

### Requirement: Table-to-Text Reclassification

The system SHALL convert over-detected tables to TEXT elements while preserving content.

#### Scenario: Table content is preserved

- **GIVEN** a table flagged for reclassification
- **WHEN** converting to TEXT element
- **THEN** the system SHALL extract text content from table HTML
- **AND** preserve the original bounding box
- **AND** set element type to TEXT

#### Scenario: Reading order is recalculated

- **GIVEN** tables have been reclassified as TEXT
- **WHEN** assembling the final page structure
- **THEN** the system SHALL recalculate reading order
- **AND** sort elements by y0 then x0 coordinates

### Requirement: Validation Configuration

The system SHALL provide configurable thresholds for cell validation.

#### Scenario: Default thresholds are applied

- **GIVEN** no custom configuration is provided
- **WHEN** validating tables
- **THEN** the system SHALL use default thresholds:
  - max_cell_density: 3.0 cells per 10,000 px²
  - min_avg_cell_area: 3,000 px²
  - min_cell_height: 10 px

#### Scenario: Custom thresholds can be configured

- **GIVEN** custom validation thresholds in configuration
- **WHEN** validating tables
- **THEN** the system SHALL use the custom values
- **AND** apply them consistently to all pages
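
#### Illustrative sketch (non-normative)

The requirements above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual implementation: the element layout (dicts with `type`, `bbox` as `[x0, y0, x1, y1]`, `cell_boxes`, and `html` keys) and the names `ValidationConfig`, `over_detection_reason`, `reclassify_as_text`, and `validate_page` are hypothetical. It also assumes that cell density is the cell count divided by the table's bounding-box area in units of 10,000 px², and that average cell area is the mean area of the individual cell boxes. The spec does not fix either definition.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class ValidationConfig:
    # Defaults mirror the "Default thresholds are applied" scenario.
    max_cell_density: float = 3.0      # cells per 10,000 px²
    min_avg_cell_area: float = 3000.0  # px²
    min_cell_height: float = 10.0      # px


class _TextExtractor(HTMLParser):
    """Collects the non-empty text nodes found in a table's HTML."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def _area(box):
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def over_detection_reason(table, config=ValidationConfig()):
    """Return the name of the first failed check, or None for a valid table."""
    cells = table.get("cell_boxes") or []
    table_area = _area(table["bbox"])
    if not cells or table_area <= 0:
        return None  # assumption: nothing to validate without cells or area
    n = len(cells)
    height = table["bbox"][3] - table["bbox"][1]
    density = n / (table_area / 10_000)          # cells per 10,000 px²
    avg_area = sum(_area(c) for c in cells) / n  # mean cell-box area
    if density > config.max_cell_density:
        return "cell_density"
    if avg_area < config.min_avg_cell_area:
        return "avg_cell_area"
    if height / n < config.min_cell_height:
        return "cell_height"
    return None


def reclassify_as_text(table):
    """Build a TEXT element from the table's HTML, keeping its bounding box."""
    parser = _TextExtractor()
    parser.feed(table.get("html", ""))
    parser.close()
    return {"type": "TEXT", "bbox": table["bbox"], "text": " ".join(parser.chunks)}


def validate_page(elements, config=ValidationConfig()):
    """Apply the same thresholds to every table on a page."""
    result, reclassified = [], False
    for el in elements:
        if el.get("type") == "TABLE" and over_detection_reason(el, config):
            el = reclassify_as_text(el)
            reclassified = True
        result.append(el)
    if reclassified:
        # Recalculate reading order: sort by y0, then x0.
        result.sort(key=lambda e: (e["bbox"][1], e["bbox"][0]))
    return result
```

Passing one `ValidationConfig` instance to `validate_page` for every page is one way to satisfy the "apply them consistently to all pages" clause; calling it without a config falls back to the default thresholds. Tables that pass all three checks are returned as the same objects, so their cell_boxes are retained untouched.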