## ADDED Requirements

### Requirement: Cell Over-Detection Filtering

The system SHALL validate PP-StructureV3 table detections using metric-based heuristics to filter over-detected cells.

#### Scenario: Cell density exceeds threshold
- **GIVEN** a table detected by PP-StructureV3 with cell_boxes
- **WHEN** cell density exceeds 3.0 cells per 10,000 px²
- **THEN** the system SHALL flag the table as over-detected
- **AND** reclassify the table as a TEXT element
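
The density normalization can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `cell_density`, the `(x0, y0, x1, y1)` box layout, and the choice to measure density against the table's bounding-box area are all assumptions.

```python
def cell_density(table_bbox, cell_boxes):
    """Return the number of cells per 10,000 px² of table area.

    Assumes table_bbox is (x0, y0, x1, y1) in pixels and that density is
    measured against the table's bounding-box area (hypothetical choices).
    """
    x0, y0, x1, y1 = table_bbox
    area = max(x1 - x0, 0) * max(y1 - y0, 0)
    if area == 0:
        # A degenerate box holding cells is treated as infinitely dense.
        return float("inf") if cell_boxes else 0.0
    # Multiply before dividing to keep the arithmetic exact for integer inputs.
    return len(cell_boxes) * 10_000 / area


# 40 cells in a 500 x 200 px table (100,000 px²) gives 4.0, above the 3.0 limit.
density = cell_density((0, 0, 500, 200), [(0, 0, 50, 25)] * 40)
is_over_detected = density > 3.0
```
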

#### Scenario: Average cell area below threshold
- **GIVEN** a table detected by PP-StructureV3
- **WHEN** average cell area is less than 3,000 px²
- **THEN** the system SHALL flag the table as over-detected
- **AND** reclassify the table as a TEXT element

#### Scenario: Cell height too small
- **GIVEN** a table with height H and N cells
- **WHEN** (H / N) is less than 10 pixels
- **THEN** the system SHALL flag the table as over-detected
- **AND** reclassify the table as a TEXT element

#### Scenario: Valid tables are preserved
- **GIVEN** a table whose metrics are within all thresholds (density ≤ 3.0 cells per 10,000 px², average cell area ≥ 3,000 px², H / N ≥ 10 px)
- **WHEN** validation is applied
- **THEN** the table SHALL be preserved unchanged
- **AND** all cell_boxes SHALL be retained
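
Taken together, the four scenarios above amount to three independent checks, any one of which flags the table. The sketch below is one possible reading, under stated assumptions: boxes are `(x0, y0, x1, y1)` tuples, average cell area is the mean of the individual cell-box areas, and the names `box_area` and `over_detection_reasons` are hypothetical.

```python
def box_area(box):
    """Area in px² of an (x0, y0, x1, y1) box; degenerate boxes count as 0."""
    x0, y0, x1, y1 = box
    return max(x1 - x0, 0) * max(y1 - y0, 0)


def over_detection_reasons(table_bbox, cell_boxes,
                           max_cell_density=3.0,
                           min_avg_cell_area=3000.0,
                           min_cell_height=10.0):
    """Return the names of the failed checks; an empty list means 'keep the table'."""
    n = len(cell_boxes)
    if n == 0:
        return []  # nothing to validate (assumption: no cells means no flag)

    table_area = box_area(table_bbox)
    table_height = max(table_bbox[3] - table_bbox[1], 0)
    reasons = []

    # Scenario 1: cells per 10,000 px² of table area exceeds the limit.
    if table_area > 0 and n * 10_000 / table_area > max_cell_density:
        reasons.append("cell_density")
    # Scenario 2: mean cell area is below the minimum.
    if sum(box_area(b) for b in cell_boxes) / n < min_avg_cell_area:
        reasons.append("avg_cell_area")
    # Scenario 3: table height divided by cell count (H / N) is below the minimum.
    if table_height / n < min_cell_height:
        reasons.append("cell_height")
    return reasons


# 40 tiny cells in a 500 x 200 px table fail all three checks.
crowded = over_detection_reasons((0, 0, 500, 200), [(0, 0, 50, 25)] * 40)
# Six 200 x 200 px cells in a 600 x 400 px table pass and are preserved.
normal = over_detection_reasons((0, 0, 600, 400), [(0, 0, 200, 200)] * 6)
```

Returning the list of failed checks rather than a bare boolean is a design choice made here for easier logging; a table is reclassified whenever the list is non-empty.
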

### Requirement: Table-to-Text Reclassification

The system SHALL convert over-detected tables to TEXT elements while preserving content.

#### Scenario: Table content is preserved
- **GIVEN** a table flagged for reclassification
- **WHEN** converting to TEXT element
- **THEN** the system SHALL extract text content from table HTML
- **AND** preserve the original bounding box
- **AND** set element type to TEXT
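
A conversion along these lines could look like the sketch below, which uses only Python's standard `html.parser`. The element layout (a dict with `type`, `bbox`, `html`, and `text` keys) and the names `_CellTextCollector` and `table_to_text_element` are assumptions for illustration; the real element model may differ.

```python
from html.parser import HTMLParser


class _CellTextCollector(HTMLParser):
    """Collects the text of each <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate cells that appear outside any <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_to_text_element(table):
    """Build a TEXT element from a table element, keeping its bounding box."""
    parser = _CellTextCollector()
    parser.feed(table.get("html", ""))
    parser.close()
    lines = []
    for row in parser.rows:
        # Collapse internal whitespace and drop empty cells.
        cells = [" ".join(cell.split()) for cell in row]
        cells = [cell for cell in cells if cell]
        if cells:
            lines.append(" ".join(cells))
    return {"type": "TEXT", "bbox": table["bbox"], "text": "\n".join(lines)}


table = {
    "type": "TABLE",
    "bbox": (10, 20, 300, 80),
    "html": "<table><tr><td>Name</td><td>Qty</td></tr>"
            "<tr><td>Bolt</td><td> 4 </td></tr></table>",
}
text_element = table_to_text_element(table)
```

Joining cells with spaces and rows with newlines is one way to keep the content readable as plain text; the requirement itself only mandates that the text, bounding box, and TEXT type are carried over.
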

#### Scenario: Reading order is recalculated
- **GIVEN** tables have been reclassified as TEXT
- **WHEN** assembling the final page structure
- **THEN** the system SHALL recalculate reading order
- **AND** sort elements by y0 then x0 coordinates
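
The sort described above can be expressed with a tuple key. In this sketch the `bbox` key, the `(x0, y0, x1, y1)` layout, and the `reading_order` field are assumptions; it also applies a strict y0-then-x0 ordering exactly as stated, with no row-grouping tolerance.

```python
def assign_reading_order(elements):
    """Return copies of the elements sorted by (y0, x0) with a reading_order index."""
    ordered = sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))
    return [{**e, "reading_order": i} for i, e in enumerate(ordered)]


page = [
    {"id": "right", "type": "TEXT", "bbox": (50, 10, 90, 30)},
    {"id": "left", "type": "TEXT", "bbox": (10, 10, 40, 30)},
    {"id": "top", "type": "TEXT", "bbox": (0, 5, 90, 9)},
]
ordered_ids = [e["id"] for e in assign_reading_order(page)]
```
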

### Requirement: Validation Configuration

The system SHALL provide configurable thresholds for cell validation.

#### Scenario: Default thresholds are applied
- **GIVEN** no custom configuration is provided
- **WHEN** validating tables
- **THEN** the system SHALL use default thresholds:
  - max_cell_density: 3.0 cells per 10,000 px²
  - min_avg_cell_area: 3,000 px²
  - min_cell_height: 10 px

#### Scenario: Custom thresholds can be configured
- **GIVEN** custom validation thresholds in configuration
- **WHEN** validating tables
- **THEN** the system SHALL use the custom values
- **AND** apply them consistently to all pages
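
One way to model the default and custom thresholds is an immutable dataclass built once and reused for every page. The field names and defaults below come from the scenarios above; the class name `CellValidationConfig`, the `from_mapping` helper, and the decision to reject unknown keys are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CellValidationConfig:
    """Thresholds for cell validation; defaults follow the spec."""
    max_cell_density: float = 3.0       # cells per 10,000 px²
    min_avg_cell_area: float = 3000.0   # px²
    min_cell_height: float = 10.0       # px

    @classmethod
    def from_mapping(cls, raw=None):
        """Build a config from an optional mapping, defaulting any missing key."""
        raw = raw or {}
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            # Hypothetical policy: fail loudly on misspelled threshold names.
            raise ValueError(f"unknown threshold(s): {sorted(unknown)}")
        return cls(**{name: float(value) for name, value in raw.items()})


defaults = CellValidationConfig.from_mapping()
custom = CellValidationConfig.from_mapping({"max_cell_density": 2.5})
```

Because the instance is frozen, passing the same object to the validator for each page keeps the thresholds consistent across the whole document.
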