## ADDED Requirements

### Requirement: OCR Track Table Data Structure Consistency
The OCR Track SHALL produce `TableData` objects with fully populated `cells` arrays that match the format produced by Direct Track, ensuring consistent table rendering across both processing tracks.

#### Scenario: OCR Track produces structured TableData for HTML tables
- **GIVEN** a document with tables is processed via OCR Track
- **WHEN** PP-StructureV3 returns HTML table content in the `html` or `content` field
- **THEN** the `ocr_to_unified_converter` SHALL parse the HTML and produce a `TableData` object
- **AND** the `TableData.cells` array SHALL contain `TableCell` objects for each cell
- **AND** each `TableCell` SHALL have correct `row`, `col`, and `content` values
- **AND** the output format SHALL match Direct Track's `TableData` structure
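
The conversion in this scenario can be sketched as follows. This is a minimal illustration, not the converter's actual code: the `TableCell` and `TableData` shapes, the zero-based indices, and the `html_to_table_data` helper are all assumptions made for the example. It uses the standard-library `html.parser`, which tolerates imperfect markup.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class TableCell:  # hypothetical shape; the real model may carry more fields
    row: int
    col: int
    content: str


@dataclass
class TableData:  # hypothetical shape
    rows: int
    cols: int
    cells: list = field(default_factory=list)


class _CellCollector(HTMLParser):
    """Collects one TableCell per <td>/<th>, using zero-based indices (assumed)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._row = -1
        self._col = 0
        self._buf = None  # None means "not currently inside a cell"

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
            self._col = 0
        elif tag in ("td", "th"):
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._buf is not None:
            text = "".join(self._buf).strip()
            self.cells.append(TableCell(self._row, self._col, text))
            self._col += 1
            self._buf = None


def html_to_table_data(html: str) -> TableData:
    """Hypothetical helper: turn an HTML table string into TableData."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    rows = max((c.row for c in parser.cells), default=-1) + 1
    cols = max((c.col for c in parser.cells), default=-1) + 1
    return TableData(rows=rows, cols=cols, cells=parser.cells)
```

Calling `html_to_table_data` on a 2x2 table yields four cells, so a renderer reads them from `cells` instead of falling back to the raw HTML string.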

#### Scenario: OCR Track handles tables with merged cells
- **GIVEN** an HTML table with `rowspan` or `colspan` attributes
- **WHEN** the table is converted to `TableData`
- **THEN** each `TableCell` SHALL have correct `row_span` and `col_span` values
- **AND** the cell content SHALL be correctly extracted
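
One way to keep `row` and `col` correct when spans are present is to track which grid slots earlier cells already cover. The sketch below assumes a `TableCell` with `row_span` and `col_span` fields defaulting to 1. The `parse_cells` helper is hypothetical, and treating missing, invalid, or non-positive span values as 1 is a simplification for the example.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class TableCell:  # hypothetical shape
    row: int
    col: int
    content: str
    row_span: int = 1
    col_span: int = 1


def _span(attrs, name):
    """Read a span attribute; missing, invalid, or non-positive values become 1."""
    raw = dict(attrs).get(name)
    try:
        return max(int(raw), 1)
    except (TypeError, ValueError):
        return 1


class _SpanAwareCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.cells = []
        self._occupied = set()  # (row, col) slots already covered by a cell
        self._row = -1
        self._current = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
        elif tag in ("td", "th") and self._row >= 0:
            # Skip columns covered by a rowspan from an earlier row
            # or a colspan earlier in this row.
            col = 0
            while (self._row, col) in self._occupied:
                col += 1
            rs, cs = _span(attrs, "rowspan"), _span(attrs, "colspan")
            for r in range(self._row, self._row + rs):
                for c in range(col, col + cs):
                    self._occupied.add((r, c))
            self._current = TableCell(self._row, col, "", rs, cs)
            self._buf = []

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._current is not None:
            self._current.content = "".join(self._buf).strip()
            self.cells.append(self._current)
            self._current = None


def parse_cells(html):
    """Hypothetical helper returning span-aware TableCell objects."""
    parser = _SpanAwareCollector()
    parser.feed(html)
    parser.close()
    return parser.cells
```

With this approach a cell that follows a `rowspan` in the previous row lands in the first free column instead of overlapping the merged cell.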

#### Scenario: OCR Track handles header rows
- **GIVEN** an HTML table with `<th>` elements or a header row
- **WHEN** the table is converted to `TableData`
- **THEN** the `TableData.headers` field SHALL contain the header cell contents
- **AND** header cells SHALL also be included in the `cells` array
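
A rough sketch of the header handling follows, assuming headers are identified by `<th>` tags only. Detecting a header row that uses plain `<td>` cells is not covered here, and the `collect_headers` helper and tuple-based cell representation are illustrative stand-ins for the real `TableData` model.

```python
from html.parser import HTMLParser


class _HeaderCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.headers = []  # text of every <th> cell
        self.cells = []    # (row, col, content) tuples, header cells included
        self._row = -1
        self._col = 0
        self._tag = None   # "td" or "th" while inside a cell
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
            self._col = 0
        elif tag in ("td", "th"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._tag is not None:
            text = "".join(self._buf).strip()
            if self._tag == "th":
                self.headers.append(text)
            # Header cells are recorded in the cells list as well.
            self.cells.append((self._row, self._col, text))
            self._col += 1
            self._tag = None


def collect_headers(html):
    """Hypothetical helper returning (headers, cells) for an HTML table."""
    parser = _HeaderCollector()
    parser.feed(html)
    parser.close()
    return parser.headers, parser.cells
```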

#### Scenario: OCR Track gracefully handles malformed HTML tables
- **GIVEN** an HTML table with malformed markup (missing closing tags, invalid nesting)
- **WHEN** parsing is attempted
- **THEN** the system SHALL attempt best-effort parsing using a tolerant HTML parser
- **AND** if parsing fails completely, the system SHALL fall back to returning a basic `TableData` with row and column counts only
- **AND** the system SHALL log a warning for debugging purposes
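
The fallback path might look like the sketch below. The `convert_with_fallback` wrapper, the `parse` callable, and the regex-based counting of `<tr>` and `<td>`/`<th>` openings are assumptions for illustration. The specification does not state how the fallback counts are derived.

```python
import logging
import re
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class TableData:  # hypothetical shape
    rows: int
    cols: int
    cells: list = field(default_factory=list)


def convert_with_fallback(html, parse):
    """Try full parsing; on any failure return count-only TableData.

    `parse` is a hypothetical callable (html -> TableData) that may raise.
    """
    try:
        return parse(html)
    except Exception as exc:
        logger.warning("Table HTML parsing failed, using count-only fallback: %s", exc)
        # Split on row openings; the chunk before the first <tr> is discarded.
        row_chunks = re.split(r"<tr\b", html, flags=re.IGNORECASE)[1:]
        cols = max(
            (len(re.findall(r"<t[dh]\b", chunk, flags=re.IGNORECASE)) for chunk in row_chunks),
            default=0,
        )
        return TableData(rows=len(row_chunks), cols=cols)
```

The counts here ignore `rowspan` and `colspan`, so they are only a rough estimate of the table's dimensions.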

#### Scenario: PDF Generator renders OCR Track tables correctly
- **GIVEN** a `UnifiedDocument` from OCR Track containing table elements
- **WHEN** the PDF Generator processes the document
- **THEN** tables SHALL be rendered as formatted tables (not as raw HTML text)
- **AND** the rendering SHALL be identical to Direct Track's rendering of equivalent `TableData`

#### Scenario: Direct Track table processing remains unchanged
- **GIVEN** a native PDF with embedded tables
- **WHEN** the document is processed via Direct Track
- **THEN** the `DirectExtractionEngine` SHALL continue to produce `TableData` objects as before
- **AND** changes to `ocr_to_unified_converter.py` SHALL NOT affect Direct Track processing
- **AND** table rendering in PDF output SHALL be identical to pre-fix behavior

#### Scenario: Hybrid Mode table source isolation
- **GIVEN** a document processed via Hybrid Mode (Direct Track as primary, with OCR Track used for images)
- **WHEN** the system merges OCR Track results into Direct Track results
- **THEN** only image elements (FIGURE, IMAGE, LOGO) SHALL be merged from OCR Track
- **AND** table elements SHALL come exclusively from Direct Track
- **AND** no OCR Track table data SHALL contaminate the final output
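
The isolation rule can be expressed as an allow-list filter applied during the merge. The sketch below is illustrative only: `ElementType`, `Element`, the `source` field, and `merge_hybrid` are hypothetical names, and the real merge logic may also handle positioning or de-duplication.

```python
from dataclasses import dataclass
from enum import Enum


class ElementType(Enum):  # hypothetical subset of element types
    TEXT = "text"
    TABLE = "table"
    FIGURE = "figure"
    IMAGE = "image"
    LOGO = "logo"


@dataclass
class Element:  # hypothetical shape
    type: ElementType
    source: str  # "direct" or "ocr", used here only for illustration


# Only these element types may be taken from OCR Track in Hybrid Mode.
MERGEABLE_FROM_OCR = frozenset(
    {ElementType.FIGURE, ElementType.IMAGE, ElementType.LOGO}
)


def merge_hybrid(direct_elements, ocr_elements):
    """Keep all Direct Track elements; add only allow-listed OCR elements."""
    merged = list(direct_elements)
    merged.extend(e for e in ocr_elements if e.type in MERGEABLE_FROM_OCR)
    return merged
```

An allow-list keeps OCR Track tables out by default, and the same holds for any element type added later unless it is explicitly listed.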