- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
MODIFIED Requirements
Requirement: Enhanced OCR with Full PP-StructureV3
The system SHALL utilize the full capabilities of PP-StructureV3, extracting all element types from parsing_res_list, with proper handling of visual elements and table coordinates.
Scenario: Extract comprehensive document structure
- WHEN processing through OCR track
- THEN the system SHALL use page_result.json['parsing_res_list']
- AND extract all element types including headers, lists, tables, figures
- AND preserve layout_bbox coordinates for each element
Scenario: Maintain reading order
- WHEN extracting elements from PP-StructureV3
- THEN the system SHALL preserve the reading order from parsing_res_list
- AND assign sequential indices to elements
- AND support reordering for complex layouts
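The two scenarios above could be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `extract_elements` and the `label` and `content` keys are assumptions, and only `parsing_res_list` and `layout_bbox` come from the requirement text.

```python
from typing import Any, Dict, List


def extract_elements(parsing_res_list: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy every parsed block, keeping its bbox and its position in the list."""
    elements = []
    for index, block in enumerate(parsing_res_list):
        elements.append({
            "index": index,                       # sequential reading-order index
            "type": block.get("label", "text"),   # "label" is an assumed key
            "layout_bbox": list(block["layout_bbox"]),
            "content": block.get("content", ""),  # "content" is an assumed key
        })
    return elements


# Hypothetical input in the order PP-StructureV3 is assumed to return it.
sample = [
    {"label": "header", "layout_bbox": [50, 40, 550, 80], "content": "Report"},
    {"label": "table", "layout_bbox": [50, 100, 550, 400]},
]
elements = extract_elements(sample)
```

Because the index is stored on each element rather than implied by list position, a later reordering pass for complex layouts can sort or renumber without losing the original order.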
Scenario: Extract table structure with HTML content
- WHEN PP-StructureV3 identifies a table
- THEN the system SHALL extract cell content and boundaries from table_res_list
- AND extract pred_html for table HTML content
- AND validate cell_boxes coordinates against page boundaries
- AND apply fallback detection for invalid coordinates
- AND preserve table HTML for structure
- AND extract plain text for translation
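The coordinate validation step might look like the sketch below. It assumes each cell box is `[x0, y0, x1, y1]` in page pixels; the helper name `split_cell_boxes` is hypothetical, and the fallback detection itself is left out because the requirement does not describe it.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]


def split_cell_boxes(
    cell_boxes: Sequence[Box], page_width: float, page_height: float
) -> Tuple[List[Box], List[Box]]:
    """Separate boxes that lie inside the page from boxes that do not."""
    valid: List[Box] = []
    invalid: List[Box] = []
    for box in cell_boxes:
        x0, y0, x1, y1 = box
        # A box is valid when it is non-empty and fully inside the page.
        inside = 0 <= x0 < x1 <= page_width and 0 <= y0 < y1 <= page_height
        (valid if inside else invalid).append(box)
    return valid, invalid
```

A caller could trigger the fallback path whenever the second list is non-empty.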
Scenario: Table matching via bbox overlap
- GIVEN a table element from parsing_res_list without direct HTML content
- WHEN matching against table_res_list using bbox overlap
- AND overlap ratio exceeds 10%
- THEN the system SHALL extract both cell_box_list and pred_html from the matched table_res
- AND set element['html'] to the extracted pred_html
- AND set element['extracted_text'] from the HTML content
- AND log the successful extraction
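A minimal sketch of the matching scenario, under stated assumptions: the overlap ratio is taken as intersection area divided by the element's own area (the requirement does not define it), each `table_res` is assumed to carry a `bbox` key, and `attach_table_result` and `html_to_text` are hypothetical helpers.

```python
import logging
import re
from typing import Any, Dict, List, Sequence

logger = logging.getLogger(__name__)

MIN_OVERLAP = 0.10  # the 10% threshold from the scenario


def overlap_ratio(element_bbox: Sequence[float], table_bbox: Sequence[float]) -> float:
    """Intersection area divided by the element's area (assumed definition)."""
    x0 = max(element_bbox[0], table_bbox[0])
    y0 = max(element_bbox[1], table_bbox[1])
    x1 = min(element_bbox[2], table_bbox[2])
    y1 = min(element_bbox[3], table_bbox[3])
    intersection = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area = (element_bbox[2] - element_bbox[0]) * (element_bbox[3] - element_bbox[1])
    return intersection / area if area > 0 else 0.0


def html_to_text(html: str) -> str:
    """Strip tags and collapse whitespace; a crude stand-in for a real HTML parser."""
    return " ".join(re.sub(r"<[^>]+>", " ", html).split())


def attach_table_result(element: Dict[str, Any], table_res_list: List[Dict[str, Any]]) -> bool:
    """Copy pred_html and cell_box_list from the best-overlapping table result."""
    best, best_ratio = None, MIN_OVERLAP
    for table_res in table_res_list:
        ratio = overlap_ratio(element["layout_bbox"], table_res["bbox"])  # "bbox" is assumed
        if ratio > best_ratio:
            best, best_ratio = table_res, ratio
    if best is None:
        return False
    element["cell_boxes"] = best["cell_box_list"]
    element["html"] = best["pred_html"]
    element["extracted_text"] = html_to_text(best["pred_html"])
    logger.info("Matched table via bbox overlap (ratio=%.2f)", best_ratio)
    return True
```

Picking the highest ratio rather than the first match above the threshold is a design choice of this sketch; it avoids attaching the wrong result when two tables sit close together on a page.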
Scenario: Extract visual elements with paths
- WHEN PP-StructureV3 identifies visual elements (IMAGE, FIGURE, CHART, DIAGRAM)
- THEN the system SHALL preserve saved_path for each element
- AND include image dimensions and format
- AND enable image embedding in output PDF
ADDED Requirements
Requirement: OCR Track PDF Coordinate System
The system SHALL generate PDF output for OCR Track using the OCR coordinate system dimensions to ensure accurate text sizing and positioning.
Scenario: PDF page size matches OCR coordinate system
- GIVEN an OCR track processing task
- WHEN generating the output PDF
- THEN the system SHALL use the OCR image dimensions as PDF page size
- AND set scale factors to 1.0 (no scaling)
- AND preserve original bbox coordinates without transformation
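As a rough sketch of this scenario, the page setup reduces to passing the OCR image size through unchanged. The function name `ocr_pdf_page_setup` is hypothetical. The optional `angle` argument reflects the width/height swap for 90°/270° rotations mentioned in the change summary; whether the incoming dimensions are measured before or after rotation is an assumption here.

```python
from typing import Tuple


def ocr_pdf_page_setup(
    image_width: float, image_height: float, angle: int = 0
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return (page_size, scale_factors) for OCR Track PDF output."""
    # Assumption: dimensions describe the image before orientation correction,
    # so a 90 or 270 degree rotation swaps width and height.
    if angle in (90, 270):
        image_width, image_height = image_height, image_width
    page_size = (image_width, image_height)
    scale = (1.0, 1.0)  # no scaling: bboxes are already in page coordinates
    return page_size, scale
```

With both scale factors fixed at 1.0, a bbox from the OCR result can be drawn at its original coordinates without any transformation.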
Scenario: Text font size calculation without scaling
- GIVEN a text element with bbox height H in OCR coordinates
- WHEN rendering text in PDF
- THEN the system SHALL calculate font size based directly on bbox height
- AND NOT apply additional scaling factors
- AND ensure readable text output
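One way to read this scenario is sketched below. The requirement only says the font size derives directly from the bbox height, so the `height_ratio`, `min_size`, and `max_size` values are illustrative assumptions, not values taken from the specification.

```python
def font_size_from_bbox(
    bbox_height: float,
    height_ratio: float = 0.75,  # assumed share of the box taken by glyphs
    min_size: float = 4.0,       # assumed floor to keep text readable
    max_size: float = 72.0,      # assumed ceiling for oversized boxes
) -> float:
    """Derive a font size from bbox height with no page-level scale factor."""
    size = bbox_height * height_ratio
    return max(min_size, min(size, max_size))
```

The clamp is one possible interpretation of "ensure readable text output"; it does not reintroduce a coordinate scale factor, since the page and the bbox share the same units.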
Scenario: Direct Track PDF maintains original size
- GIVEN a direct track processing task
- WHEN generating the output PDF
- THEN the system SHALL use the original PDF page dimensions
- AND preserve existing coordinate transformation logic
- AND NOT be affected by OCR Track coordinate changes
Requirement: Table Cell Quality Assessment
The system SHALL assess table cell_boxes quality with appropriate thresholds to avoid filtering valid tables.
Scenario: Cell density threshold
- GIVEN a table with cell_boxes from PP-StructureV3
- WHEN cell density exceeds 5.0 cells per 10,000 px²
- THEN the system SHALL flag the table as potentially over-detected
- AND log the specific density value for debugging
Scenario: Average cell area threshold
- GIVEN a table with cell_boxes
- WHEN average cell area is less than 2,000 px²
- THEN the system SHALL flag the table as potentially over-detected
- AND log the specific area value for debugging
Scenario: Valid tables with normal metrics
- GIVEN a table with density < 5.0 cells per 10,000 px² and average cell area > 2,000 px²
- WHEN quality assessment is applied
- THEN the table SHALL be considered valid
- AND cell_boxes SHALL be used for rendering
- AND table content SHALL be displayed in PDF output
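The three scenarios above can be sketched as a single check. This assumes density is measured against the table's own bbox area and that boxes are `[x0, y0, x1, y1]` in pixels; the function and constant names are hypothetical, and values exactly at a threshold are treated as valid because the scenarios leave that case open.

```python
import logging
from typing import Sequence

logger = logging.getLogger(__name__)

MAX_CELL_DENSITY = 5.0      # cells per 10,000 px²
MIN_AVG_CELL_AREA = 2000.0  # px²


def box_area(box: Sequence[float]) -> float:
    """Area of an [x0, y0, x1, y1] box, never negative."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def is_over_detected(table_bbox: Sequence[float], cell_boxes: Sequence[Sequence[float]]) -> bool:
    """Flag a table whose cell_boxes look over-segmented."""
    table_area = box_area(table_bbox)
    if not cell_boxes or table_area == 0:
        return False  # assumption: nothing to assess, so nothing to flag
    density = len(cell_boxes) / table_area * 10_000
    avg_area = sum(box_area(b) for b in cell_boxes) / len(cell_boxes)
    flagged = False
    if density > MAX_CELL_DENSITY:
        logger.warning("Cell density %.2f per 10,000 px² exceeds threshold", density)
        flagged = True
    if avg_area < MIN_AVG_CELL_AREA:
        logger.warning("Average cell area %.0f px² is below threshold", avg_area)
        flagged = True
    return flagged
```

A table for which this returns `False` would keep its cell_boxes for rendering, matching the "valid tables" scenario.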