OCR/spec.md at c12ea0b9f60cf1c095483d8aca5cc01d12ddfbfa

egg 5448a047ff chore: archive upgrade-ppstructure-models proposal

Archived as 2025-11-27-upgrade-ppstructure-models
Spec updated: ocr-processing (added PP-StructureV3 Configuration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-27 14:22:33 +08:00

ADDED Requirements

Requirement: PP-StructureV3 Configuration

The system SHALL configure PP-StructureV3 with the following settings:

Preprocessing (Stage 1):

Document orientation classification MUST be enabled (use_doc_orientation_classify=True)
Document unwarping MUST be enabled (use_doc_unwarping=True)
Textline orientation detection MUST be enabled (use_textline_orientation=True)

Layout Detection (Stage 3):

The chinese layout model option SHALL use PP-DocLayout_plus-L (83.2% mAP)
The default layout model option SHALL use PubLayNet for English documents
The cdla layout model option SHALL use picodet_lcnet_x1_0_fgd_layout_cdla

Element Recognition (Stage 4):

Table structure recognition SHALL use SLANeXt_wired and SLANeXt_wireless models (69.65% combined accuracy)
Formula recognition SHALL use PP-FormulaNet_plus-L (92.22% English, 90.64% Chinese BLEU)
Chart parsing SHALL use PP-Chart2Table
Seal recognition SHALL use PP-OCRv4_seal
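The settings above could be collected into a single keyword-argument mapping, roughly as sketched below. This is a minimal sketch, not the project's actual code: `build_ppstructure_kwargs` and `LAYOUT_MODELS` are hypothetical names, and the `*_model_name` keywords are assumed to match the `PPStructureV3` constructor in the installed PaddleOCR release, so they should be checked against that version. Only the three `use_*` flags are taken verbatim from the requirement. Chart and seal models are listed by their spec names only, because the spec does not give their exact configuration keys.

```python
# Layout model options named in the requirement (option -> model).
# "PubLayNet" is the spec's label; the concrete model identifier is not given.
LAYOUT_MODELS = {
    "chinese": "PP-DocLayout_plus-L",
    "default": "PubLayNet",
    "cdla": "picodet_lcnet_x1_0_fgd_layout_cdla",
}

# Models named by the spec whose configuration keys are not stated there.
CHART_MODEL = "PP-Chart2Table"
SEAL_MODEL = "PP-OCRv4_seal"


def build_ppstructure_kwargs(layout_model: str = "chinese") -> dict:
    """Return keyword arguments reflecting the PP-StructureV3 requirement."""
    if layout_model not in LAYOUT_MODELS:
        raise ValueError(f"unknown layout model option: {layout_model!r}")
    return {
        # Stage 1: preprocessing flags, as written in the spec.
        "use_doc_orientation_classify": True,
        "use_doc_unwarping": True,
        "use_textline_orientation": True,
        # Stage 3: layout detection (keyword name is an assumption).
        "layout_detection_model_name": LAYOUT_MODELS[layout_model],
        # Stage 4: element recognition (keyword names are assumptions).
        "wired_table_structure_recognition_model_name": "SLANeXt_wired",
        "wireless_table_structure_recognition_model_name": "SLANeXt_wireless",
        "formula_recognition_model_name": "PP-FormulaNet_plus-L",
    }


# Possible usage, assuming PaddleOCR is installed and the keywords match:
#   from paddleocr import PPStructureV3
#   pipeline = PPStructureV3(**build_ppstructure_kwargs("chinese"))
```

Keeping the mapping in one pure function makes the configuration easy to unit-test without loading any model weights.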

Scenario: Processing rotated scanned document

WHEN a PDF document with rotated pages is processed using the OCR track
THEN the system SHALL automatically detect and correct the orientation before OCR processing

Scenario: Processing complex Chinese document with tables

WHEN a Chinese document containing tables, images, and formulas is processed
AND the user selects the "chinese" layout model
THEN the system SHALL use PP-DocLayout_plus-L for layout detection (83.2% mAP)
AND the system SHALL correctly identify table regions

Scenario: Table structure recognition with wired tables

WHEN a document contains wired (bordered) tables
THEN the system SHALL use the SLANeXt_wired model for structure recognition
AND the system SHALL output correct HTML table structure with proper row/column spanning

Scenario: Table structure recognition with wireless tables

WHEN a document contains wireless (borderless) tables
THEN the system SHALL use the SLANeXt_wireless model for structure recognition
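The two table scenarios amount to a dispatch on table type. A minimal sketch follows, assuming the table has already been classified upstream; `select_table_model` and the `"wired"` / `"wireless"` labels are hypothetical names chosen for illustration, and how the type is determined is outside this spec.

```python
# Table type -> structure recognition model, as named in the scenarios.
TABLE_MODELS = {
    "wired": "SLANeXt_wired",        # bordered tables
    "wireless": "SLANeXt_wireless",  # borderless tables
}


def select_table_model(table_type: str) -> str:
    """Return the structure recognition model for a classified table type."""
    try:
        return TABLE_MODELS[table_type]
    except KeyError:
        raise ValueError(f"unknown table type: {table_type!r}") from None
```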

Scenario: Chinese formula recognition

WHEN a document contains mathematical formulas with Chinese characters
THEN the system SHALL use PP-FormulaNet_plus-L for recognition
AND the system SHALL output LaTeX code with correct Chinese character representation

ADDED Requirements

Requirement: Model Cache Cleanup

The system SHALL provide documentation for cleaning up unused model caches to optimize storage space.

Scenario: User wants to free disk space after model upgrade

WHEN the user has upgraded from older models (PP-DocLayout-S, SLANet) to newer models
THEN the documentation SHALL explain how to delete unused cached models from ~/.paddlex/official_models/
AND list which model directories can be safely removed
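The cleanup documentation could include a small helper along these lines. It is a sketch under stated assumptions: `find_unused_model_dirs` and `remove_model_dirs` are hypothetical helpers, each cached model is assumed to live in its own subdirectory of `~/.paddlex/official_models/`, and the caller supplies the set of model names still in use. Deletion is a dry run by default so the list can be reviewed first.

```python
import shutil
from pathlib import Path

# Cache location named in the requirement.
DEFAULT_CACHE = Path.home() / ".paddlex" / "official_models"


def find_unused_model_dirs(cache_root: Path, in_use: set[str]) -> list[str]:
    """List cached model directory names that are not in the in-use set."""
    if not cache_root.is_dir():
        return []
    return sorted(
        p.name for p in cache_root.iterdir()
        if p.is_dir() and p.name not in in_use
    )


def remove_model_dirs(cache_root: Path, names: list[str],
                      dry_run: bool = True) -> list[Path]:
    """Delete the named model directories; only report them when dry_run."""
    targets = [cache_root / name for name in names]
    if not dry_run:
        for target in targets:
            shutil.rmtree(target)
    return targets
```

For example, after the upgrade described above, passing the new model names as `in_use` would report older directories such as `PP-DocLayout-S` and `SLANet` if they are present; calling `remove_model_dirs(..., dry_run=False)` then deletes them.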
