# Tasks: Upgrade PP-StructureV3 Models

## 1. Backend Configuration Changes

- [x] 1.1 Update `backend/app/core/config.py` - Enable preprocessing flags
  - Set `use_doc_orientation_classify` default to `True`
  - Set `use_doc_unwarping` default to `True`
  - Set `use_textline_orientation` default to `True`
  - Add `table_structure_model_name` configuration
  - Add `formula_recognition_model_name` configuration
- [x] 1.2 Update `backend/app/services/ocr_service.py` - Model mapping changes
  - Update `LAYOUT_MODEL_MAPPING`:
    - Change `"chinese"` from `"PP-DocLayout-S"` to `"PP-DocLayout_plus-L"`
    - Keep `"default"` as the PubLayNet model
    - Keep `"cdla"` unchanged
  - Update `_ensure_structure_engine()`:
    - Pass the preprocessing flags to PPStructureV3
    - Configure SLANeXt models for table recognition
    - Configure PP-FormulaNet_plus-L for formula recognition
- [x] 1.3 Update PPStructureV3 initialization kwargs
  - Add `table_structure_model_name="SLANeXt_wired"` (or configure the dual wired/wireless setup)
  - Add `formula_recognition_model_name="PP-FormulaNet_plus-L"`
  - Verify that the preprocessing flags are passed correctly

## 2. Schema Updates

- [x] 2.1 Update `backend/app/schemas/task.py` - LayoutModelEnum
  - Rename `CHINESE` or update its description to reflect PP-DocLayout_plus-L
  - Update docstrings to reflect the new model capabilities

## 3. Frontend Updates

- [x] 3.1 Update `frontend/src/components/LayoutModelSelector.tsx`
  - Update the Chinese option description to mention PP-DocLayout_plus-L
  - Update the accuracy information shown to users
- [x] 3.2 Update `frontend/src/i18n/locales/zh-TW.json`
  - Update `layoutModel.chinese.description` to reflect the new model
  - Update any accuracy percentages in descriptions

## 4. Testing

- [x] 4.1 Create unit tests for the new model configuration
  - Test that preprocessing flags are passed correctly
  - Test that the model mapping resolves correctly
  - Test engine initialization with the new models
- [ ] 4.2 Integration testing with real documents
  - Test rotated document handling (preprocessing)
  - Test complex Chinese document layout detection
  - Test table structure recognition accuracy
  - Test formula recognition on Chinese documents
- [x] 4.3 Update existing tests
  - Update `backend/tests/services/test_layout_model.py` for the new mapping
  - Update `backend/tests/api/test_layout_model_api.py` if needed

## 5. Documentation

- [x] 5.1 Create model cleanup documentation
  - Document the `~/.paddlex/official_models/` cache location
  - List models that can be safely deleted after the upgrade
  - Provide a cleanup script or commands
  - See: [MODEL_CLEANUP.md](./MODEL_CLEANUP.md)
- [x] 5.2 Update API documentation
  - Document preprocessing feature behavior
  - Update layout model descriptions

## 6. Verification & Deployment

- [ ] 6.1 Verify that new models download correctly on first use
- [ ] 6.2 Measure memory/GPU usage with the new models
- [ ] 6.3 Compare processing speed before and after the upgrade
- [ ] 6.4 Verify that existing functionality is not broken
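Tasks 1.1 through 1.3 (and the unit tests in 4.1) can be pictured with the minimal sketch below. It assumes a plain-Python settings object rather than the project's real `config.py`; the names `StructureSettings`, `upgrade_layout_mapping`, `resolve_layout_model`, and `build_structure_kwargs` are hypothetical. The kwarg keys simply mirror the wording of task 1.3 and should be checked against the installed PaddleOCR release before being passed to PPStructureV3. The existing `"default"` and `"cdla"` values are not given here, so the sketch copies them through unchanged instead of naming them.

```python
"""Hypothetical sketch of the section 1 changes; not the project's actual code."""
from dataclasses import asdict, dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class StructureSettings:
    # Task 1.1: preprocessing flags now default to True.
    use_doc_orientation_classify: bool = True
    use_doc_unwarping: bool = True
    use_textline_orientation: bool = True
    # Tasks 1.1 / 1.3: model names for table and formula recognition.
    # Key names mirror the task list; verify them against PPStructureV3.
    table_structure_model_name: str = "SLANeXt_wired"
    formula_recognition_model_name: str = "PP-FormulaNet_plus-L"


def upgrade_layout_mapping(old: Mapping[str, str]) -> Dict[str, str]:
    """Task 1.2: swap the "chinese" entry and copy every other entry unchanged."""
    new = dict(old)  # copy so the original mapping is not mutated
    new["chinese"] = "PP-DocLayout_plus-L"
    return new


def resolve_layout_model(mapping: Mapping[str, str], key: str) -> str:
    """Look up a layout model name; unknown keys raise ValueError (an assumption)."""
    try:
        return mapping[key]
    except KeyError:
        raise ValueError(f"unknown layout model key: {key!r}") from None


def build_structure_kwargs(settings: StructureSettings) -> Dict[str, Any]:
    """Task 1.3: collect the keyword arguments to hand to the structure engine."""
    return asdict(settings)
```

In the service, `_ensure_structure_engine()` would presumably splat these into the engine constructor; that call is left out so the sketch runs without PaddleOCR installed. Keeping the flags in one frozen dataclass gives the 4.1 tests a single object to assert against.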
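For the cleanup script in task 5.1, a cautious starting point is a dry-run helper like the sketch below. It assumes each model is cached as its own directory under `~/.paddlex/official_models/<model name>/`. The functions `plan_cleanup` and `run_cleanup` are hypothetical. `PP-DocLayout-S` is the one model this task list explicitly replaces. It does not say whether anything else still loads that model, so confirm this before running with `dry_run=False`.

```python
"""Hypothetical dry-run cleanup helper for cached PaddleX models."""
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed cache layout: one directory per model name.
DEFAULT_CACHE = Path.home() / ".paddlex" / "official_models"


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under `path`."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def plan_cleanup(names: Iterable[str], cache: Path = DEFAULT_CACHE) -> List[Tuple[Path, int]]:
    """Return (directory, size) for each named model present in the cache."""
    plan = []
    for name in names:
        target = cache / name
        if target.is_dir():  # silently skip models that were never downloaded
            plan.append((target, dir_size(target)))
    return plan


def run_cleanup(plan: List[Tuple[Path, int]], dry_run: bool = True) -> int:
    """Delete the planned directories unless dry_run; return bytes (to be) freed."""
    freed = 0
    for target, size in plan:
        if not dry_run:
            shutil.rmtree(target)
        freed += size
    return freed
```

Typical use is `plan = plan_cleanup(["PP-DocLayout-S"])`, then print the plan and `run_cleanup(plan)` to preview the space saved before deleting anything. Defaulting to a dry run keeps an accidental invocation harmless.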