feat: update PDF generator to support UnifiedDocument directly

- Add generate_from_unified_document() method for direct UnifiedDocument processing
- Create convert_unified_document_to_ocr_data() for format conversion
- Extract _generate_pdf_from_data() as reusable core logic
- Support both OCR and DIRECT processing tracks in PDF generation
- Handle coordinate transformations (BoundingBox to polygon format)
- Update OCR service to use appropriate PDF generation method
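
The BoundingBox-to-polygon step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `BoundingBox` field names (`x0`, `y0`, `x1`, `y1`), the helper name `bbox_to_polygon`, and the clockwise-from-top-left point order are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical axis-aligned box; the project's real class may use other fields."""
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def bbox_to_polygon(bbox: BoundingBox) -> List[List[float]]:
    """Expand a box into four corner points.

    Assumed order: top-left, top-right, bottom-right, bottom-left,
    which is a common layout for OCR-style polygon output.
    """
    return [
        [bbox.x0, bbox.y0],
        [bbox.x1, bbox.y0],
        [bbox.x1, bbox.y1],
        [bbox.x0, bbox.y1],
    ]


polygon = bbox_to_polygon(BoundingBox(10, 20, 110, 50))
```

Keeping the conversion in one small pure function would let both the OCR and DIRECT tracks share the same polygon layout before the data reaches the PDF generator.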

Completes Section 4 (Unified Processing Pipeline) of the dual-track proposal.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-11-19 08:48:25 +08:00
parent ab89a40e8d
commit ecdce961ca
3 changed files with 341 additions and 138 deletions


@@ -63,10 +63,10 @@
- [x] 4.2.1 Define standardized JSON schema
- [x] 4.2.2 Include processing metadata
- [x] 4.2.3 Support both track outputs
-- [ ] 4.3 Update PDF generator for UnifiedDocument
-- [ ] 4.3.1 Adapt PDF generation to use UnifiedDocument
-- [ ] 4.3.2 Preserve layout from both tracks
-- [ ] 4.3.3 Handle coordinate transformations
+- [x] 4.3 Update PDF generator for UnifiedDocument
+- [x] 4.3.1 Adapt PDF generation to use UnifiedDocument
+- [x] 4.3.2 Preserve layout from both tracks
+- [x] 4.3.3 Handle coordinate transformations
## 5. Translation System Foundation
- [ ] 5.1 Create TranslationEngine interface