fix: OCR Track reflow PDF and translation with image text filtering

- Add OCR Track support for reflow PDF generation using raw_ocr_regions.json
- Add OCR Track translation extraction from raw_ocr_regions instead of elements
- Add raw_ocr_translations output format for OCR Track documents
- Add exclusion zone filtering to remove text overlapping with images
- Update API validation to accept both translations and raw_ocr_translations
- Add page_number field to TranslatedItem for proper tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
egg
2025-12-12 11:02:35 +08:00
parent 24253ac15e
commit 1f18010040
11 changed files with 1040 additions and 149 deletions

View File

@@ -146,6 +146,7 @@ class TranslatedItem:
original_content: str
translated_content: str
element_type: str
page_number: int = 1
cell_position: Optional[Tuple[int, int]] = None