# Change: Fix OCR Track Translation

## Why

OCR Track translation is missing most content because:

1. Translation service (`extract_translatable_elements`) only processes elements from `scan_result.json`
2. OCR Track tables have `content: ""` (empty string) - no `content.cells` data
3. All table text exists in `raw_ocr_regions.json` (59 text blocks) but translation service ignores it
4. Result: Only 6 text elements translated vs 59 raw OCR regions available

**Current Data Flow (OCR Track):**
```
scan_result.json (10 elements, 6 text, 2 empty tables)
  → Translation extracts 6 text items
  → 53 text blocks in tables are NOT translated
```

**Expected Data Flow (OCR Track):**
```
raw_ocr_regions.json (59 text blocks)
  → Translation extracts ALL 59 text items
  → Complete translation coverage
```

## What Changes

### 1. Translation Service Enhancement

Modify `translate_document` in `translation_service.py` to:

1. **Detect processing track** from result JSON metadata
2. **For OCR Track**: Load and translate `raw_ocr_regions.json` instead of elements
3. **For Direct Track**: Continue using elements with `content.cells` (already works)

### 2. Translation Result Format for OCR Track

Add new field `raw_ocr_translations` to translation JSON for OCR Track:

```json
{
  "translations": { ... },           // element-based (for Direct Track)
  "raw_ocr_translations": [          // NEW: for OCR Track
    {
      "index": 0,
      "original": "华天科技(宝鸡)有限公司",
      "translated": "Huatian Technology (Baoji) Co., Ltd."
    },
    ...
  ]
}
```

### 3. Translated PDF Generation

Modify `generate_translated_pdf` to use `raw_ocr_translations` when available for OCR Track documents.
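The track-dependent extraction described under "What Changes" could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual implementation: the metadata key `processing_track`, the region field `text`, and the helper names `detect_track`, `extract_ocr_items`, and `load_ocr_items` are hypothetical, since this proposal does not spell out the schema of the result JSON or of `raw_ocr_regions.json`.

```python
import json
from pathlib import Path
from typing import Any


def detect_track(result: dict[str, Any]) -> str:
    """Return the processing track recorded in the result metadata.

    Assumes a hypothetical `metadata.processing_track` key and falls back
    to "direct" when the key is missing.
    """
    metadata = result.get("metadata") or {}
    return str(metadata.get("processing_track", "direct")).lower()


def extract_ocr_items(regions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build one translatable item per non-empty raw OCR region.

    The index is the region's position in the raw list, so that a
    translation can later be mapped back to the region it came from.
    """
    items = []
    for index, region in enumerate(regions):
        text = (region.get("text") or "").strip()  # `text` field is an assumption
        if text:
            items.append({"index": index, "original": text})
    return items


def load_ocr_items(regions_path: Path) -> list[dict[str, Any]]:
    """Read a raw OCR regions file (assumed to be a JSON list) and extract items."""
    with regions_path.open(encoding="utf-8") as fh:
        return extract_ocr_items(json.load(fh))
```

Keeping the original region index in each item means blank regions can be skipped without shifting the positions that `raw_ocr_translations` entries refer to.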
## Impact

- **Affected files**:
  - `backend/app/services/translation_service.py` - extraction and translation logic
  - `backend/app/services/pdf_generator_service.py` - translated PDF rendering
- **User experience**: OCR Track translations will include ALL text content
- **API**: Translation JSON format extended (backward compatible)

## Migration

- No data migration required
- Existing translations continue to work (Direct Track unaffected)
- Re-translation needed for OCR Track documents to get full coverage
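To illustrate the backward-compatibility point, a consumer of the translation JSON might prefer `raw_ocr_translations` when it is present and otherwise fall back to the element-based `translations` field. The following is a minimal sketch; the function name `select_translations` is hypothetical, and the internal shape of `translations` is passed through untouched because this proposal does not define it.

```python
from typing import Any


def select_translations(translation_json: dict[str, Any]) -> tuple[str, Any]:
    """Pick the translation source for rendering.

    Returns ("ocr", {index: translated_text}) when `raw_ocr_translations`
    is present and non-empty, otherwise ("direct", <translations as stored>).
    """
    raw = translation_json.get("raw_ocr_translations")
    if raw:
        # Map each raw OCR region index to its translated text.
        return "ocr", {entry["index"]: entry["translated"] for entry in raw}
    # Older files and Direct Track files carry only the element-based field.
    return "direct", translation_json.get("translations", {})
```

With this shape, a translation file produced before the change still resolves to the element-based path, while a re-translated OCR Track file resolves to the per-region mapping.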