# Tasks: Fix OCR Track Translation

## 1. Modify Translation Service

- [x] 1.1 Add processing track detection
  - File: `backend/app/services/translation_service.py`
  - Location: `translate_document` method
  - Read `metadata.processing_track` from result JSON
  - Pass track type to extraction method
- [x] 1.2 Create helper to load raw OCR regions
  - File: `backend/app/services/translation_service.py`
  - Function: `_load_raw_ocr_regions(result_dir, task_id, page_num)`
  - Pattern: `{task_id}_*_page_{page_num}_raw_ocr_regions.json`
  - Return: List of text regions with index and content
- [x] 1.3 Modify `extract_translatable_elements` for OCR Track
  - File: `backend/app/services/translation_service.py`
  - Added: `extract_translatable_elements_ocr_track` method
  - Added parameters: `result_dir: Path`, `task_id: str`
  - For OCR Track: Extract from `raw_ocr_regions.json`
  - For Direct Track: Keep existing element-based extraction
- [x] 1.4 Update translation result format
  - File: `backend/app/services/translation_service.py`
  - Location: `build_translation_result` method
  - Added `processing_track` parameter
  - For OCR Track: Output `raw_ocr_translations` field
  - Structure: `[{"page": 1, "index": 0, "original": "...", "translated": "..."}]`

## 2. Modify PDF Generation

- [x] 2.1 Update `generate_translated_pdf` for OCR Track
  - File: `backend/app/services/pdf_generator_service.py`
  - Detect `processing_track` and `raw_ocr_translations` from translation JSON
  - For OCR Track: Call `_generate_translated_pdf_ocr_track`
  - For Direct Track: Continue using `apply_translations` (element-based)
- [x] 2.2 Create helper to apply raw OCR translations
  - File: `backend/app/services/pdf_generator_service.py`
  - Function: `_generate_translated_pdf_ocr_track`
  - Build translation lookup: `{(page, index): translated_text}`
  - Load raw OCR regions, sort by Y coordinate
  - Render translated text with original fallback

## 3. Additional Fixes

- [x] 3.1 Add `page_number` to `TranslatedItem`
  - File: `backend/app/schemas/translation.py`
  - Added `page_number: int = 1` to `TranslatedItem` dataclass
  - Updated `translate_batch` and `translate_item` to pass `page_number`
- [x] 3.2 Update API endpoint validation
  - File: `backend/app/routers/translate.py`
  - Check for both `translations` (Direct Track) and `raw_ocr_translations` (OCR Track)
- [x] 3.3 Filter text overlapping with images
  - File: `backend/app/services/pdf_generator_service.py`
  - Added `_collect_exclusion_zones`, `_is_region_overlapping_exclusion`, `_filter_regions_by_exclusion`
  - Applied filtering in `generate_reflow_pdf` and `_generate_translated_pdf_ocr_track`

## 4. Testing

- [x] 4.1 Test OCR Track translation
  - Test with: `f8265449-6cb7-425d-a213-5d2e1af73955`
  - Verify: All 59 text blocks are sent for translation
  - Verify: Translation JSON contains `raw_ocr_translations`
- [x] 4.2 Test OCR Track translated PDF
  - Generate translated reflow PDF
  - Verify: All translated text blocks appear correctly
  - Verify: Text inside images (like EWsenel) is filtered out
- [x] 4.3 Test Direct Track unchanged
  - Verify: Translation still uses element-based approach
  - Verify: No regression in Direct Track flow
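For reference, the region loading in task 1.2 might look roughly like the minimal sketch below. It uses the file-name pattern from the task, but the JSON layout (a list of objects with a `"text"` key and a `"bbox"`) is an assumption, and `load_raw_ocr_regions` is a hypothetical stand-in for `_load_raw_ocr_regions`, not the service's actual code.

```python
import json
from pathlib import Path
from typing import Any


def load_raw_ocr_regions(result_dir: Path, task_id: str, page_num: int) -> list[dict[str, Any]]:
    """Hypothetical sketch of `_load_raw_ocr_regions`.

    Assumes each matching file holds a JSON list of region objects with a
    "text" key (and optionally "bbox"); this layout is an assumption.
    """
    # File-name pattern taken from task 1.2.
    pattern = f"{task_id}_*_page_{page_num}_raw_ocr_regions.json"
    matches = sorted(result_dir.glob(pattern))
    if not matches:
        return []  # no raw OCR output for this page
    with matches[0].open(encoding="utf-8") as fh:
        raw = json.load(fh)

    regions = []
    for index, region in enumerate(raw):
        text = str(region.get("text", "")).strip()
        if text:
            # Skip blank regions but keep the original position as the index,
            # so (page, index) keys stay stable between translation and rendering.
            regions.append({"index": index, "content": text, "bbox": region.get("bbox")})
    return regions
```

Keeping the file position as `index` (rather than renumbering after filtering) is what lets the `(page, index)` keys in `raw_ocr_translations` line up with the regions again at PDF generation time.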
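The lookup and fallback described in task 2.2 can be sketched as follows, using the `raw_ocr_translations` field names from task 1.4. The function names are hypothetical, and the region `"bbox"` format `[x0, y0, x1, y1]` with y growing downward is an assumption made only to show the Y-coordinate sort.

```python
from typing import Any


def build_translation_lookup(raw_ocr_translations: list[dict[str, Any]]) -> dict[tuple[int, int], str]:
    """Map (page, index) -> translated text (hypothetical helper)."""
    lookup: dict[tuple[int, int], str] = {}
    for item in raw_ocr_translations:
        translated = item.get("translated")
        if translated:  # missing or empty translations fall back to the original later
            lookup[(int(item["page"]), int(item["index"]))] = translated
    return lookup


def texts_for_page(page: int, regions: list[dict[str, Any]],
                   lookup: dict[tuple[int, int], str]) -> list[str]:
    """Return a page's text top-to-bottom, preferring translations.

    Assumes each region has "index", "content" and a hypothetical
    "bbox" of [x0, y0, x1, y1] with y increasing downward.
    """
    # Sort by top edge, then left edge, to approximate reading order.
    ordered = sorted(regions, key=lambda r: (r["bbox"][1], r["bbox"][0]))
    return [lookup.get((page, r["index"]), r["content"]) for r in ordered]
```

With this shape, a region whose translation is absent simply renders its original OCR text instead of disappearing from the output.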
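One way to implement the image-overlap filtering from task 3.3 is an intersection-over-region-area test, sketched below. The helper names, the `(x0, y0, x1, y1)` box format, and the 0.5 threshold are all assumptions for illustration; the actual `_is_region_overlapping_exclusion` criterion is not specified here.

```python
from typing import Any

BBox = tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


def overlap_ratio(region: BBox, zone: BBox) -> float:
    """Fraction of the region's area that lies inside the exclusion zone."""
    ix0, iy0 = max(region[0], zone[0]), max(region[1], zone[1])
    ix1, iy1 = min(region[2], zone[2]), min(region[3], zone[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = max(0.0, region[2] - region[0]) * max(0.0, region[3] - region[1])
    return inter / area if area > 0 else 0.0


def filter_regions_by_exclusion(regions: list[dict[str, Any]], zones: list[BBox],
                                threshold: float = 0.5) -> list[dict[str, Any]]:
    """Drop text regions mostly covered by an image zone (threshold is hypothetical)."""
    return [
        r for r in regions
        if not any(overlap_ratio(tuple(r["bbox"]), z) >= threshold for z in zones)
    ]
```

Measuring overlap relative to the text region's own area (rather than IoU) means a small caption fully inside a large image is still removed, while body text that only grazes an image edge is kept.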