# Tasks: Fix OCR Track Reflow PDF

## 1. Modify generate_reflow_pdf Method

- [x] 1.1 Add processing track detection
  - File: `backend/app/services/pdf_generator_service.py`
  - Location: `generate_reflow_pdf` method (line ~4704)
  - Read `metadata.processing_track` from JSON data
  - Branch logic based on track type

- [x] 1.2 Add helper function to load raw OCR regions
  - File: `backend/app/services/pdf_generator_service.py`
  - Using existing: `load_raw_ocr_regions` from `text_region_renderer.py`
  - Pattern: `{task_id}_*_page_{page_num}_raw_ocr_regions.json`
  - Return: List of text regions with bbox and content

- [x] 1.3 Implement OCR Track reflow rendering
  - File: `backend/app/services/pdf_generator_service.py`
  - For OCR Track: Load raw OCR regions per page
  - Sort text blocks by Y coordinate (top to bottom reading order)
  - Render text blocks as paragraphs
  - Still render images/charts from elements

- [x] 1.4 Keep Direct Track logic unchanged
  - File: `backend/app/services/pdf_generator_service.py`
  - Direct Track continues using `content.cells` for tables
  - Extracted to `_render_reflow_elements` helper method
  - No changes to existing Direct Track flow

## 2. Handle Multi-page Documents

- [x] 2.1 Support per-page raw OCR files
  - Pattern: `{task_id}_*_page_{page_num}_raw_ocr_regions.json`
  - Iterate through pages and load corresponding raw OCR file
  - Handle missing files gracefully (fall back to elements)

## 3. Testing

- [x] 3.1 Test OCR Track reflow PDF
  - Test with: `a9259180-fc49-4890-8184-2e6d5f4edad3` (scan document)
  - Verify: All 59 text blocks appear in reflow PDF
  - Verify: Images are embedded correctly

- [x] 3.2 Test Direct Track reflow PDF
  - Test with: `1b32428d-0609-4cfd-bc52-56be6956ac2e` (editable PDF)
  - Verify: Tables render with cells
  - Verify: No regression from changes

- [x] 3.3 Test translated reflow PDF
  - Test: Complete translation then download reflow PDF
  - Verify: Translated text appears correctly
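
## Appendix: Sketch of the OCR Track Branch

The branching described in tasks 1.1 to 1.3 and 2.1 could look roughly like the sketch below. It is not the code in `pdf_generator_service.py`, and it does not use the real `load_raw_ocr_regions` helper, whose signature is not given here. The function names, the track values `"ocr"` and `"direct"`, the `"direct"` default, and the region shape (`bbox` as `[x0, y0, x1, y1]` plus a `content` string) are all assumptions made for illustration. Only the file pattern and the `metadata.processing_track` key come from the tasks above.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

Region = Dict[str, Any]


def get_processing_track(json_data: Dict[str, Any]) -> str:
    """Read metadata.processing_track; the 'direct' default is an assumption."""
    metadata = json_data.get("metadata") or {}
    return metadata.get("processing_track", "direct")


def load_page_regions(result_dir: Path, task_id: str, page_num: int) -> Optional[List[Region]]:
    """Hypothetical loader: return raw OCR regions for one page, or None if no file matches."""
    pattern = f"{task_id}_*_page_{page_num}_raw_ocr_regions.json"
    matches = sorted(result_dir.glob(pattern))
    if not matches:
        return None
    with matches[0].open(encoding="utf-8") as fh:
        return json.load(fh)


def sort_top_to_bottom(regions: List[Region]) -> List[Region]:
    """Order regions by top edge (y0), then left edge (x0); assumes bbox = [x0, y0, x1, y1]."""
    return sorted(regions, key=lambda r: (r["bbox"][1], r["bbox"][0]))


def select_reflow_blocks(
    json_data: Dict[str, Any],
    result_dir: Path,
    task_id: str,
    page_num: int,
    page_elements: List[Region],
) -> Tuple[str, List[Region]]:
    """Pick the text source for one page.

    OCR Track pages use the raw OCR regions when the per-page file exists;
    everything else (Direct Track, or a missing raw OCR file) falls back to
    the page's elements unchanged.
    """
    if get_processing_track(json_data) == "ocr":  # "ocr" is an assumed track value
        regions = load_page_regions(result_dir, task_id, page_num)
        if regions is not None:
            return "raw_ocr", sort_top_to_bottom(regions)
    return "elements", page_elements
```

A caller would loop over the pages, call `select_reflow_blocks` once per page, render the returned blocks as paragraphs, and still take images and charts from the page elements in both cases. Sorting on the top edge alone gives a plain top-to-bottom order; multi-column scans would need a more careful reading-order step than this sketch provides.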