# Design: Add Translated PDF Export

## Context

The Tool_OCR project has implemented document translation using DIFY AI API, producing JSON files with translated content mapped by element_id. The existing PDF generator (`PDFGeneratorService`) can generate layout-preserving PDFs from UnifiedDocument but has no translation support.

**Key Constraint**: The PDF generator uses element_id to position content. Translation JSON uses the same element_id mapping, making merging straightforward.

## Goals / Non-Goals

**Goals:**
- Generate PDF with translated text preserving original layout
- Support all processing tracks (DIRECT, OCR, HYBRID)
- Maintain backward compatibility with existing PDF export
- Support table cell translation rendering

**Non-Goals:**
- Font optimization for target language scripts
- Interactive editing of translations
- Bilingual PDF output (original + translated side-by-side)

## Decisions

### Decision 1: Translation Merge Strategy

**What**: Merge translation data into UnifiedDocument in-memory before PDF generation.

**Why**: This approach:
- Reuses existing PDF rendering logic unchanged
- Keeps translation and PDF generation decoupled
- Allows easy testing of the merged document

**Implementation**:
```python
from typing import Any, Dict


def apply_translations(
    unified_doc: UnifiedDocument,
    translations: Dict[str, Any]
) -> UnifiedDocument:
    """Apply translations to UnifiedDocument, returning modified copy"""
    # Deep-copy so the original document stays untouched
    doc_copy = unified_doc.copy(deep=True)

    for page in doc_copy.pages:
        for element in page.elements:
            # Elements without a translation keep their original content
            if element.element_id in translations:
                translation = translations[element.element_id]
                if isinstance(translation, str):
                    element.content = translation
                elif isinstance(translation, dict) and 'cells' in translation:
                    # Handle table cells
                    apply_table_translation(element, translation)

    return doc_copy
```

**Alternatives considered**:
- Modify PDF generator to accept translations directly
  - Would require significant refactoring
- Generate overlay PDF with translations
  - Complex positioning logic

### Decision 2: API Endpoint Design

**What**: Add `POST /api/v2/translate/{task_id}/pdf?lang={target_lang}` endpoint.

**Why**:
- Consistent with existing `/translate/{task_id}` pattern
- POST allows future expansion for PDF options
- Clear separation from existing `/download/pdf` endpoint

**Response**: Binary PDF file with `application/pdf` content-type.

### Decision 3: Frontend Integration

**What**: Add conditional "Download Translated PDF" button in TaskDetailPage.

**Why**:
- Only show when translation is complete
- Use existing download pattern from PDF export

## Risks / Trade-offs

| Risk | Mitigation |
|------|------------|
| Large documents may time out | Use existing async pattern, add progress tracking |
| Font rendering for CJK scripts | Rely on existing NotoSansSC font registration |
| Translation missing for some elements | Use original content as fallback |

## Migration Plan

No migration needed; this is an additive feature only.

## Open Questions

1. Should we support downloading multiple translated PDFs in batch?
2. Should the translated PDF filename include the source language as well as the target?
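
The table branch in Decision 1 calls `apply_table_translation`, which this document does not define. The following is a minimal sketch of what it might look like, not the project's actual implementation. The `TableCell` and `TableElement` classes and the `row`/`col`/`content` shape of each translated cell are hypothetical stand-ins, since the real table model and translation JSON schema are not shown here. It also illustrates the fallback from the risk table: cells without a translation keep their original content.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TableCell:
    # Hypothetical cell model; the real table structure is an assumption.
    row: int
    col: int
    content: str


@dataclass
class TableElement:
    # Hypothetical stand-in for a table element inside UnifiedDocument.
    element_id: str
    cells: List[TableCell] = field(default_factory=list)


def apply_table_translation(element: TableElement, translation: Dict[str, Any]) -> None:
    """Overwrite cell text in place; untranslated cells keep their original content."""
    # Assumed JSON shape: {"cells": [{"row": 0, "col": 0, "content": "..."}, ...]}
    translated = {
        (cell["row"], cell["col"]): cell["content"]
        for cell in translation.get("cells", [])
    }
    for cell in element.cells:
        # Fall back to the original text when no translation exists for this cell
        cell.content = translated.get((cell.row, cell.col), cell.content)
```

Keying the lookup on `(row, col)` keeps the merge independent of cell order in the translation JSON, at the cost of assuming that cell coordinates are stable between the source document and the translation output.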