feat: add translated PDF export with layout preservation
Adds the ability to download translated documents as PDF files while
preserving the original document layout. Key changes:
- Add apply_translations() function to merge translation JSON with UnifiedDocument
- Add generate_translated_pdf() method to PDFGeneratorService
- Add POST /api/v2/translate/{task_id}/pdf endpoint
- Add downloadTranslatedPdf() method and PDF button in frontend
- Add comprehensive unit tests (52 tests: merge, PDF generation, API endpoints)
- Archive add-translated-pdf-export proposal
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,91 @@
# Design: Add Translated PDF Export

## Context

The Tool_OCR project has implemented document translation using the DIFY AI API, producing JSON files with translated content mapped by element_id. The existing PDF generator (`PDFGeneratorService`) can generate layout-preserving PDFs from a UnifiedDocument but has no translation support.

**Key Constraint**: The PDF generator uses element_id to position content. Translation JSON uses the same element_id mapping, making merging straightforward.

## Goals / Non-Goals

**Goals:**
- Generate PDF with translated text preserving original layout
- Support all processing tracks (DIRECT, OCR, HYBRID)
- Maintain backward compatibility with existing PDF export
- Support table cell translation rendering

**Non-Goals:**
- Font optimization for target language scripts
- Interactive editing of translations
- Bilingual PDF output (original + translated side-by-side)

## Decisions

### Decision 1: Translation Merge Strategy

**What**: Merge translation data into UnifiedDocument in-memory before PDF generation.

**Why**: This approach:
- Reuses existing PDF rendering logic unchanged
- Keeps translation and PDF generation decoupled
- Allows easy testing of the merged document

**Implementation**:
```python
import copy
from typing import Any, Dict

# UnifiedDocument and apply_table_translation are project-level names;
# their imports are omitted here, so the annotations are written as strings.


def apply_translations(
    unified_doc: "UnifiedDocument",
    translations: Dict[str, Any],
) -> "UnifiedDocument":
    """Apply translations to a UnifiedDocument, returning a modified copy."""
    # copy.deepcopy leaves the caller's document untouched and does not
    # depend on the model class providing its own copy() method.
    doc_copy = copy.deepcopy(unified_doc)
    for page in doc_copy.pages:
        for element in page.elements:
            # Elements without a translation keep their original content.
            if element.element_id in translations:
                translation = translations[element.element_id]
                if isinstance(translation, str):
                    element.content = translation
                elif isinstance(translation, dict) and 'cells' in translation:
                    # Handle table cells
                    apply_table_translation(element, translation)
    return doc_copy
```
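
`apply_table_translation` is called in the merge function but not defined in this document. The following is a minimal sketch of what it could look like, assuming a table translation carries a `cells` list of `{"row", "col", "content"}` objects and that a table element exposes cells with matching `row`/`col` attributes. The `Cell` and `TableElement` classes and the cell schema are illustrative stand-ins, not the project's actual models. Cells with no matching translation keep their original text, in line with the fallback listed under Risks.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Cell:
    # Hypothetical stand-in for a table cell in the document model.
    row: int
    col: int
    content: str


@dataclass
class TableElement:
    # Hypothetical stand-in for a table element.
    element_id: str
    cells: List[Cell] = field(default_factory=list)


def apply_table_translation(element: TableElement, translation: Dict[str, Any]) -> None:
    """Overwrite cell text in place; untranslated cells keep the original."""
    # Index translated cells by (row, col) for direct lookup.
    translated = {
        (c["row"], c["col"]): c["content"]
        for c in translation.get("cells", [])
    }
    for cell in element.cells:
        cell.content = translated.get((cell.row, cell.col), cell.content)


table = TableElement("table_1", [Cell(0, 0, "Name"), Cell(0, 1, "Price")])
apply_table_translation(table, {"cells": [{"row": 0, "col": 0, "content": "名稱"}]})
print([c.content for c in table.cells])  # ['名稱', 'Price']
```
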

**Alternatives considered**:
- Modify PDF generator to accept translations directly - Would require significant refactoring
- Generate overlay PDF with translations - Complex positioning logic

### Decision 2: API Endpoint Design

**What**: Add `POST /api/v2/translate/{task_id}/pdf?lang={target_lang}` endpoint.

**Why**:
- Consistent with existing `/translate/{task_id}` pattern
- POST allows future expansion for PDF options
- Clear separation from existing `/download/pdf` endpoint

**Response**: Binary PDF file with `application/pdf` content-type.
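
As a rough illustration of the request shape, a client could build the call as sketched below, using only the Python standard library. The base URL, the task ID, and the `build_translated_pdf_request` helper are assumptions made for this example. Authentication is not covered, and the request is constructed but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumed local development address; replace with the real backend host.
BASE_URL = "http://localhost:8000"


def build_translated_pdf_request(task_id: str, target_lang: str) -> Request:
    """Build the POST request for the translated-PDF endpoint (hypothetical helper)."""
    # Escape the task ID so it is safe to embed as a single path segment.
    path = f"/api/v2/translate/{quote(task_id, safe='')}/pdf"
    query = urlencode({"lang": target_lang})
    return Request(
        f"{BASE_URL}{path}?{query}",
        method="POST",
        headers={"Accept": "application/pdf"},
    )


req = build_translated_pdf_request("task-123", "zh-TW")
print(req.get_method(), req.full_url)
```
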

### Decision 3: Frontend Integration

**What**: Add conditional "Download Translated PDF" button in TaskDetailPage.

**Why**:
- Only show when translation is complete
- Use existing download pattern from PDF export

## Risks / Trade-offs

| Risk | Mitigation |
|------|------------|
| Large documents may time out | Use existing async pattern, add progress tracking |
| Font rendering for CJK scripts | Rely on existing NotoSansSC font registration |
| Translation missing for some elements | Use original content as fallback |
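
Because the fallback in the last row silently keeps the source text, it may help to surface which elements were left untranslated. The sketch below shows one possible check, assuming the top-level keys of the translation JSON are element IDs. The `find_untranslated_ids` helper and the sample IDs are hypothetical and not part of the existing codebase.

```python
from typing import Any, Dict, Iterable, List


def find_untranslated_ids(
    element_ids: Iterable[str],
    translations: Dict[str, Any],
) -> List[str]:
    """Return the element IDs that will fall back to their original content."""
    return [eid for eid in element_ids if eid not in translations]


ids = ["title_1", "para_1", "table_1"]
translations = {"title_1": "標題", "table_1": {"cells": []}}
missing = find_untranslated_ids(ids, translations)
print(missing)  # ['para_1']
```
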

## Migration Plan

No migration needed - additive feature only.

## Open Questions

1. Should we support downloading multiple translated PDFs in batch?
2. Should translated PDF filename include source language as well as target?