feat: add translated PDF export with layout preservation

Adds the ability to download translated documents as PDF files while
preserving the original document layout. Key changes:

- Add apply_translations() function to merge translation JSON with UnifiedDocument
- Add generate_translated_pdf() method to PDFGeneratorService
- Add POST /api/v2/translate/{task_id}/pdf endpoint
- Add downloadTranslatedPdf() method and PDF button in frontend
- Add comprehensive unit tests (52 tests: merge, PDF generation, API endpoints)
- Archive add-translated-pdf-export proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
egg
2025-12-02 12:33:31 +08:00
parent 8d9b69ba93
commit a07aad96b3
15 changed files with 2663 additions and 2 deletions

View File

@@ -186,3 +186,74 @@ The system SHALL support common languages through DIFY AI service.
- Vietnamese (vi)
- Thai (th)
### Requirement: Translated PDF Generation
The system SHALL support generating PDF files with translated content while preserving the original document layout.
#### Scenario: Generate translated PDF from Direct track document
- **GIVEN** a completed translation for a Direct track processed document
- **WHEN** user requests translated PDF via `POST /api/v2/translate/{task_id}/pdf?lang={target_lang}`
- **THEN** the system loads the translation JSON file
- **AND** merges translations with UnifiedDocument by element_id
- **AND** generates PDF with translated text at original positions
- **AND** returns PDF file with Content-Type `application/pdf`
#### Scenario: Generate translated PDF from OCR track document
- **GIVEN** a completed translation for an OCR track processed document
- **WHEN** user requests translated PDF
- **THEN** the system generates PDF preserving all OCR layout information
- **AND** replaces original text with translated content
- **AND** maintains table structure with translated cell content
#### Scenario: Handle missing translations gracefully
- **GIVEN** a translation JSON missing some element_id entries
- **WHEN** generating translated PDF
- **THEN** the system uses original content for missing translations
- **AND** logs warning for each fallback
- **AND** completes PDF generation successfully
#### Scenario: Translated PDF for incomplete translation
- **GIVEN** a task with translation status "pending" or "translating"
- **WHEN** user requests translated PDF
- **THEN** the system returns 400 Bad Request
- **AND** includes error message indicating translation not complete
#### Scenario: Translated PDF for non-existent translation
- **GIVEN** a task that has not been translated to requested language
- **WHEN** user requests translated PDF with `lang=fr`
- **THEN** the system returns 404 Not Found
- **AND** includes error message indicating no translation for language
---
### Requirement: Translation Merge Service
The system SHALL provide a service to merge translation data with UnifiedDocument.
#### Scenario: Merge text element translations
- **GIVEN** a UnifiedDocument with text elements
- **AND** a translation JSON with matching element_ids
- **WHEN** applying translations
- **THEN** the system replaces content field for each matched element
- **AND** preserves all other element properties (bounding_box, style_info, etc.)
#### Scenario: Merge table cell translations
- **GIVEN** a UnifiedDocument containing table elements
- **AND** a translation JSON with table_cell translations like:
```json
{
"table_1_0": {
"cells": [{"row": 0, "col": 0, "content": "Translated"}]
}
}
```
- **WHEN** applying translations
- **THEN** the system updates cell content at matching row/col positions
- **AND** preserves cell structure and styling
#### Scenario: Non-destructive merge operation
- **GIVEN** a UnifiedDocument
- **WHEN** applying translations
- **THEN** the system creates a modified copy
- **AND** original UnifiedDocument remains unchanged