feat: add translated PDF export with layout preservation

Adds the ability to download translated documents as PDF files while
preserving the original document layout. Key changes:

- Add apply_translations() function to merge translation JSON with UnifiedDocument
- Add generate_translated_pdf() method to PDFGeneratorService
- Add POST /api/v2/translate/{task_id}/pdf endpoint
- Add downloadTranslatedPdf() method and PDF button in frontend
- Add comprehensive unit tests (52 tests: merge, PDF generation, API endpoints)
- Archive add-translated-pdf-export proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-12-02 12:33:31 +08:00
parent 8d9b69ba93
commit a07aad96b3
15 changed files with 2663 additions and 2 deletions


@@ -0,0 +1,55 @@
## ADDED Requirements
### Requirement: Translated PDF Export API
The system SHALL expose an API endpoint for downloading translated documents as PDF files.
#### Scenario: Download translated PDF via API
- **GIVEN** a task with completed translation to English
- **WHEN** a POST request is sent to `/api/v2/translate/{task_id}/pdf?lang=en`
- **THEN** system returns PDF file with translated content
- **AND** Content-Type is `application/pdf`
- **AND** Content-Disposition suggests filename like `{task_id}_translated_en.pdf`
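
The response headers in this scenario can be sketched as below. This is a minimal illustration, not the committed implementation: `translated_pdf_headers` is a hypothetical helper, and the `attachment` disposition is an assumption (the scenario only requires that a filename be suggested).

```python
def translated_pdf_headers(task_id: str, lang: str) -> dict[str, str]:
    """Build response headers for a translated PDF download (hypothetical helper)."""
    # Filename pattern taken from the scenario: {task_id}_translated_{lang}.pdf
    filename = f"{task_id}_translated_{lang}.pdf"
    return {
        "Content-Type": "application/pdf",
        # "attachment" is an assumption; the spec only asks for a suggested filename.
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
```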
#### Scenario: Download translated PDF with layout preservation
- **WHEN** user downloads translated PDF
- **THEN** the PDF maintains original document layout
- **AND** text positions match original document coordinates
- **AND** images and tables appear at original positions
#### Scenario: Invalid language parameter
- **GIVEN** a task with translation only to English
- **WHEN** user requests PDF with `lang=ja` (Japanese)
- **THEN** system returns 404 Not Found
- **AND** response includes available languages in error message
#### Scenario: Task not found
- **GIVEN** a non-existent task_id
- **WHEN** user requests translated PDF
- **THEN** system returns 404 Not Found
---
### Requirement: Frontend Translated PDF Download
The frontend SHALL provide UI controls for downloading translated PDFs.
#### Scenario: Show download button when translation complete
- **GIVEN** a task with translation status "completed"
- **WHEN** user views TaskDetailPage
- **THEN** page displays "Download Translated PDF" button
- **AND** button shows target language (e.g., "Download Translated PDF (English)")
#### Scenario: Hide download button when no translation
- **GIVEN** a task without any completed translations
- **WHEN** user views TaskDetailPage
- **THEN** "Download Translated PDF" button is not shown
#### Scenario: Download progress indication
- **GIVEN** user clicks "Download Translated PDF" button
- **WHEN** PDF generation is in progress
- **THEN** button shows loading state
- **AND** prevents double-click
- **WHEN** download completes
- **THEN** browser downloads PDF file
- **AND** button returns to normal state


@@ -0,0 +1,72 @@
## ADDED Requirements
### Requirement: Translated PDF Generation
The system SHALL support generating PDF files with translated content while preserving the original document layout.
#### Scenario: Generate translated PDF from Direct track document
- **GIVEN** a completed translation for a Direct track processed document
- **WHEN** user requests translated PDF via `POST /api/v2/translate/{task_id}/pdf?lang={target_lang}`
- **THEN** the system loads the translation JSON file
- **AND** merges translations with UnifiedDocument by element_id
- **AND** generates PDF with translated text at original positions
- **AND** returns PDF file with Content-Type `application/pdf`
#### Scenario: Generate translated PDF from OCR track document
- **GIVEN** a completed translation for an OCR track processed document
- **WHEN** user requests translated PDF
- **THEN** the system generates PDF preserving all OCR layout information
- **AND** replaces original text with translated content
- **AND** maintains table structure with translated cell content
#### Scenario: Handle missing translations gracefully
- **GIVEN** a translation JSON missing some element_id entries
- **WHEN** generating translated PDF
- **THEN** the system uses original content for missing translations
- **AND** logs warning for each fallback
- **AND** completes PDF generation successfully
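
A minimal sketch of the fallback behaviour, assuming translations are available as a flat mapping from element_id to translated text. `pick_content` is a hypothetical name, not a function from this commit.

```python
import logging

logger = logging.getLogger(__name__)


def pick_content(element_id: str, original: str, translations: dict[str, str]) -> str:
    """Return the translated text, or the original when no translation exists."""
    translated = translations.get(element_id)
    if translated is None:
        # One warning per fallback, as the scenario requires.
        logger.warning(
            "No translation for element %s; using original content", element_id
        )
        return original
    return translated
```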
#### Scenario: Translated PDF for incomplete translation
- **GIVEN** a task with translation status "pending" or "translating"
- **WHEN** user requests translated PDF
- **THEN** the system returns 400 Bad Request
- **AND** includes an error message indicating that the translation is not complete
#### Scenario: Translated PDF for non-existent translation
- **GIVEN** a task that has not been translated to requested language
- **WHEN** user requests translated PDF with `lang=fr`
- **THEN** the system returns 404 Not Found
- **AND** includes an error message indicating that no translation exists for the requested language
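
The two error scenarios above can be summarised as a small decision function. This is a sketch under stated assumptions: `resolve_pdf_request` is hypothetical, and modelling status per language (a mapping of language code to status string) is an assumption, since the spec describes status at the task level.

```python
from http import HTTPStatus


def resolve_pdf_request(translations: dict[str, str], lang: str) -> tuple[HTTPStatus, str]:
    """Map a translated-PDF request to a status code and error message.

    `translations` maps a language code to its status string (assumed shape).
    """
    if lang not in translations:
        # 404: no translation for this language; list what is available.
        available = ", ".join(sorted(translations)) or "none"
        return HTTPStatus.NOT_FOUND, f"No translation for '{lang}'. Available: {available}"
    if translations[lang] in ("pending", "translating"):
        # 400: translation exists but has not finished yet.
        return HTTPStatus.BAD_REQUEST, "Translation is not complete"
    return HTTPStatus.OK, ""
```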
---
### Requirement: Translation Merge Service
The system SHALL provide a service to merge translation data with UnifiedDocument.
#### Scenario: Merge text element translations
- **GIVEN** a UnifiedDocument with text elements
- **AND** a translation JSON with matching element_ids
- **WHEN** applying translations
- **THEN** the system replaces content field for each matched element
- **AND** preserves all other element properties (bounding_box, style_info, etc.)
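
A minimal sketch of the text merge, assuming a simplified element type. `TextElement` and `merge_text` are illustrative stand-ins, not the actual UnifiedDocument classes; only the `content` field is replaced and every other property is carried over.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TextElement:
    """Simplified stand-in for a UnifiedDocument text element (assumed shape)."""
    element_id: str
    content: str
    bounding_box: tuple[float, float, float, float]
    style_info: dict = field(default_factory=dict)


def merge_text(elements: list[TextElement], translations: dict[str, str]) -> list[TextElement]:
    """Return new elements with `content` swapped for matched element_ids."""
    return [
        # replace() copies the element and changes only the content field.
        replace(el, content=translations[el.element_id])
        if el.element_id in translations
        else el
        for el in elements
    ]
```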
#### Scenario: Merge table cell translations
- **GIVEN** a UnifiedDocument containing table elements
- **AND** a translation JSON with table_cell translations like:
```json
{
  "table_1_0": {
    "cells": [{"row": 0, "col": 0, "content": "Translated"}]
  }
}
```
- **WHEN** applying translations
- **THEN** the system updates cell content at matching row/col positions
- **AND** preserves cell structure and styling
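
The row/col matching can be sketched as follows, assuming a table element is a dict with a `cells` list whose entries carry `row`, `col`, and `content` keys. `merge_table_cells` and that table shape are assumptions made for illustration; only the translation entry format comes from the JSON example above.

```python
def merge_table_cells(table: dict, translated: dict) -> dict:
    """Return a copy of `table` with cell content replaced at matching positions."""
    # Index translated cells by (row, col) for direct lookup.
    by_position = {
        (cell["row"], cell["col"]): cell["content"]
        for cell in translated.get("cells", [])
    }
    merged_cells = []
    for cell in table["cells"]:
        new_cell = dict(cell)  # shallow copy keeps styling and other keys
        key = (cell["row"], cell["col"])
        if key in by_position:
            new_cell["content"] = by_position[key]
        merged_cells.append(new_cell)
    return {**table, "cells": merged_cells}
```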
#### Scenario: Non-destructive merge operation
- **GIVEN** a UnifiedDocument
- **WHEN** applying translations
- **THEN** the system creates a modified copy
- **AND** original UnifiedDocument remains unchanged
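
One way to satisfy the non-destructive requirement is to deep-copy the document before modifying it. The sketch below assumes the document is a plain dict with an `elements` list; `apply_translations_copy` is a hypothetical name and does not reflect the signature of the committed `apply_translations()`.

```python
import copy


def apply_translations_copy(document: dict, translations: dict) -> dict:
    """Apply text translations to a deep copy, leaving `document` untouched."""
    result = copy.deepcopy(document)
    for element in result.get("elements", []):
        translated = translations.get(element.get("element_id"))
        # Only plain-string translations are handled in this sketch.
        if isinstance(translated, str):
            element["content"] = translated
    return result
```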