translation Specification
Purpose
TBD - created by archiving change add-document-translation. Update Purpose after archive.
Requirements
Requirement: Document Translation Service
The system SHALL provide a document translation service that translates extracted text from OCR-processed documents into target languages using DIFY AI API.
Scenario: Successful translation of Direct track document
- GIVEN a completed OCR task with Direct track processing
- WHEN user requests translation to English
- THEN the system extracts all translatable elements (text, title, header, footer, paragraph, footnote, table cells)
- AND translates them using DIFY AI API
- AND saves the result to `{task_id}_translated_en.json`
Scenario: Successful translation of OCR track document
- GIVEN a completed OCR task with OCR track processing
- WHEN user requests translation to Japanese
- THEN the system extracts all translatable elements from UnifiedDocument format
- AND translates them preserving element_id mapping
- AND saves the result to `{task_id}_translated_ja.json`
Scenario: Successful translation of Hybrid track document
- GIVEN a completed OCR task with Hybrid track processing
- WHEN translation is requested
- THEN the system processes the document using the same unified logic
- AND handles any combination of element types present
Scenario: Table cell translation
- GIVEN a document containing table elements
- WHEN translation is requested
- THEN the system extracts text from each table cell
- AND translates each cell content individually
- AND preserves row/col position in the translation result
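Because the Direct, OCR, and Hybrid scenarios share one extraction step, it can be sketched as a single function that does not branch on the processing track. This is a minimal sketch that assumes a hypothetical UnifiedDocument dict shape (`pages` → `elements`, with `element_id`, `type`, `content`, and `cells` for tables). The real model may differ.

```python
from typing import Any

# Element types listed as translatable; table cells are handled separately below.
TRANSLATABLE_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def extract_translatable(document: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect translatable units from a UnifiedDocument-like dict.

    Assumed (hypothetical) shape: {"pages": [{"elements": [...]}]}. Each element
    has "element_id", "type", and "content". Table elements carry a "cells"
    list of {"row", "col", "content"} dicts instead of plain content.
    """
    units: list[dict[str, Any]] = []
    for page in document.get("pages", []):
        for element in page.get("elements", []):
            etype = element.get("type")
            if etype in TRANSLATABLE_TYPES and (element.get("content") or "").strip():
                units.append({"element_id": element["element_id"],
                              "type": etype,
                              "text": element["content"]})
            elif etype == "table":
                # One unit per non-empty cell. Keep row/col so the translation
                # can be written back to the same position later.
                for cell in element.get("cells", []):
                    if (cell.get("content") or "").strip():
                        units.append({"element_id": element["element_id"],
                                      "type": "table_cell",
                                      "row": cell["row"],
                                      "col": cell["col"],
                                      "text": cell["content"]})
    return units
```

Elements of other types, such as images, and cells that contain only whitespace produce no units. The translator therefore never receives empty strings.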
Requirement: Translation API Endpoints
The system SHALL expose REST API endpoints for translation operations.
Scenario: Start translation request
- GIVEN a completed OCR task with task_id
- WHEN POST request to `/api/v2/translate/{task_id}` with `target_lang` parameter
- THEN the system starts background translation process
- AND returns translation job status with 202 Accepted
Scenario: Query translation status
- GIVEN an active translation job
- WHEN GET request to `/api/v2/translate/{task_id}/status`
- THEN the system returns current status (pending, translating, completed, failed)
- AND includes progress information (current_element, total_elements)
Scenario: Retrieve translation result
- GIVEN a completed translation job
- WHEN GET request to `/api/v2/translate/{task_id}/result?lang={target_lang}`
- THEN the system returns the translation JSON content
Scenario: Translation for non-existent task
- GIVEN an invalid or non-existent task_id
- WHEN translation is requested
- THEN the system returns 404 Not Found error
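The endpoint semantics can be illustrated without a web framework by handlers that return `(HTTP status, body)` tuples. This is a sketch only. `TranslationRoutes`, `TranslationJob`, and the `completed_tasks` registry are hypothetical names. Returning 404 for a result that is not yet completed is an assumption, because the scenarios above do not cover that case.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranslationJob:
    target_lang: str
    status: str = "pending"          # pending | translating | completed | failed
    result: Optional[dict] = None    # translation JSON once completed


@dataclass
class TranslationRoutes:
    """Framework-free sketch of the endpoints; each handler returns (status, body)."""
    completed_tasks: set[str]        # task_ids whose OCR processing has finished (assumed)
    jobs: dict[tuple[str, str], TranslationJob] = field(default_factory=dict)

    def start(self, task_id: str, target_lang: str) -> tuple[int, dict]:
        # POST /api/v2/translate/{task_id}
        if task_id not in self.completed_tasks:
            return 404, {"detail": f"Task not found: {task_id}"}
        self.jobs[(task_id, target_lang)] = TranslationJob(target_lang=target_lang)
        # A real service would schedule the background translation here.
        return 202, {"task_id": task_id, "target_lang": target_lang, "status": "pending"}

    def result(self, task_id: str, lang: str) -> tuple[int, dict]:
        # GET /api/v2/translate/{task_id}/result?lang={target_lang}
        if task_id not in self.completed_tasks:
            return 404, {"detail": f"Task not found: {task_id}"}
        job = self.jobs.get((task_id, lang))
        if job is None or job.status != "completed":
            return 404, {"detail": f"No completed translation for '{lang}'"}
        return 200, job.result or {}
```

Jobs are keyed by `(task_id, target_lang)`. This lets one task hold translations in several languages at the same time.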
Requirement: DIFY API Integration
The system SHALL integrate with DIFY AI service for translation.
Scenario: API request format
- GIVEN text to be translated
- WHEN calling DIFY API
- THEN the system sends POST request to `/chat-messages` endpoint
- AND includes query with translation prompt
- AND uses blocking response mode
- AND includes user identifier for tracking
Scenario: API response handling
- GIVEN DIFY API returns translation response
- WHEN parsing the response
- THEN the system extracts translated text from `answer` field
- AND records usage statistics (tokens, latency)
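The request and response handling above can be sketched with the standard library. The field names used here (`inputs`, `query`, `response_mode`, `user`, a Bearer API key, `answer`, and `metadata.usage` with `total_tokens` and `latency`) follow Dify's chat-messages API as it is commonly documented. They are assumptions to verify against the Dify API reference. The base URL and key are placeholders. The sketch builds the request but does not send it.

```python
import json
import urllib.request
from typing import Any


def build_dify_request(base_url: str, api_key: str, prompt: str, user: str) -> urllib.request.Request:
    """Build (but do not send) a blocking /chat-messages request."""
    payload = {
        "inputs": {},
        "query": prompt,              # the translation prompt
        "response_mode": "blocking",  # wait for the complete answer in one response
        "user": user,                 # caller identifier used for tracking
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def parse_dify_response(body: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (translated_text, usage_stats) from a blocking response body."""
    usage = (body.get("metadata") or {}).get("usage") or {}
    stats = {"total_tokens": usage.get("total_tokens"), "latency": usage.get("latency")}
    return body["answer"].strip(), stats
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=...)` and decode the JSON body before calling `parse_dify_response`.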
Scenario: API error handling
- GIVEN DIFY API returns error or times out
- WHEN handling the error
- THEN the system retries up to 3 times with exponential backoff
- AND returns appropriate error message if all retries fail
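The retry rule can be sketched as a wrapper around any zero-argument call. This sketch reads "up to 3 times" as three retries after the first attempt. The 1-second base delay is a hypothetical default. The `sleep` parameter can be injected so the behavior is testable without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TranslationApiError(Exception):
    """Raised when the translation API still fails after every retry."""


def call_with_retry(fn: Callable[[], T], max_retries: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on failure retry up to max_retries times with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):  # initial attempt plus the retries
        try:
            return fn()
        except OSError as exc:  # covers TimeoutError, ConnectionError, urllib URLError/HTTPError
            last_error = exc
            if attempt < max_retries:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
    raise TranslationApiError(
        f"translation request failed after {max_retries} retries: {last_error}"
    ) from last_error
```

The sketch retries every `OSError`, including HTTP 4xx errors. A real client might skip retries for client errors that cannot succeed on a repeat.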
Scenario: API rate limiting
- GIVEN high volume of translation requests
- WHEN requests approach rate limits
- THEN the system queues requests appropriately
- AND provides feedback about wait times
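The requirement does not name a rate-limiting strategy. One possible approach, swapped in here only for illustration, is a sliding-window limiter. It reports how long a caller must wait, and that value can be passed back to the client as wait-time feedback. The limits and class name are hypothetical.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_requests` requests in any `window`-second span."""

    def __init__(self, max_requests: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent: deque = deque()  # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds until the next request may be sent (0.0 if it may go now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self.sent.append(self.clock())
```

A queue worker can call `wait_time()`, sleep for that long, then call `record()` and send the next request.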
Requirement: Translation Prompt Format
The system SHALL use structured prompts for translation requests.
Scenario: Generate translation prompt
- GIVEN source text to translate
- WHEN preparing DIFY API request
- THEN the system formats prompt as:
Translate the following text to {language}. Return ONLY the translated text, no explanations. {text}
Scenario: Language name mapping
- GIVEN language code like "zh-TW" or "ja"
- WHEN constructing translation prompt
- THEN the system maps to full language name (Traditional Chinese, Japanese)
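Prompt construction and language-name mapping can be sketched as follows. The mapping is deliberately partial, and falling back to the raw code for unknown languages is an assumption. The line breaks between the instruction and the text are also assumed, since the template above shows only the wording.

```python
# Partial code-to-name mapping for illustration.
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
}


def build_translation_prompt(text: str, target_lang: str) -> str:
    """Format the translation prompt for one piece of source text."""
    # Fall back to the raw code when the language is not in the map.
    language = LANGUAGE_NAMES.get(target_lang, target_lang)
    return (f"Translate the following text to {language}.\n"
            "Return ONLY the translated text, no explanations.\n\n"
            f"{text}")
```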
Requirement: Translation Progress Reporting
The system SHALL provide real-time progress feedback during translation.
Scenario: Progress during multi-element translation
- GIVEN a document with 50 translatable elements
- WHEN user queries status
- THEN the system returns progress like `{"status": "translating", "current_element": 25, "total_elements": 50}`
Scenario: Translation starting status
- GIVEN translation job just started
- WHEN user queries status
- THEN the system returns `{"status": "pending"}`
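The two status shapes above can come from one small progress tracker. This is a hypothetical sketch: a job that has just been created reports only its status, and a running job also reports its element counters.

```python
from dataclasses import dataclass


@dataclass
class TranslationProgress:
    status: str = "pending"   # pending | translating | completed | failed
    current_element: int = 0
    total_elements: int = 0

    def start(self, total_elements: int) -> None:
        self.total_elements = total_elements
        # A document with nothing to translate completes immediately.
        self.status = "translating" if total_elements else "completed"

    def advance(self) -> None:
        """Mark one more element as translated."""
        self.current_element += 1
        if self.current_element >= self.total_elements:
            self.status = "completed"

    def to_status(self) -> dict:
        if self.status == "pending":
            return {"status": "pending"}
        return {"status": self.status,
                "current_element": self.current_element,
                "total_elements": self.total_elements}
```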
Requirement: Translation Result Storage
The system SHALL store translation results as independent JSON files.
Scenario: Save translation result
- GIVEN translation completes successfully
- WHEN saving results
- THEN the system creates `{original_filename}_translated_{lang}.json`
- AND includes schema_version, metadata, and translations dict
Scenario: Multiple language translations
- GIVEN a document translated to English and Japanese
- WHEN checking result files
- THEN both `xxx_translated_en.json` and `xxx_translated_ja.json` exist
- AND the original `xxx_result.json` is unchanged
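Storing each language as its own file keeps the original result untouched. In the sketch below, `stem` stands for whichever prefix the implementation uses; the scenarios name both `{task_id}` and `{original_filename}`. The `schema_version` value and the metadata keys are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_translation_result(output_dir: Path, stem: str, target_lang: str,
                            translations: dict, source_lang: str = "auto") -> Path:
    """Write {stem}_translated_{lang}.json without touching the original result file."""
    path = output_dir / f"{stem}_translated_{target_lang}.json"
    payload = {
        "schema_version": "1.0",  # hypothetical version string
        "metadata": {
            "source_lang": source_lang,
            "target_lang": target_lang,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "total_elements": len(translations),
        },
        "translations": translations,
    }
    # ensure_ascii=False keeps CJK and other non-ASCII text readable in the file.
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```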
Requirement: Language Support
The system SHALL support common languages through DIFY AI service.
Scenario: Common language translation
- GIVEN target language is English, Chinese, Japanese, or Korean
- WHEN translation is requested
- THEN the system includes appropriate language name in prompt
- AND executes translation successfully
Scenario: Automatic source language detection
- GIVEN source_lang is set to "auto"
- WHEN translation is executed
- THEN the AI model automatically detects source language
- AND translates to target language
Scenario: Supported languages list
- GIVEN user queries supported languages
- WHEN checking language support
- THEN the system provides list including:
- English (en)
- Traditional Chinese (zh-TW)
- Simplified Chinese (zh-CN)
- Japanese (ja)
- Korean (ko)
- German (de)
- French (fr)
- Spanish (es)
- Portuguese (pt)
- Italian (it)
- Russian (ru)
- Vietnamese (vi)
- Thai (th)
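The supported-language list above can be kept as one table that backs both the language listing and input validation. This is a sketch only. Accepting `"auto"` only as a source language follows the detection scenario. Rejecting unknown source codes is an assumption.

```python
SUPPORTED_LANGUAGES = {
    "en": "English", "zh-TW": "Traditional Chinese", "zh-CN": "Simplified Chinese",
    "ja": "Japanese", "ko": "Korean", "de": "German", "fr": "French",
    "es": "Spanish", "pt": "Portuguese", "it": "Italian", "ru": "Russian",
    "vi": "Vietnamese", "th": "Thai",
}


def list_supported_languages() -> list[dict[str, str]]:
    """Return the language list as code/name pairs, for example for an API response."""
    return [{"code": code, "name": name} for code, name in SUPPORTED_LANGUAGES.items()]


def validate_languages(source_lang: str, target_lang: str) -> None:
    """Raise ValueError for unsupported codes; "auto" is valid only as a source."""
    if source_lang != "auto" and source_lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported source language: {source_lang}")
    if target_lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported target language: {target_lang}")
```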
Requirement: Translated PDF Generation
The system SHALL support generating PDF files with translated content while preserving the original document layout.
Scenario: Generate translated PDF from Direct track document
- GIVEN a completed translation for a Direct track processed document
- WHEN user requests translated PDF via `POST /api/v2/translate/{task_id}/pdf?lang={target_lang}`
- THEN the system loads the translation JSON file
- AND merges translations with UnifiedDocument by element_id
- AND generates PDF with translated text at original positions
- AND returns PDF file with Content-Type `application/pdf`
Scenario: Generate translated PDF from OCR track document
- GIVEN a completed translation for an OCR track processed document
- WHEN user requests translated PDF
- THEN the system generates PDF preserving all OCR layout information
- AND replaces original text with translated content
- AND maintains table structure with translated cell content
Scenario: Handle missing translations gracefully
- GIVEN a translation JSON missing some element_id entries
- WHEN generating translated PDF
- THEN the system uses original content for missing translations
- AND logs warning for each fallback
- AND completes PDF generation successfully
Scenario: Translated PDF for incomplete translation
- GIVEN a task with translation status "pending" or "translating"
- WHEN user requests translated PDF
- THEN the system returns 400 Bad Request
- AND includes error message indicating translation not complete
Scenario: Translated PDF for non-existent translation
- GIVEN a task that has not been translated to requested language
- WHEN user requests translated PDF with `lang=fr`
- THEN the system returns 404 Not Found
- AND includes error message indicating no translation for language
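The two error scenarios can be sketched as a precondition check that runs before any PDF rendering. `translation_status` is a hypothetical input that is `None` when no translation exists for the language. Treating `"failed"` like an unfinished translation (400) is an assumption, since the spec does not cover it.

```python
from typing import Optional


def check_pdf_request(translation_status: Optional[str], lang: str) -> tuple[int, Optional[str]]:
    """Map the translation state for `lang` to (HTTP status, error message)."""
    if translation_status is None:
        return 404, f"No translation found for language '{lang}'"
    if translation_status != "completed":
        # pending / translating per the spec; failed is an assumption.
        return 400, f"Translation to '{lang}' is not complete (status: {translation_status})"
    return 200, None  # safe to load the translation JSON and render the PDF
```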
Requirement: Translation Merge Service
The system SHALL provide a service to merge translation data with UnifiedDocument.
Scenario: Merge text element translations
- GIVEN a UnifiedDocument with text elements
- AND a translation JSON with matching element_ids
- WHEN applying translations
- THEN the system replaces content field for each matched element
- AND preserves all other element properties (bounding_box, style_info, etc.)
Scenario: Merge table cell translations
- GIVEN a UnifiedDocument containing table elements
- AND a translation JSON with table_cell translations like `{"table_1_0": {"cells": [{"row": 0, "col": 0, "content": "Translated"}]}}`
- WHEN applying translations
- THEN the system updates cell content at matching row/col positions
- AND preserves cell structure and styling
Scenario: Non-destructive merge operation
- GIVEN a UnifiedDocument
- WHEN applying translations
- THEN the system creates a modified copy
- AND original UnifiedDocument remains unchanged
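The merge rules above (content replacement by `element_id`, table cells by row/col, fallback to the original text with a warning, and no mutation of the input) can be sketched as follows. The document shape and the translation-JSON shape are assumptions. In particular, the sketch assumes text elements map `element_id` to a plain string.

```python
import copy
import logging
from typing import Any

logger = logging.getLogger("translation_merge")


def apply_translations(document: dict[str, Any], translations: dict[str, Any]) -> dict[str, Any]:
    """Return a translated copy of a UnifiedDocument-like dict; the input is not modified."""
    merged = copy.deepcopy(document)  # non-destructive: work on a copy
    for page in merged.get("pages", []):
        for element in page.get("elements", []):
            element_id = element.get("element_id")
            entry = translations.get(element_id)
            if entry is None:
                # Missing translation: keep the original content and log the fallback.
                if element.get("content") or element.get("cells"):
                    logger.warning("No translation for %s; using original content", element_id)
                continue
            if element.get("type") == "table" and isinstance(entry, dict):
                translated = {(c["row"], c["col"]): c["content"] for c in entry.get("cells", [])}
                for cell in element.get("cells", []):
                    position = (cell["row"], cell["col"])
                    if position in translated:
                        cell["content"] = translated[position]  # other cell keys untouched
            elif isinstance(entry, str):
                element["content"] = entry  # bounding_box, style_info, etc. are kept
    return merged
```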
Requirement: Translation Output as Reflow PDF
The system SHALL generate translated documents as reflow-layout PDFs with real visible text, separate from the Layout PDF which uses background images.
Scenario: Generate translated PDF with reflow layout
- WHEN translation is completed for a document
- THEN the system SHALL generate a new PDF with translated text
- AND the translated PDF SHALL use reflow layout (not background image)
- AND text SHALL be real visible text, not invisible overlay
- AND page breaks SHALL correspond to original document pages
Scenario: Maintain page correspondence in translated output
- WHEN generating translated PDF
- THEN content from original page 1 SHALL appear in translated page 1
- AND content from original page 2 SHALL appear in translated page 2
- AND a translated page may differ in content length from its original page, but page boundaries are preserved
Scenario: Chart text excluded from translation
- WHEN extracting text for translation from Direct Track documents
- THEN text elements within chart regions SHALL NOT be included
- AND chart labels, axis text, and legends SHALL remain untranslated
- AND this is expected behavior documented for users
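Page correspondence and chart exclusion can be sketched together. The function returns one list of text blocks per original page, so a renderer could emit a page break between the lists. It drops any text element whose `bounding_box` overlaps a chart on the same page. The dict shape, the `(x0, y0, x1, y1)` box convention, and the "any overlap" rule are assumptions.

```python
from typing import Any, Optional

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def _overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def _in_chart(element: dict[str, Any], charts: list[Box]) -> bool:
    box: Optional[Box] = element.get("bounding_box")
    return box is not None and any(_overlaps(box, chart) for chart in charts)


def reflow_pages(pages: list[dict[str, Any]]) -> list[list[str]]:
    """Group text blocks by original page, skipping text inside chart regions."""
    output: list[list[str]] = []
    for page in pages:
        elements = page.get("elements", [])
        charts = [e["bounding_box"] for e in elements
                  if e.get("type") == "chart" and e.get("bounding_box")]
        blocks = [e["content"] for e in elements
                  if e.get("type") != "chart" and e.get("content")
                  and not _in_chart(e, charts)]
        output.append(blocks)  # an empty source page still yields an output page
    return output
```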
Requirement: Dual PDF Output Concept
The system SHALL maintain clear separation between Layout PDF (preview) and Translated PDF (output).
Scenario: Layout PDF for preview
- WHEN user views a processed document before translation
- THEN the Layout PDF SHALL be displayed
- AND Layout PDF preserves exact visual appearance of source
- AND text is invisible overlay for extraction purposes only
Scenario: Translated PDF for final output
- WHEN user requests translated document
- THEN the Translated PDF SHALL be generated
- AND Translated PDF uses reflow layout with visible translated text
- AND original visual styling is not preserved (text-focused output)
Scenario: Both PDFs available after translation
- WHEN translation is completed
- THEN both Layout PDF and Translated PDF SHALL be available for download
- AND user can choose which version to download
- AND Layout PDF remains unchanged after translation