OCR/openspec/specs/translation/spec.md

# translation Specification

## Purpose
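Define how the system translates text extracted from OCR-processed documents through the DIFY AI API, exposes translation operations as REST endpoints, stores results as per-language JSON files, and generates translated PDFs.
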
## Requirements
### Requirement: Document Translation Service

The system SHALL provide a document translation service that translates text extracted from OCR-processed documents into target languages using the DIFY AI API.

#### Scenario: Successful translation of Direct track document
- **GIVEN** a completed OCR task with Direct track processing
- **WHEN** user requests translation to English
- **THEN** the system extracts all translatable elements (text, title, header, footer, paragraph, footnote, table cells)
- **AND** translates them using DIFY AI API
- **AND** saves the result to `{task_id}_translated_en.json`

#### Scenario: Successful translation of OCR track document
- **GIVEN** a completed OCR task with OCR track processing
- **WHEN** user requests translation to Japanese
- **THEN** the system extracts all translatable elements from UnifiedDocument format
- **AND** translates them preserving element_id mapping
- **AND** saves the result to `{task_id}_translated_ja.json`

#### Scenario: Successful translation of Hybrid track document
- **GIVEN** a completed OCR task with Hybrid track processing
- **WHEN** translation is requested
- **THEN** the system processes the document using the same unified logic
- **AND** handles any combination of element types present

#### Scenario: Table cell translation
- **GIVEN** a document containing table elements
- **WHEN** translation is requested
- **THEN** the system extracts text from each table cell
- **AND** translates each cell content individually
- **AND** preserves row/col position in the translation result
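
The extraction step shared by the scenarios above can be sketched as follows. This is a minimal illustration, not the actual implementation: the dict layout (`pages`, `elements`, `type`, `content`, `cells`) and the function name `extract_translatable` are assumptions made for the example, since this spec does not define the UnifiedDocument schema.

```python
from typing import Any

# Element types named in the Direct track scenario; table cells are handled separately.
TRANSLATABLE_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def extract_translatable(document: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect translatable items from a hypothetical UnifiedDocument-like dict."""
    items: list[dict[str, Any]] = []
    for page in document.get("pages", []):
        for element in page.get("elements", []):
            kind = element.get("type")
            if kind in TRANSLATABLE_TYPES:
                text = element.get("content", "")
                if text.strip():  # skip empty or whitespace-only elements
                    items.append({"element_id": element["element_id"], "text": text})
            elif kind == "table":
                # Each cell is extracted individually and keeps its row/col position.
                for cell in element.get("cells", []):
                    if cell.get("content", "").strip():
                        items.append({
                            "element_id": element["element_id"],
                            "row": cell["row"],
                            "col": cell["col"],
                            "text": cell["content"],
                        })
    return items
```

Because the same function handles any mix of element types, it applies equally to Direct, OCR, and Hybrid track output under the assumed layout.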

---

### Requirement: Translation API Endpoints

The system SHALL expose REST API endpoints for translation operations.

#### Scenario: Start translation request
- **GIVEN** a completed OCR task with task_id
- **WHEN** POST request to `/api/v2/translate/{task_id}` with target_lang parameter
- **THEN** the system starts background translation process
- **AND** returns translation job status with 202 Accepted

#### Scenario: Query translation status
- **GIVEN** an active translation job
- **WHEN** GET request to `/api/v2/translate/{task_id}/status`
- **THEN** the system returns current status (pending, translating, completed, failed)
- **AND** includes progress information (current_element, total_elements)

#### Scenario: Retrieve translation result
- **GIVEN** a completed translation job
- **WHEN** GET request to `/api/v2/translate/{task_id}/result?lang={target_lang}`
- **THEN** the system returns the translation JSON content

#### Scenario: Translation for non-existent task
- **GIVEN** an invalid or non-existent task_id
- **WHEN** translation is requested
- **THEN** the system returns 404 Not Found error
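
The status codes in the start and not-found scenarios can be modeled as a plain function, independent of any web framework. This is a sketch under stated assumptions: the in-memory `tasks` and `jobs` dicts and the name `start_translation` are hypothetical, and scheduling of the background work is left out.

```python
from http import HTTPStatus
from typing import Any


def start_translation(
    tasks: dict[str, Any],
    jobs: dict[tuple[str, str], dict[str, Any]],
    task_id: str,
    target_lang: str,
) -> tuple[HTTPStatus, dict[str, Any]]:
    """Return the (status code, body) a POST /api/v2/translate/{task_id} handler would send."""
    if task_id not in tasks:
        # Invalid or non-existent task_id
        return HTTPStatus.NOT_FOUND, {"detail": f"Task {task_id} not found"}

    # Register the job; a real handler would also schedule the background work here.
    job = {"task_id": task_id, "target_lang": target_lang, "status": "pending"}
    jobs[(task_id, target_lang)] = job
    return HTTPStatus.ACCEPTED, job
```

Keying jobs by `(task_id, target_lang)` is one way to let the same task be translated into several languages independently.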

---

### Requirement: DIFY API Integration

The system SHALL integrate with the DIFY AI service for translation.

#### Scenario: API request format
- **GIVEN** text to be translated
- **WHEN** calling DIFY API
- **THEN** the system sends POST request to `/chat-messages` endpoint
- **AND** includes query with translation prompt
- **AND** uses blocking response mode
- **AND** includes user identifier for tracking

#### Scenario: API response handling
- **GIVEN** DIFY API returns translation response
- **WHEN** parsing the response
- **THEN** the system extracts translated text from `answer` field
- **AND** records usage statistics (tokens, latency)
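
The request and response scenarios above can be illustrated with the standard library alone. The `inputs` field, the `Authorization: Bearer` header, and the `metadata.usage` layout are assumptions based on Dify's publicly documented chat API and should be checked against the deployed service; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any


def build_chat_request(base_url: str, api_key: str, prompt: str, user: str) -> urllib.request.Request:
    """Build (but do not send) a blocking POST to the /chat-messages endpoint."""
    payload = {
        "inputs": {},                 # assumed: no app-level input variables
        "query": prompt,              # the translation prompt
        "response_mode": "blocking",  # wait for the full answer
        "user": user,                 # identifier used for tracking
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_chat_response(body: dict[str, Any]) -> dict[str, Any]:
    """Extract the translated text and usage statistics from a decoded response."""
    usage = body.get("metadata", {}).get("usage", {})  # assumed location of usage data
    return {
        "translated_text": body["answer"],
        "total_tokens": usage.get("total_tokens"),
        "latency": usage.get("latency"),
    }
```

Sending the request would be `urllib.request.urlopen(request, timeout=...)` followed by `json.load(...)` on the response; it is omitted here so the sketch stays free of network access.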

#### Scenario: API error handling
- **GIVEN** DIFY API returns error or times out
- **WHEN** handling the error
- **THEN** the system retries up to 3 times with exponential backoff
- **AND** returns appropriate error message if all retries fail
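
A retry loop matching the error-handling scenario might look like the sketch below. It reads "retries up to 3 times" as one initial attempt plus three retries; the 1-second base delay, the doubling factor, and the names `call_with_retry` and `TranslationError` are assumptions, not values taken from this spec.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TranslationError(Exception):
    """Raised when every attempt to call the translation API has failed."""


def call_with_retry(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff when it raises."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):  # initial attempt + max_retries retries
        try:
            return call()
        except Exception as exc:  # a real client would catch only network/API errors
            last_error = exc
            if attempt < max_retries:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
    raise TranslationError(
        f"translation failed after {max_retries} retries: {last_error}"
    ) from last_error
```

Passing `sleep` as a parameter keeps the function testable without real waiting.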

#### Scenario: API rate limiting
- **GIVEN** high volume of translation requests
- **WHEN** requests approach rate limits
- **THEN** the system queues requests appropriately
- **AND** provides feedback about wait times

---

### Requirement: Translation Prompt Format

The system SHALL use structured prompts for translation requests.

#### Scenario: Generate translation prompt
- **GIVEN** source text to translate
- **WHEN** preparing DIFY API request
- **THEN** the system formats prompt as:
  ```
  Translate the following text to {language}.
  Return ONLY the translated text, no explanations.

  {text}
  ```

#### Scenario: Language name mapping
- **GIVEN** language code like "zh-TW" or "ja"
- **WHEN** constructing translation prompt
- **THEN** the system maps to full language name (Traditional Chinese, Japanese)
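
Prompt construction and language name mapping can be sketched together. The template text is taken from the scenario above and the codes mirror the supported languages listed in this spec; the names `LANGUAGE_NAMES` and `build_prompt`, and the choice to raise `ValueError` for unknown codes, are assumptions for illustration.

```python
# Language code -> full language name used in the prompt.
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
}

PROMPT_TEMPLATE = (
    "Translate the following text to {language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "\n"
    "{text}"
)


def build_prompt(text: str, target_lang: str) -> str:
    """Format the translation prompt for one piece of source text."""
    try:
        language = LANGUAGE_NAMES[target_lang]
    except KeyError:
        # Assumed behavior: reject codes outside the supported list.
        raise ValueError(f"unsupported target language: {target_lang}") from None
    return PROMPT_TEMPLATE.format(language=language, text=text)
```

For example, `build_prompt("你好", "ja")` produces a prompt whose first line is `Translate the following text to Japanese.`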

---

### Requirement: Translation Progress Reporting

The system SHALL provide real-time progress feedback during translation.

#### Scenario: Progress during multi-element translation
- **GIVEN** a document with 50 translatable elements
- **WHEN** user queries status
- **THEN** the system returns progress like `{"status": "translating", "current_element": 25, "total_elements": 50}`

#### Scenario: Translation starting status
- **GIVEN** translation job just started
- **WHEN** user queries status
- **THEN** the system returns `{"status": "pending"}`
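
The two status payloads above could come from a small progress tracker such as the sketch below. The class name and methods are hypothetical; only the field names `status`, `current_element`, and `total_elements` come from this spec.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TranslationProgress:
    """Tracks how many elements of a translation job have been processed."""

    total_elements: int
    current_element: int = 0
    status: str = "pending"  # pending -> translating -> completed / failed

    def advance(self) -> None:
        """Record that one more element has been translated."""
        self.current_element += 1
        self.status = "translating"

    def finish(self, success: bool = True) -> None:
        """Mark the job as finished, successfully or not."""
        self.status = "completed" if success else "failed"

    def to_status(self) -> dict[str, Any]:
        """Build the body returned by the status endpoint."""
        if self.status == "pending":
            return {"status": "pending"}
        return {
            "status": self.status,
            "current_element": self.current_element,
            "total_elements": self.total_elements,
        }
```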

---

### Requirement: Translation Result Storage

The system SHALL store translation results as independent JSON files.

#### Scenario: Save translation result
- **GIVEN** translation completes successfully
- **WHEN** saving results
- **THEN** the system creates `{original_filename}_translated_{lang}.json`
- **AND** includes schema_version, metadata, and translations dict

#### Scenario: Multiple language translations
- **GIVEN** a document translated to English and Japanese
- **WHEN** checking result files
- **THEN** both `xxx_translated_en.json` and `xxx_translated_ja.json` exist
- **AND** original `xxx_result.json` is unchanged
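
Writing one independent file per language can be sketched as follows. The three top-level keys come from the scenario above; the `schema_version` value, the contents of `metadata`, and the function name `save_translation` are assumptions, as this spec does not define them.

```python
import json
from pathlib import Path
from typing import Any


def save_translation(
    output_dir: Path,
    stem: str,
    lang: str,
    translations: dict[str, Any],
    source_lang: str = "auto",
) -> Path:
    """Write translations to `{stem}_translated_{lang}.json` and return the path."""
    payload = {
        "schema_version": "1.0.0",  # assumed value
        "metadata": {               # assumed fields
            "source_lang": source_lang,
            "target_lang": lang,
            "total_elements": len(translations),
        },
        "translations": translations,
    }
    path = output_dir / f"{stem}_translated_{lang}.json"
    # ensure_ascii=False keeps CJK and other non-ASCII text readable in the file.
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Since each language gets its own file name, saving a second language never touches the first result or the original `xxx_result.json`.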

---

### Requirement: Language Support

The system SHALL support common languages through the DIFY AI service.

#### Scenario: Common language translation
- **GIVEN** target language is English, Chinese, Japanese, or Korean
- **WHEN** translation is requested
- **THEN** the system includes appropriate language name in prompt
- **AND** executes translation successfully

#### Scenario: Automatic source language detection
- **GIVEN** source_lang is set to "auto"
- **WHEN** translation is executed
- **THEN** the AI model automatically detects source language
- **AND** translates to target language

#### Scenario: Supported languages list
- **GIVEN** user queries supported languages
- **WHEN** checking language support
- **THEN** the system provides list including:
  - English (en)
  - Traditional Chinese (zh-TW)
  - Simplified Chinese (zh-CN)
  - Japanese (ja)
  - Korean (ko)
  - German (de)
  - French (fr)
  - Spanish (es)
  - Portuguese (pt)
  - Italian (it)
  - Russian (ru)
  - Vietnamese (vi)
  - Thai (th)

---

### Requirement: Translated PDF Generation

The system SHALL support generating PDF files with translated content while preserving the original document layout.

#### Scenario: Generate translated PDF from Direct track document
- **GIVEN** a completed translation for a Direct track processed document
- **WHEN** user requests translated PDF via `POST /api/v2/translate/{task_id}/pdf?lang={target_lang}`
- **THEN** the system loads the translation JSON file
- **AND** merges translations with UnifiedDocument by element_id
- **AND** generates PDF with translated text at original positions
- **AND** returns PDF file with Content-Type `application/pdf`

#### Scenario: Generate translated PDF from OCR track document
- **GIVEN** a completed translation for an OCR track processed document
- **WHEN** user requests translated PDF
- **THEN** the system generates PDF preserving all OCR layout information
- **AND** replaces original text with translated content
- **AND** maintains table structure with translated cell content

#### Scenario: Handle missing translations gracefully
- **GIVEN** a translation JSON missing some element_id entries
- **WHEN** generating translated PDF
- **THEN** the system uses original content for missing translations
- **AND** logs warning for each fallback
- **AND** completes PDF generation successfully

#### Scenario: Translated PDF for incomplete translation
- **GIVEN** a task with translation status "pending" or "translating"
- **WHEN** user requests translated PDF
- **THEN** the system returns 400 Bad Request
- **AND** includes error message indicating translation not complete

#### Scenario: Translated PDF for non-existent translation
- **GIVEN** a task that has not been translated to requested language
- **WHEN** user requests translated PDF with `lang=fr`
- **THEN** the system returns 404 Not Found
- **AND** includes error message indicating no translation for language

---

### Requirement: Translation Merge Service

The system SHALL provide a service to merge translation data with UnifiedDocument.

#### Scenario: Merge text element translations
- **GIVEN** a UnifiedDocument with text elements
- **AND** a translation JSON with matching element_ids
- **WHEN** applying translations
- **THEN** the system replaces content field for each matched element
- **AND** preserves all other element properties (bounding_box, style_info, etc.)

#### Scenario: Merge table cell translations
- **GIVEN** a UnifiedDocument containing table elements
- **AND** a translation JSON with table_cell translations like:
  ```json
  {
    "table_1_0": {
      "cells": [{"row": 0, "col": 0, "content": "Translated"}]
    }
  }
  ```
- **WHEN** applying translations
- **THEN** the system updates cell content at matching row/col positions
- **AND** preserves cell structure and styling

#### Scenario: Non-destructive merge operation
- **GIVEN** a UnifiedDocument
- **WHEN** applying translations
- **THEN** the system creates a modified copy
- **AND** original UnifiedDocument remains unchanged
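
The merge scenarios above, together with the fallback to original content when a translation is missing, can be sketched as one function. The dict-based document layout and the convention that text elements map `element_id` to a translated string are assumptions; only the table entry shape (`cells` with `row`, `col`, `content`) is taken from this spec.

```python
import copy
import logging
from typing import Any

logger = logging.getLogger(__name__)


def apply_translations(document: dict[str, Any], translations: dict[str, Any]) -> dict[str, Any]:
    """Return a translated copy of `document`; the input is never modified."""
    merged = copy.deepcopy(document)  # non-destructive: work on a copy
    for page in merged.get("pages", []):
        for element in page.get("elements", []):
            element_id = element["element_id"]
            entry = translations.get(element_id)
            if entry is None:
                # Fallback: keep the original content and log a warning.
                logger.warning("No translation for %s; keeping original content", element_id)
                continue
            if element.get("type") == "table":
                by_position = {(c["row"], c["col"]): c["content"] for c in entry.get("cells", [])}
                for cell in element.get("cells", []):
                    key = (cell["row"], cell["col"])
                    if key in by_position:
                        cell["content"] = by_position[key]  # styling keys stay untouched
            else:
                element["content"] = entry  # only the content field is replaced
    return merged
```

Only `content` values are overwritten, so properties such as `bounding_box` and `style_info` carry over unchanged from the copy.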

---

### Requirement: Translation Output as Reflow PDF

The system SHALL generate translated documents as reflow-layout PDFs with real visible text, separate from the Layout PDF which uses background images.

#### Scenario: Generate translated PDF with reflow layout
- **WHEN** translation is completed for a document
- **THEN** the system SHALL generate a new PDF with translated text
- **AND** the translated PDF SHALL use reflow layout (not background image)
- **AND** text SHALL be real visible text, not invisible overlay
- **AND** page breaks SHALL correspond to original document pages

#### Scenario: Maintain page correspondence in translated output
- **WHEN** generating translated PDF
- **THEN** content from original page 1 SHALL appear in translated page 1
- **AND** content from original page 2 SHALL appear in translated page 2
- **AND** a translated page MAY differ in content length from its original, but page boundaries SHALL be maintained

#### Scenario: Chart text excluded from translation
- **WHEN** extracting text for translation from Direct Track documents
- **THEN** text elements within chart regions SHALL NOT be included
- **AND** chart labels, axis text, and legends SHALL remain untranslated
- **AND** this exclusion SHALL be documented for users as expected behavior
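
One way to exclude chart text is to drop text elements whose bounding box lies entirely inside a chart region. This spec does not say how chart regions are detected, so the sketch below is purely illustrative: the `bbox` tuples in `(x0, y0, x1, y1)` form, the `"chart"` element type, and both function names are assumptions.

```python
from typing import Any

BBox = tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed convention


def is_inside(inner: BBox, outer: BBox) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and inner[2] <= outer[2]
        and inner[3] <= outer[3]
    )


def text_outside_charts(elements: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the text elements of one page that are not within any chart region."""
    chart_boxes = [e["bbox"] for e in elements if e["type"] == "chart"]
    return [
        e
        for e in elements
        if e["type"] == "text"
        and not any(is_inside(e["bbox"], box) for box in chart_boxes)
    ]
```

Text that only partially overlaps a chart is kept by this rule; a stricter overlap test would be a separate design decision.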

---

### Requirement: Dual PDF Output Concept

The system SHALL maintain clear separation between Layout PDF (preview) and Translated PDF (output).

#### Scenario: Layout PDF for preview
- **WHEN** user views a processed document before translation
- **THEN** the Layout PDF SHALL be displayed
- **AND** Layout PDF preserves exact visual appearance of source
- **AND** text is invisible overlay for extraction purposes only

#### Scenario: Translated PDF for final output
- **WHEN** user requests translated document
- **THEN** the Translated PDF SHALL be generated
- **AND** Translated PDF uses reflow layout with visible translated text
- **AND** original visual styling is not preserved (text-focused output)

#### Scenario: Both PDFs available after translation
- **WHEN** translation is completed
- **THEN** both Layout PDF and Translated PDF SHALL be available for download
- **AND** user can choose which version to download
- **AND** Layout PDF remains unchanged after translation