OCR/openspec/specs/translation/spec.md
egg 8d9b69ba93 feat: add document translation via DIFY AI API
Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 11:57:02 +08:00


translation Specification

Purpose

TBD - created by archiving change add-document-translation. Update Purpose after archive.

Requirements

Requirement: Document Translation Service

The system SHALL provide a document translation service that translates text extracted from OCR-processed documents into target languages using the DIFY AI API.

Scenario: Successful translation of Direct track document

  • GIVEN a completed OCR task with Direct track processing
  • WHEN the user requests translation to English
  • THEN the system extracts all translatable elements (text, title, header, footer, paragraph, footnote, table cells)
  • AND translates them using the DIFY AI API
  • AND saves the result to {task_id}_translated_en.json
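The extraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's translation service. The nested `pages`/`elements` dictionary shape, the `content` key, and the name `extract_translatable` are all hypothetical stand-ins for the real UnifiedDocument model. Table cells are left to a separate step.

```python
from typing import Any

# Element types the spec lists as translatable (table cells are handled separately).
TRANSLATABLE_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def extract_translatable(document: dict[str, Any]) -> list[dict[str, str]]:
    """Collect element_id/text pairs from a hypothetical UnifiedDocument-like dict.

    Assumed shape: {"pages": [{"elements": [{"element_id", "type", "content"}]}]}.
    """
    items: list[dict[str, str]] = []
    for page in document.get("pages", []):
        for element in page.get("elements", []):
            text = element.get("content")
            # Skip non-text types (images, figures, ...) and empty strings.
            if element.get("type") in TRANSLATABLE_TYPES and isinstance(text, str) and text.strip():
                items.append({"element_id": element["element_id"], "text": text})
    return items
```

Keeping `element_id` on every item is what lets the translated text be mapped back to its source element. Because the function only reads the unified structure, the same code path works for Direct, OCR, and Hybrid track output.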

Scenario: Successful translation of OCR track document

  • GIVEN a completed OCR task with OCR track processing
  • WHEN the user requests translation to Japanese
  • THEN the system extracts all translatable elements from the UnifiedDocument format
  • AND translates them while preserving the element_id mapping
  • AND saves the result to {task_id}_translated_ja.json

Scenario: Successful translation of Hybrid track document

  • GIVEN a completed OCR task with Hybrid track processing
  • WHEN translation is requested
  • THEN the system processes the document using the same unified logic
  • AND handles any combination of element types present

Scenario: Table cell translation

  • GIVEN a document containing table elements
  • WHEN translation is requested
  • THEN the system extracts text from each table cell
  • AND translates the content of each cell individually
  • AND preserves each cell's row/col position in the translation result
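Per-cell handling with row/col preservation might look like this. It is a sketch only. The table element shape (`content.cells` with `row`, `col`, `content`) and both function names are assumptions, not the project's actual schema.

```python
from typing import Any


def extract_table_cells(table: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a hypothetical table element into one translatable item per non-empty cell.

    Assumed shape: {"element_id": str, "content": {"cells": [{"row", "col", "content"}]}}.
    """
    items: list[dict[str, Any]] = []
    for cell in table.get("content", {}).get("cells", []):
        text = cell.get("content", "")
        if isinstance(text, str) and text.strip():
            # Keep row/col so the translation can be written back to the same cell.
            items.append({"element_id": table["element_id"], "row": cell["row"],
                          "col": cell["col"], "text": text})
    return items


def apply_cell_translations(items: list[dict[str, Any]],
                            translated: list[str]) -> list[dict[str, Any]]:
    """Pair each extracted cell with its translation, preserving its position."""
    if len(items) != len(translated):
        raise ValueError("expected exactly one translation per extracted cell")
    return [
        {"element_id": item["element_id"], "row": item["row"], "col": item["col"],
         "translated_text": text}
        for item, text in zip(items, translated)
    ]
```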

Requirement: Translation API Endpoints

The system SHALL expose REST API endpoints for translation operations.

Scenario: Start translation request

  • GIVEN a completed OCR task with task_id
  • WHEN a POST request is sent to /api/v2/translate/{task_id} with a target_lang parameter
  • THEN the system starts a background translation process
  • AND returns the translation job status with HTTP 202 Accepted
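The control flow of the start endpoint can be sketched in a framework-agnostic way. The handler below returns `(status_code, body)` instead of using a real router. `JOBS`, `start_translation`, `existing_tasks`, and `worker` are hypothetical names, and the in-memory registry is an assumption for illustration. The sketch also covers the 404 case for unknown task IDs described in the non-existent-task scenario.

```python
import threading
from typing import Callable

# Hypothetical in-memory job registry; a real service would likely persist job state.
JOBS: dict[str, dict] = {}
_LOCK = threading.Lock()


def start_translation(task_id: str, target_lang: str, existing_tasks: set[str],
                      worker: Callable[[str, str], None]) -> tuple[int, dict]:
    """Sketch of POST /api/v2/translate/{task_id}; returns (HTTP status, response body)."""
    if task_id not in existing_tasks:
        return 404, {"detail": f"Task {task_id} not found"}
    # Checking that the OCR task has actually completed is omitted in this sketch.
    status = {"task_id": task_id, "target_lang": target_lang, "status": "pending"}
    with _LOCK:
        JOBS[task_id] = status
    # Run the translation off the request thread so the endpoint can return immediately.
    threading.Thread(target=worker, args=(task_id, target_lang), daemon=True).start()
    return 202, dict(status)
```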

Scenario: Query translation status

  • GIVEN an active translation job
  • WHEN a GET request is sent to /api/v2/translate/{task_id}/status
  • THEN the system returns the current status (pending, translating, completed, failed)
  • AND includes progress information (current_element, total_elements)

Scenario: Retrieve translation result

  • GIVEN a completed translation job
  • WHEN a GET request is sent to /api/v2/translate/{task_id}/result?lang={target_lang}
  • THEN the system returns the translation JSON content

Scenario: Translation for non-existent task

  • GIVEN an invalid or non-existent task_id
  • WHEN translation is requested
  • THEN the system returns a 404 Not Found error

Requirement: DIFY API Integration

The system SHALL integrate with DIFY AI service for translation.

Scenario: API request format

  • GIVEN text to be translated
  • WHEN calling DIFY API
  • THEN the system sends a POST request to the /chat-messages endpoint
  • AND includes the translation prompt as the query
  • AND uses blocking response mode
  • AND includes a user identifier for tracking
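The request can be assembled with the standard library as shown below. This is a sketch assuming the payload fields of Dify's public chat-messages API (`inputs`, `query`, `response_mode`, `user`) and Bearer-token auth. The function name, the base URL (assumed to already include the API version path), and the user ID are placeholders. The request is built but not sent.

```python
import json
import urllib.request


def build_dify_request(base_url: str, api_key: str, prompt: str,
                       user_id: str) -> urllib.request.Request:
    """Build (but do not send) a blocking-mode POST to DIFY's /chat-messages endpoint."""
    payload = {
        "inputs": {},
        "query": prompt,              # the translation prompt goes in the query field
        "response_mode": "blocking",  # wait for the complete answer instead of streaming
        "user": user_id,              # identifier used to attribute and track requests
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would then be a matter of `urllib.request.urlopen(request, timeout=...)`. Any HTTP client works equally well.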

Scenario: API response handling

  • GIVEN the DIFY API returns a translation response
  • WHEN parsing the response
  • THEN the system extracts the translated text from the answer field
  • AND records usage statistics (tokens, latency)
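Response parsing can be sketched as follows. It assumes the blocking-mode response carries `answer` at the top level and usage statistics under `metadata.usage` (`total_tokens`, `latency`), as in Dify's documented response. `TranslationResult` and `parse_dify_response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TranslationResult:
    text: str
    total_tokens: int
    latency: float


def parse_dify_response(body: dict[str, Any]) -> TranslationResult:
    """Extract the translated text and usage statistics from a blocking-mode response."""
    answer = body.get("answer")
    if not isinstance(answer, str):
        raise ValueError("DIFY response has no 'answer' field")
    # Usage statistics are assumed to sit under metadata.usage; default to 0 if absent.
    usage = body.get("metadata", {}).get("usage", {})
    return TranslationResult(
        text=answer.strip(),
        total_tokens=int(usage.get("total_tokens", 0)),
        latency=float(usage.get("latency", 0.0)),
    )
```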

Scenario: API error handling

  • GIVEN the DIFY API returns an error or times out
  • WHEN handling the error
  • THEN the system retries up to 3 times with exponential backoff
  • AND returns an appropriate error message if all retries fail
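Retry with exponential backoff might be sketched as follows. The sketch reads "retries up to 3 times" as three retries after the first attempt, and the 1-second base delay is an assumed value. `call_with_retry` is a hypothetical helper. The `sleep` parameter is injectable so the delays can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(call: Callable[[], T], max_retries: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying failures after delays of base_delay * 2**n (1s, 2s, 4s)."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except OSError as exc:  # urllib's URLError/HTTPError and TimeoutError subclass OSError
            if attempt == max_retries:
                raise RuntimeError(
                    f"Translation request failed after {max_retries} retries: {exc}"
                ) from exc
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```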

Scenario: API rate limiting

  • GIVEN a high volume of translation requests
  • WHEN requests approach rate limits
  • THEN the system queues requests appropriately
  • AND provides feedback about wait times
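The spec does not name a rate-limiting policy. One possible approach is sketched here as an assumption: a fixed minimum interval derived from a requests-per-minute budget. Each call to `reserve()` claims the next free slot in arrival order and returns how long the caller must wait, and that value can be surfaced as wait-time feedback. `IntervalRateLimiter` and `max_per_minute` are hypothetical.

```python
import threading
import time
from typing import Callable


class IntervalRateLimiter:
    """Hand out request slots spaced at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.min_interval = 60.0 / max_per_minute
        self.clock = clock
        self._next_slot = 0.0
        self._lock = threading.Lock()

    def reserve(self) -> float:
        """Claim the next slot and return the number of seconds to wait before using it."""
        with self._lock:
            now = self.clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
            return slot - now
```

A caller would do `wait = limiter.reserve()`, report `wait` to the user if it is positive, then `time.sleep(wait)` before calling DIFY.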

Requirement: Translation Prompt Format

The system SHALL use structured prompts for translation requests.

Scenario: Generate translation prompt

  • GIVEN source text to translate
  • WHEN preparing DIFY API request
  • THEN the system formats prompt as:
    Translate the following text to {language}.
    Return ONLY the translated text, no explanations.
    
    {text}
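Filling the template above is a single string format. The helper name is hypothetical. `language` is expected to be a full name such as "Japanese", not a code.

```python
PROMPT_TEMPLATE = (
    "Translate the following text to {language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "\n"
    "{text}"
)


def build_translation_prompt(text: str, language: str) -> str:
    """Fill the spec's prompt template with a full language name and the source text."""
    # str.format only parses the template, so braces inside `text` are inserted verbatim.
    return PROMPT_TEMPLATE.format(language=language, text=text)
```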
    

Scenario: Language name mapping

  • GIVEN a language code such as "zh-TW" or "ja"
  • WHEN constructing the translation prompt
  • THEN the system maps it to the full language name (Traditional Chinese, Japanese)

Requirement: Translation Progress Reporting

The system SHALL provide real-time progress feedback during translation.

Scenario: Progress during multi-element translation

  • GIVEN a document with 50 translatable elements
  • WHEN the user queries the status
  • THEN the system returns progress such as {"status": "translating", "current_element": 25, "total_elements": 50}

Scenario: Translation starting status

  • GIVEN a translation job that has just started
  • WHEN the user queries the status
  • THEN the system returns {"status": "pending"}
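A thread-safe tracker that produces the status payloads shown in these scenarios might look like this. It is a sketch: the class and method names are hypothetical, and only the payload keys come from the spec. The background worker calls `begin`, `advance`, and `finish`, and the status endpoint returns `snapshot()`.

```python
import threading
from typing import Any


class TranslationProgress:
    """Progress tracker whose snapshot() matches the spec's status payloads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._status = "pending"
        self._current = 0
        self._total = 0

    def begin(self, total_elements: int) -> None:
        with self._lock:
            self._status, self._current, self._total = "translating", 0, total_elements

    def advance(self, count: int = 1) -> None:
        with self._lock:
            # Clamp so over-reporting (e.g. a batch boundary miscount) never exceeds the total.
            self._current = min(self._current + count, self._total)

    def finish(self, failed: bool = False) -> None:
        with self._lock:
            self._status = "failed" if failed else "completed"

    def snapshot(self) -> dict[str, Any]:
        with self._lock:
            if self._status == "pending":
                return {"status": "pending"}
            return {"status": self._status, "current_element": self._current,
                    "total_elements": self._total}
```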

Requirement: Translation Result Storage

The system SHALL store translation results as independent JSON files.

Scenario: Save translation result

  • GIVEN a translation completes successfully
  • WHEN saving results
  • THEN the system creates {original_filename}_translated_{lang}.json
  • AND includes schema_version, metadata, and a translations dict

Scenario: Multiple language translations

  • GIVEN a document translated to English and Japanese
  • WHEN checking result files
  • THEN both xxx_translated_en.json and xxx_translated_ja.json exist
  • AND the original xxx_result.json is unchanged
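Writing each language to its own file might be sketched as follows. The `base_name` parameter stands for whichever prefix the service uses (the scenarios above show both {task_id} and {original_filename}). The `schema_version` value "1.0", the `target_lang` metadata key, and the function name are assumptions. Writing to a temporary file and then renaming it is a design choice so readers never see a half-written result.

```python
import json
from pathlib import Path
from typing import Any

SCHEMA_VERSION = "1.0"  # assumed value; the spec names the field but not its contents


def save_translation(output_dir: Path, base_name: str, lang: str,
                     translations: dict[str, Any], metadata: dict[str, Any]) -> Path:
    """Write {base_name}_translated_{lang}.json without touching {base_name}_result.json."""
    path = output_dir / f"{base_name}_translated_{lang}.json"
    document = {
        "schema_version": SCHEMA_VERSION,
        "metadata": {"target_lang": lang, **metadata},
        "translations": translations,  # element_id -> translated content
    }
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(document, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename into place once the file is fully written
    return path
```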

Requirement: Language Support

The system SHALL support common languages through DIFY AI service.

Scenario: Common language translation

  • GIVEN target language is English, Chinese, Japanese, or Korean
  • WHEN translation is requested
  • THEN the system includes the appropriate language name in the prompt
  • AND completes the translation successfully

Scenario: Automatic source language detection

  • GIVEN source_lang is set to "auto"
  • WHEN translation is executed
  • THEN the AI model automatically detects the source language
  • AND translates the text into the target language

Scenario: Supported languages list

  • GIVEN a user queries the supported languages
  • WHEN checking language support
  • THEN the system provides a list including:
    • English (en)
    • Traditional Chinese (zh-TW)
    • Simplified Chinese (zh-CN)
    • Japanese (ja)
    • Korean (ko)
    • German (de)
    • French (fr)
    • Spanish (es)
    • Portuguese (pt)
    • Italian (it)
    • Russian (ru)
    • Vietnamese (vi)
    • Thai (th)
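The list above doubles as the code-to-name map needed for the language-name-mapping scenario. A minimal sketch follows. The codes and names come from the list, while `language_name`, `supported_languages`, and the `{"code", "name"}` payload shape are hypothetical.

```python
# Codes and full names taken from the supported-languages list above.
LANGUAGE_NAMES: dict[str, str] = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
}


def language_name(code: str) -> str:
    """Map a code such as "zh-TW" to the full name used in the translation prompt."""
    try:
        return LANGUAGE_NAMES[code]
    except KeyError:
        raise ValueError(f"Unsupported target language: {code}") from None


def supported_languages() -> list[dict[str, str]]:
    """Payload a (hypothetical) supported-languages endpoint could return."""
    return [{"code": code, "name": name} for code, name in LANGUAGE_NAMES.items()]
```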