egg/OCR

egg 8d9b69ba93 feat: add document translation via DIFY AI API

Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-02 11:57:02 +08:00

translation Specification

Purpose

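Define how the system translates text extracted from OCR-processed documents into target languages through the DIFY AI API, covering the REST endpoints, prompt format, progress reporting, result storage, and supported languages.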

Requirements

Requirement: Document Translation Service

The system SHALL provide a document translation service that translates extracted text from OCR-processed documents into target languages using DIFY AI API.

Scenario: Successful translation of Direct track document

GIVEN a completed OCR task with Direct track processing
WHEN user requests translation to English
THEN the system extracts all translatable elements (text, title, header, footer, paragraph, footnote, table cells)
AND translates them using DIFY AI API
AND saves the result to {task_id}_translated_en.json

Scenario: Successful translation of OCR track document

GIVEN a completed OCR task with OCR track processing
WHEN user requests translation to Japanese
THEN the system extracts all translatable elements from UnifiedDocument format
AND translates them preserving element_id mapping
AND saves the result to {task_id}_translated_ja.json

Scenario: Successful translation of Hybrid track document

GIVEN a completed OCR task with Hybrid track processing
WHEN translation is requested
THEN the system processes the document using the same unified logic
AND handles any combination of element types present

Scenario: Table cell translation

GIVEN a document containing table elements
WHEN translation is requested
THEN the system extracts text from each table cell
AND translates each cell content individually
AND preserves row/col position in the translation result
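
The table cell scenario can be sketched as follows. This is a minimal illustration, not the project's extraction code: the table layout (a `cells` list whose entries carry `row`, `col`, and `text`) and the `translate` callable are hypothetical stand-ins, since the spec does not define the element structure.

```python
from typing import Callable, Iterator


def iter_table_cells(table: dict) -> Iterator[dict]:
    """Yield one translatable item per non-empty cell, keeping its position.

    The "cells"/"row"/"col"/"text" keys are assumed for illustration only.
    """
    for cell in table.get("cells", []):
        text = (cell.get("text") or "").strip()
        if not text:
            continue  # an empty cell has nothing to translate
        yield {"row": cell["row"], "col": cell["col"], "text": text}


def translate_table(table: dict, translate: Callable[[str], str]) -> list:
    """Translate each cell individually; row/col are copied through unchanged."""
    return [
        {"row": item["row"], "col": item["col"], "content": translate(item["text"])}
        for item in iter_table_cells(table)
    ]
```

Keeping `row` and `col` next to each translated string means the result can be mapped back onto the original table without relying on cell order.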

Requirement: Translation API Endpoints

The system SHALL expose REST API endpoints for translation operations.

Scenario: Start translation request

GIVEN a completed OCR task with task_id
WHEN a POST request is sent to /api/v2/translate/{task_id} with a target_lang parameter
THEN the system starts a background translation process
AND returns the translation job status with 202 Accepted

Scenario: Query translation status

GIVEN an active translation job
WHEN a GET request is sent to /api/v2/translate/{task_id}/status
THEN the system returns the current status (pending, translating, completed, failed)
AND includes progress information (current_element, total_elements)

Scenario: Retrieve translation result

GIVEN a completed translation job
WHEN a GET request is sent to /api/v2/translate/{task_id}/result?lang={target_lang}
THEN the system returns the translation JSON content

Scenario: Translation for non-existent task

GIVEN an invalid or non-existent task_id
WHEN translation is requested
THEN the system returns 404 Not Found error
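
For reference, the three paths named in the scenarios above can be built with a few small helpers. This is only a sketch: the helper names are hypothetical, and because the spec does not say whether target_lang travels in the POST body or the query string, the start helper leaves it out.

```python
from urllib.parse import quote, urlencode

API_PREFIX = "/api/v2/translate"  # path prefix taken from the scenarios above


def start_url(task_id: str) -> str:
    """Path for POST (start translation); expected to answer 202 Accepted."""
    return f"{API_PREFIX}/{quote(task_id, safe='')}"


def status_url(task_id: str) -> str:
    """Path for GET (query translation status)."""
    return f"{start_url(task_id)}/status"


def result_url(task_id: str, target_lang: str) -> str:
    """Path for GET (retrieve the translation result for one language)."""
    return f"{start_url(task_id)}/result?{urlencode({'lang': target_lang})}"
```

For example, `result_url("abc", "zh-TW")` yields `/api/v2/translate/abc/result?lang=zh-TW`.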

Requirement: DIFY API Integration

The system SHALL integrate with DIFY AI service for translation.

Scenario: API request format

GIVEN text to be translated
WHEN calling DIFY API
THEN the system sends POST request to /chat-messages endpoint
AND includes query with translation prompt
AND uses blocking response mode
AND includes user identifier for tracking

Scenario: API response handling

GIVEN DIFY API returns translation response
WHEN parsing the response
THEN the system extracts translated text from answer field
AND records usage statistics (tokens, latency)
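
The request and response scenarios above might look like this in code. It is a sketch under assumptions: the field names (`inputs`, `query`, `response_mode`, `user`, `answer`, `metadata.usage`) follow the public Dify chat-messages API as commonly documented and should be checked against the deployed version; the function names and the shape of the returned dict are hypothetical.

```python
def build_chat_request(prompt: str, user_id: str) -> dict:
    """JSON body for POST /chat-messages in blocking mode."""
    return {
        "inputs": {},                 # no app variables assumed
        "query": prompt,              # the translation prompt
        "response_mode": "blocking",  # wait for the full answer
        "user": user_id,              # identifier used for tracking
    }


def parse_chat_response(body: dict) -> dict:
    """Pull the translated text and usage statistics out of a response body."""
    usage = (body.get("metadata") or {}).get("usage") or {}
    return {
        "translated_text": body.get("answer", ""),
        "total_tokens": usage.get("total_tokens", 0),
        "latency": usage.get("latency", 0.0),
    }
```

Missing usage data falls back to zero here so that statistics aggregation does not fail on a sparse response; whether that is the right default is a design choice the spec leaves open.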

Scenario: API error handling

GIVEN DIFY API returns error or times out
WHEN handling the error
THEN the system retries up to 3 times with exponential backoff
AND returns appropriate error message if all retries fail
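
One way to read the error handling scenario is a small retry wrapper. The sketch below assumes "retries up to 3 times" means one initial attempt plus three retries, and a 1 s base delay that doubles each time; both numbers, the `TranslationError` class, and the function name are illustrative assumptions rather than values from the spec.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TranslationError(Exception):
    """Raised when the initial attempt and every retry have failed."""


def call_with_retry(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying failures with delays of base_delay * 2**attempt."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt == max_retries:
                break  # no retries left
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s with the defaults
    raise TranslationError(
        f"DIFY request failed after {max_retries} retries: {last_error}"
    ) from last_error
```

The `sleep` parameter is injectable so the backoff schedule can be tested without actually waiting.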

Scenario: API rate limiting

GIVEN high volume of translation requests
WHEN requests approach rate limits
THEN the system queues requests appropriately
AND provides feedback about wait times

Requirement: Translation Prompt Format

The system SHALL use structured prompts for translation requests.

Scenario: Generate translation prompt

GIVEN source text to translate
WHEN preparing DIFY API request
THEN the system formats prompt as:

Translate the following text to {language}.
Return ONLY the translated text, no explanations.

{text}

Scenario: Language name mapping

GIVEN language code like "zh-TW" or "ja"
WHEN constructing translation prompt
THEN the system maps to full language name (Traditional Chinese, Japanese)
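
The prompt template and language name mapping can be combined in a short helper. In this sketch the template text and the code-to-name pairs come from this spec; the names `LANGUAGE_NAMES`, `PROMPT_TEMPLATE`, and `build_prompt`, and the choice to raise `ValueError` for an unknown code, are assumptions.

```python
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
}

PROMPT_TEMPLATE = (
    "Translate the following text to {language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "\n"
    "{text}"
)


def build_prompt(text: str, target_lang: str) -> str:
    """Map a language code to its full name and fill in the prompt template."""
    language = LANGUAGE_NAMES.get(target_lang)
    if language is None:
        # Rejecting unknown codes is an assumption; the spec does not say.
        raise ValueError(f"Unsupported target language: {target_lang!r}")
    return PROMPT_TEMPLATE.format(language=language, text=text)
```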

Requirement: Translation Progress Reporting

The system SHALL provide real-time progress feedback during translation.

Scenario: Progress during multi-element translation

GIVEN a document with 50 translatable elements
WHEN user queries status
THEN the system returns progress like {"status": "translating", "current_element": 25, "total_elements": 50}

Scenario: Translation starting status

GIVEN translation job just started
WHEN user queries status
THEN the system returns {"status": "pending"}
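
The two status payloads above can be produced from a single progress record. This is a minimal sketch; the `TranslationProgress` class and its `to_response` method are hypothetical names, and omitting the counters while pending simply mirrors the example payloads.

```python
from dataclasses import dataclass


@dataclass
class TranslationProgress:
    """In-memory progress record for one translation job."""

    status: str = "pending"  # pending, translating, completed, or failed
    current_element: int = 0
    total_elements: int = 0

    def to_response(self) -> dict:
        """Build the status payload returned to the client."""
        if self.status == "pending":
            return {"status": "pending"}  # no counters before work starts
        return {
            "status": self.status,
            "current_element": self.current_element,
            "total_elements": self.total_elements,
        }
```

With 25 of 50 elements done, `TranslationProgress("translating", 25, 50).to_response()` returns the payload shown in the multi-element scenario.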

Requirement: Translation Result Storage

The system SHALL store translation results as independent JSON files.

Scenario: Save translation result

GIVEN translation completes successfully
WHEN saving results
THEN the system creates {original_filename}_translated_{lang}.json
AND includes schema_version, metadata, and translations dict

Scenario: Multiple language translations

GIVEN a document translated to English and Japanese
WHEN checking result files
THEN both xxx_translated_en.json and xxx_translated_ja.json exist
AND original xxx_result.json is unchanged
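
Writing one independent file per language could look like the sketch below. The spec names the filename prefix both as {task_id} and as {original_filename}, so the prefix is passed in as `stem` here; the function name, the `"1.0.0"` schema version, and the metadata contents are placeholder assumptions.

```python
import json
from pathlib import Path


def save_translation(
    result_dir: Path, stem: str, lang: str, translations: dict, metadata: dict
) -> Path:
    """Write {stem}_translated_{lang}.json and return its path.

    Only the per-language file is written; the original result file
    is never opened or modified.
    """
    path = result_dir / f"{stem}_translated_{lang}.json"
    payload = {
        "schema_version": "1.0.0",  # placeholder value, not from the spec
        "metadata": metadata,
        "translations": translations,
    }
    path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path
```

Because the language code is part of the filename, translating the same document into a second language adds a new file instead of overwriting the first.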

Requirement: Language Support

The system SHALL support common languages through DIFY AI service.

Scenario: Common language translation

GIVEN target language is English, Chinese, Japanese, or Korean
WHEN translation is requested
THEN the system includes appropriate language name in prompt
AND executes translation successfully

Scenario: Automatic source language detection

GIVEN source_lang is set to "auto"
WHEN translation is executed
THEN the AI model automatically detects source language
AND translates to target language

Scenario: Supported languages list

GIVEN user queries supported languages
WHEN checking language support
THEN the system provides list including:
- English (en)
- Traditional Chinese (zh-TW)
- Simplified Chinese (zh-CN)
- Japanese (ja)
- Korean (ko)
- German (de)
- French (fr)
- Spanish (es)
- Portuguese (pt)
- Italian (it)
- Russian (ru)
- Vietnamese (vi)
- Thai (th)
