
egg 8d9b69ba93 feat: add document translation via DIFY AI API

Implement document translation feature using DIFY AI API with batch processing:

Backend:
- Add DIFY client with batch translation support (5000 chars, 20 items per batch)
- Add translation service with element extraction and result building
- Add translation router with start/status/result/list/delete endpoints
- Add translation schemas (TranslationRequest, TranslationStatus, etc.)

Frontend:
- Enable translation UI in TaskDetailPage
- Add translation API methods to apiV2.ts
- Add translation types

Features:
- Batch translation with numbered markers [1], [2], [3]...
- Support for text, title, header, footer, paragraph, footnote, table cells
- Translation result JSON with statistics (tokens, latency, batch_count)
- Background task processing with progress tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-02 11:57:02 +08:00


Design: Document Translation Feature

Context

Tool_OCR processes documents through three tracks (Direct/OCR/Hybrid) and outputs UnifiedDocument JSON. Users need to translate the extracted text into other languages while preserving the document structure.

Constraints

Must use DIFY AI service for translation
API-based solution (no local model management)
Translation quality depends on DIFY's underlying model

Stakeholders

End users: Need translated documents
System: Simple HTTP-based integration

Goals / Non-Goals

Goals

Translate documents using DIFY AI API
Preserve document structure (element positions, formatting)
Support all three processing tracks with unified logic
Real-time progress feedback to users
Simple, maintainable API integration

Non-Goals

Local model inference (replaced by DIFY API)
GPU memory management (not needed)
Translation memory or glossary support
Concurrent translation processing

Decisions

Decision 1: Translation Provider

Choice: DIFY AI Service (theaken.com)

Configuration:

Base URL: https://dify.theaken.com/v1
Endpoint: POST /chat-messages
API Key: [redacted] (supply via server-side configuration; do not commit it to source control)
Mode: Chat (Blocking response)

Rationale:

High-quality cloud AI translation
No local model management required
No GPU memory concerns
Easy to maintain and update

Decision 2: Response Mode

Choice: Blocking Mode

API Request Format:

{
  "inputs": {},
  "query": "Translate the following text to Chinese:\n\nHello world",
  "response_mode": "blocking",
  "conversation_id": "",
  "user": "tool-ocr-{task_id}"
}

API Response Format:

{
  "event": "message",
  "answer": "你好世界",
  "conversation_id": "xxx",
  "metadata": {
    "usage": {
      "total_tokens": 54,
      "latency": 1.26
    }
  }
}

Rationale:

Simpler implementation than streaming
Adequate for batch text translation
Complete response in single call
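A minimal sketch of the blocking call using only the Python standard library. The helper names (`build_payload`, `parse_response`, `send_chat_message`) and reading the key from a `DIFY_API_KEY` environment variable are assumptions for illustration; the request and response fields are the ones shown above, and the `Authorization: Bearer` header is how DIFY app keys are normally passed.

```python
import json
import os
import urllib.request

BASE_URL = "https://dify.theaken.com/v1"


def build_payload(query: str, task_id: str) -> dict:
    """Build a blocking chat-messages request body with the fields from the design."""
    return {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",
        "conversation_id": "",  # empty: no prior conversation context
        "user": f"tool-ocr-{task_id}",
    }


def parse_response(body: dict) -> tuple[str, int, float]:
    """Extract (answer, total_tokens, latency) from a blocking response."""
    usage = body.get("metadata", {}).get("usage", {})
    return body["answer"], int(usage.get("total_tokens", 0)), float(usage.get("latency", 0.0))


def send_chat_message(query: str, task_id: str, timeout: float = 60.0) -> tuple[str, int, float]:
    """POST one query to DIFY and return (answer, total_tokens, latency)."""
    # Hypothetical: the key comes from the environment instead of being hard-coded.
    api_key = os.environ["DIFY_API_KEY"]
    request = urllib.request.Request(
        f"{BASE_URL}/chat-messages",
        data=json.dumps(build_payload(query, task_id)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(json.loads(response.read().decode("utf-8")))
```

Applied to the sample response above, `parse_response` returns `("你好世界", 54, 1.26)`; the token count and latency are the values the translation statistics would aggregate.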

Decision 3: Translation Batch Format

Choice: Single text per request with translation prompt

Request Format:

Translate the following text to {target_language}.
Return ONLY the translated text, no explanations.

{text_content}

Rationale:

Clear instruction for AI
Predictable response format
Easy to parse result
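A sketch of filling the prompt template above. `build_translation_prompt` and `clean_answer` are hypothetical names, and `target_language` is assumed to be a display name such as "Japanese" rather than a language code.

```python
PROMPT_TEMPLATE = (
    "Translate the following text to {target_language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "\n"
    "{text_content}"
)


def build_translation_prompt(text_content: str, target_language: str) -> str:
    """Fill the single-text prompt for one element."""
    return PROMPT_TEMPLATE.format(target_language=target_language, text_content=text_content)


def clean_answer(answer: str) -> str:
    """Trim leading/trailing whitespace from the model's answer."""
    return answer.strip()
```

Because `str.format` does not re-parse substituted values, braces inside the document text (for example `{x}`) pass through unchanged.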

Decision 4: Translation Result Storage

Choice: Independent JSON file per language (unchanged from previous design)

backend/storage/results/{task_id}/
├── xxx_result.json              # Original
├── xxx_translated_en.json       # English translation
├── xxx_translated_ja.json       # Japanese translation
└── ...

Rationale:

Non-destructive (original preserved)
Multiple languages supported
Easy to manage and delete
Clear file naming convention
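A sketch of deriving the per-language file name from the original result file, assuming every original ends in `_result.json` as in `xxx_result.json`. `translation_path` is a hypothetical helper.

```python
from pathlib import Path

RESULT_SUFFIX = "_result.json"


def translation_path(source_json: Path, target_lang: str) -> Path:
    """Map <dir>/xxx_result.json to <dir>/xxx_translated_<lang>.json."""
    name = source_json.name
    if not name.endswith(RESULT_SUFFIX):
        raise ValueError(f"unexpected source file name: {name}")
    stem = name[: -len(RESULT_SUFFIX)]
    # Same directory as the original; the original file itself is never modified.
    return source_json.with_name(f"{stem}_translated_{target_lang}.json")
```

Deleting one translation then amounts to removing that single file; the original `xxx_result.json` is never opened for writing.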

Decision 5: Element Type Handling

Translatable types (content is string):

text, title, header, footer, paragraph, footnote

Special handling (content is dict):

table -> Translate cells[].content

Skip (non-text content):

page_number, image, chart, logo, reference
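A sketch of the extraction step under these rules. It assumes (hypothetically) that each UnifiedDocument element is a dict with `element_id`, `type`, and `content` keys, and that table content looks like the `cells` list in the translation schema; the `<element_id>:<row>:<col>` cell id is likewise an illustrative convention.

```python
from typing import Any

TEXT_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def extract_translatable(elements: list[dict[str, Any]]) -> tuple[list[tuple[str, str]], int]:
    """Return ((item_id, text) pairs to translate, number of skipped elements)."""
    items: list[tuple[str, str]] = []
    skipped = 0
    for element in elements:
        kind = element.get("type")
        content = element.get("content")
        if kind in TEXT_TYPES and isinstance(content, str) and content.strip():
            items.append((element["element_id"], content))
        elif kind == "table" and isinstance(content, dict):
            # Translate each non-empty cell; the id keeps row/col for mapping back.
            for cell in content.get("cells", []):
                text = cell.get("content")
                if isinstance(text, str) and text.strip():
                    items.append((f"{element['element_id']}:{cell['row']}:{cell['col']}", text))
        else:
            # page_number, image, chart, logo, reference, empty text, or unknown types.
            skipped += 1
    return items, skipped
```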

Architecture

Component Diagram

┌─────────────────────────────────────────────────────────────┐
│                        Frontend                              │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │ TaskDetail  │  │ TranslateBtn │  │ ProgressDisplay     │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTP
┌────────────────────────────▼────────────────────────────────┐
│                     Backend API                              │
│  ┌─────────────────────────────────────────────────────────┐│
│  │ TranslateRouter                                          ││
│  │ POST /api/v2/translate/{task_id}                         ││
│  │ GET  /api/v2/translate/{task_id}/status                  ││
│  │ GET  /api/v2/translate/{task_id}/result                  ││
│  └─────────────────────────────────────────────────────────┘│
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                  TranslationService                          │
│  ┌───────────────┐  ┌───────────────┐  ┌─────────────────┐  │
│  │ DifyClient    │  │ BatchBuilder  │  │ ResultParser    │  │
│  │ - translate() │  │ - extract()   │  │ - parse()       │  │
│  │ - chat()      │  │ - format()    │  │ - map_ids()     │  │
│  └───────────────┘  └───────────────┘  └─────────────────┘  │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTPS
┌────────────────────────────▼────────────────────────────────┐
│                    DIFY AI Service                           │
│                 https://dify.theaken.com/v1                  │
│                    (Chat - Blocking)                         │
└─────────────────────────────────────────────────────────────┘
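A sketch of the state behind the status endpoint and the background work. `TranslationJob`, its status strings, and `run_job`/`start_in_background` are hypothetical; the real backend may use its web framework's own background-task mechanism instead of a plain thread, which is used here only to keep the example self-contained.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TranslationJob:
    """In-memory job state a status endpoint could report (hypothetical model)."""

    task_id: str
    target_lang: str
    status: str = "pending"  # pending -> translating -> completed | failed
    total: int = 0
    done: int = 0
    error: Optional[str] = None
    results: dict[str, str] = field(default_factory=dict)  # item_id -> translation

    @property
    def progress(self) -> float:
        """Fraction of items finished."""
        if self.total:
            return self.done / self.total
        return 1.0 if self.status == "completed" else 0.0


def run_job(job: TranslationJob, items: list[tuple[str, str]],
            translate: Callable[[str], str]) -> None:
    """Translate items one by one, updating progress after each call."""
    job.status, job.total, job.done = "translating", len(items), 0
    try:
        for item_id, text in items:
            job.results[item_id] = translate(text)
            job.done += 1
        job.status = "completed"
    except Exception as exc:  # keep a readable message for the status endpoint
        job.status, job.error = "failed", str(exc)


def start_in_background(job: TranslationJob, items: list[tuple[str, str]],
                        translate: Callable[[str], str]) -> threading.Thread:
    """Run the job off the request thread so the POST handler can return at once."""
    thread = threading.Thread(target=run_job, args=(job, items, translate), daemon=True)
    thread.start()
    return thread
```

A status handler would then read `job.status`, `job.progress`, and `job.error` without waiting for the translation to finish.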

Translation JSON Schema

{
  "schema_version": "1.0.0",
  "source_document": "xxx_result.json",
  "source_lang": "auto",
  "target_lang": "en",
  "provider": "dify",
  "translated_at": "2025-12-02T12:00:00Z",
  "statistics": {
    "total_elements": 50,
    "translated_elements": 45,
    "skipped_elements": 5,
    "total_characters": 5000,
    "processing_time_seconds": 30.5,
    "total_tokens": 2500
  },
  "translations": {
    "pp3_0_0": "Company Profile",
    "pp3_0_1": "Founded in 2020...",
    "table_1_0": {
      "cells": [
        {"row": 0, "col": 0, "content": "Technology"},
        {"row": 0, "col": 1, "content": "Epoxy"}
      ]
    }
  }
}
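A sketch of assembling this document, with the field names and fixed values (`"schema_version": "1.0.0"`, `"source_lang": "auto"`, `"provider": "dify"`) taken from the schema above. `build_translation_result` and `dump_translation_result` are hypothetical names, and deriving `translated_elements` as `total_elements - skipped_elements` is an assumption that matches the example numbers (50 - 5 = 45).

```python
import json
from datetime import datetime, timezone
from typing import Any


def build_translation_result(source_document: str, target_lang: str,
                             translations: dict[str, Any], total_elements: int,
                             skipped_elements: int, total_characters: int,
                             processing_time_seconds: float,
                             total_tokens: int) -> dict[str, Any]:
    """Assemble a result document following the translation JSON schema."""
    return {
        "schema_version": "1.0.0",
        "source_document": source_document,
        "source_lang": "auto",
        "target_lang": target_lang,
        "provider": "dify",
        # UTC timestamp in the same shape as "2025-12-02T12:00:00Z".
        "translated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "statistics": {
            "total_elements": total_elements,
            "translated_elements": total_elements - skipped_elements,  # assumed invariant
            "skipped_elements": skipped_elements,
            "total_characters": total_characters,
            "processing_time_seconds": processing_time_seconds,
            "total_tokens": total_tokens,
        },
        "translations": translations,
    }


def dump_translation_result(result: dict[str, Any]) -> str:
    """Serialise for storage, keeping non-ASCII text readable."""
    return json.dumps(result, ensure_ascii=False, indent=2)
```

Serialising with `ensure_ascii=False` keeps CJK text readable in the stored file instead of escaping it as `\uXXXX` sequences.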

Language Code Mapping

LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
    # Additional languages as needed
}
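A sketch of validating a requested code against the mapping before any API call. `resolve_language` is a hypothetical helper; accepting codes case-insensitively (so `zh-tw` resolves to `zh-TW`) is a design assumption, not something the mapping requires.

```python
from collections.abc import Mapping


def resolve_language(code: str, names: Mapping[str, str]) -> tuple[str, str]:
    """Return (canonical_code, display_name); lookup ignores case and surrounding spaces."""
    by_lower = {key.lower(): key for key in names}
    canonical = by_lower.get(code.strip().lower())
    if canonical is None:
        supported = ", ".join(sorted(names))
        raise ValueError(f"unsupported target language {code!r}; expected one of: {supported}")
    return canonical, names[canonical]
```

Returning the canonical code as well as the name means `zh-tw` and `zh-TW` requests would both write the same `_translated_zh-TW.json` file.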

Risks / Trade-offs

Risk 1: API Availability

Risk: DIFY service downtime affects translation
Mitigation: Add timeout handling, retry logic, graceful error messages
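A sketch of the retry part of this mitigation, using the tentative policy from Open Questions (up to 3 retries with exponential backoff). `with_retry` and `is_transient` are hypothetical, and which errors count as transient (network errors, timeouts, HTTP 5xx and 429) is an assumption. The per-request timeout itself belongs on the HTTP call, for example the `timeout` argument of `urllib.request.urlopen`.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def is_transient(exc: BaseException) -> bool:
    """Decide whether a failure is worth retrying (assumed policy)."""
    if isinstance(exc, urllib.error.HTTPError):
        # Server errors and rate limiting; other 4xx (e.g. a bad API key) are permanent.
        return exc.code >= 500 or exc.code == 429
    return isinstance(exc, (urllib.error.URLError, TimeoutError, ConnectionError))


def with_retry(call: Callable[[], T], retries: int = 3, base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); retry transient failures up to `retries` times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == retries or not is_transient(exc):
                raise  # the caller turns this into a graceful error message
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s with the defaults
    raise AssertionError("unreachable")
```

With the defaults the waits are 1 s, 2 s, and 4 s; after the last failure the exception is re-raised so the job can be marked failed with a readable message.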

Risk 2: API Cost

Risk: High volume translation increases cost
Mitigation: Monitor usage via metadata, consider rate limiting
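A sketch of the tentative limit from Open Questions (10 requests per minute per user) as an in-memory sliding window. `SlidingWindowLimiter` is hypothetical, and the `clock` parameter exists only so the window can be tested without sleeping.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each user."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._calls: dict[str, deque] = defaultdict(deque)  # user -> call timestamps

    def allow(self, user: str) -> bool:
        """Record and permit a call, or return False if the user is over the limit."""
        now = self.clock()
        calls = self._calls[user]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.limit:
            return False
        calls.append(now)
        return True
```

A router could answer HTTP 429 when `allow()` returns `False`. State is per process, so a deployment with several workers would need a shared store to enforce one global limit.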

Risk 3: Network Latency

Risk: Each translation request adds network latency
Mitigation: Batch text when possible, show progress to user
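The commit message above describes a batched variant of this mitigation: up to 5000 characters and 20 items per batch, with texts numbered `[1]`, `[2]`, `[3]`. That differs from the one-text-per-request choice in Decision 3. The sketch below shows one way to pack, number, and map back such batches, roughly the roles the diagram gives `BatchBuilder.format()` and `ResultParser.parse()`/`map_ids()`; the function names are hypothetical, and it assumes the prompt tells the model to keep each marker at the start of a line.

```python
import re

MAX_CHARS = 5000  # per-batch character budget (from the commit message)
MAX_ITEMS = 20    # per-batch item cap (from the commit message)
MARKER = re.compile(r"^\[(\d+)\]\s?", re.MULTILINE)


def make_batches(items: list[tuple[str, str]], max_chars: int = MAX_CHARS,
                 max_items: int = MAX_ITEMS) -> list[list[tuple[str, str]]]:
    """Greedily pack (item_id, text) pairs into batches within both limits."""
    batches: list[list[tuple[str, str]]] = []
    current: list[tuple[str, str]] = []
    size = 0
    for item in items:
        length = len(item[1])
        if current and (len(current) >= max_items or size + length > max_chars):
            batches.append(current)
            current, size = [], 0
        current.append(item)  # an oversized single item still gets its own batch
        size += length
    if current:
        batches.append(current)
    return batches


def format_batch(batch: list[tuple[str, str]]) -> str:
    """Number each text as [1], [2], ... so answers can be mapped back."""
    return "\n".join(f"[{n}] {text}" for n, (_, text) in enumerate(batch, start=1))


def parse_batch(answer: str, batch: list[tuple[str, str]]) -> dict[str, str]:
    """Split a model answer on [n] markers and map each segment to its item id."""
    matches = list(MARKER.finditer(answer))
    result: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        n = int(match.group(1))
        if 1 <= n <= len(batch):
            result[batch[n - 1][0]] = answer[match.end():end].strip()
    return result
```

Ids whose marker is missing from the answer are simply absent from the result, so a caller could retry those items individually.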

Risk 4: Translation Quality Variance

Risk: AI translation quality varies by language pair
Mitigation: Document known limitations, allow user feedback

Migration Plan

Phase 1: Core Translation (This Proposal)

DIFY client implementation
Backend translation service (rewrite)
API endpoints (modify)
Frontend activation

Phase 2: Enhanced Features (Future)

Translated PDF generation
Translation caching
Custom terminology support

Rollback

Translation is an additive feature
No schema changes to existing data
Can be disabled by removing the router registration

Open Questions

Rate Limiting: Should we limit requests per minute to the DIFY API?
- Tentative: 10 requests per minute per user
Retry Logic: How should API failures be handled?
- Tentative: Retry up to 3 times with exponential backoff
Batch Size: How many elements per API call?
- Tentative: 1 element per call for simplicity, optimize later if needed
