egg/OCR

egg 24253ac15e feat: unify Direct Track PDF rendering and simplify export options

Backend changes:
- Apply background image + invisible text layer to all Direct Track PDFs
- Add CHART to regions_to_avoid for text extraction
- Improve visual fidelity for native PDFs and Office documents

Frontend changes:
- Remove JSON, UnifiedDocument, Markdown download buttons
- Simplify to 2-column layout with only Layout PDF and Reflow PDF
- Remove translation JSON download and Layout PDF option
- Keep only Reflow PDF for translated document downloads
- Clean up unused imports (FileJson, Database, FileOutput)

Archives two OpenSpec proposals:
- unify-direct-track-pdf-rendering
- simplify-frontend-export-options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-12 07:50:43 +08:00


translation Specification

Purpose

TBD - created by archiving change add-document-translation. Update Purpose after archive.

Requirements

Requirement: Document Translation Service

The system SHALL provide a document translation service that translates text extracted from OCR-processed documents into target languages using the DIFY AI API.

Scenario: Successful translation of Direct track document

GIVEN a completed OCR task with Direct track processing
WHEN user requests translation to English
THEN the system extracts all translatable elements (text, title, header, footer, paragraph, footnote, table cells)
AND translates them using DIFY AI API
AND saves the result to {task_id}_translated_en.json

Scenario: Successful translation of OCR track document

GIVEN a completed OCR task with OCR track processing
WHEN user requests translation to Japanese
THEN the system extracts all translatable elements from UnifiedDocument format
AND translates them preserving element_id mapping
AND saves the result to {task_id}_translated_ja.json

Scenario: Successful translation of Hybrid track document

GIVEN a completed OCR task with Hybrid track processing
WHEN translation is requested
THEN the system processes the document using the same unified logic
AND handles any combination of element types present

Scenario: Table cell translation

GIVEN a document containing table elements
WHEN translation is requested
THEN the system extracts text from each table cell
AND translates each cell content individually
AND preserves row/col position in the translation result
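
The extraction scenarios above can be sketched as a single pass over the document's elements. This is a minimal illustration only: the dict-based element shape (`element_id`, `type`, `content`, `cells`) and the function name `collect_translatable` are assumptions, not the project's actual UnifiedDocument model.

```python
from typing import Any

# Element types listed in the Direct track scenario. Table cells are handled separately.
TRANSLATABLE_TYPES = {"text", "title", "header", "footer", "paragraph", "footnote"}


def collect_translatable(elements: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten elements into translation units: one per text block or table cell."""
    units: list[dict[str, Any]] = []
    for el in elements:
        if el["type"] in TRANSLATABLE_TYPES and el.get("content"):
            units.append({"element_id": el["element_id"], "text": el["content"]})
        elif el["type"] == "table":
            # Each non-empty cell becomes its own unit and keeps its row/col position.
            for cell in el.get("cells", []):
                if cell.get("content"):
                    units.append({
                        "element_id": el["element_id"],
                        "row": cell["row"],
                        "col": cell["col"],
                        "text": cell["content"],
                    })
    return units
```

Because the same function handles every element type, it applies unchanged to Direct, OCR, and Hybrid track documents, which is the intent of the "same unified logic" scenario.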

Requirement: Translation API Endpoints

The system SHALL expose REST API endpoints for translation operations.

Scenario: Start translation request

GIVEN a completed OCR task with task_id
WHEN POST request to /api/v2/translate/{task_id} with target_lang parameter
THEN the system starts background translation process
AND returns translation job status with 202 Accepted

Scenario: Query translation status

GIVEN an active translation job
WHEN GET request to /api/v2/translate/{task_id}/status
THEN the system returns current status (pending, translating, completed, failed)
AND includes progress information (current_element, total_elements)
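
A client would typically poll the status endpoint until the job reaches a terminal state. The sketch below shows one way to do that. The helper name `wait_for_translation` and its parameters are hypothetical, and `fetch_status` stands in for whatever HTTP call the client makes to `/api/v2/translate/{task_id}/status`.

```python
import time
from typing import Callable

# "completed" and "failed" end the job; "pending" and "translating" do not.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_translation(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status until a terminal state is reported or max_polls is reached."""
    for _ in range(max_polls):
        status = fetch_status()
        if status["status"] in TERMINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError(f"translation still running after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.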

Scenario: Retrieve translation result

GIVEN a completed translation job
WHEN GET request to /api/v2/translate/{task_id}/result?lang={target_lang}
THEN the system returns the translation JSON content

Scenario: Translation for non-existent task

GIVEN an invalid or non-existent task_id
WHEN translation is requested
THEN the system returns 404 Not Found error

Requirement: DIFY API Integration

The system SHALL integrate with DIFY AI service for translation.

Scenario: API request format

GIVEN text to be translated
WHEN calling DIFY API
THEN the system sends POST request to /chat-messages endpoint
AND includes query with translation prompt
AND uses blocking response mode
AND includes user identifier for tracking

Scenario: API response handling

GIVEN DIFY API returns translation response
WHEN parsing the response
THEN the system extracts translated text from answer field
AND records usage statistics (tokens, latency)
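
The request and response shapes described above might look like the following. Only `query`, the blocking response mode, the user identifier, and the `answer` field come from this specification. The remaining field names (`inputs`, `metadata.usage.total_tokens`, `metadata.usage.latency`) reflect my understanding of DIFY's chat API and should be treated as assumptions to verify against the DIFY documentation. Both function names are hypothetical.

```python
from typing import Any


def build_chat_request(prompt: str, user_id: str) -> dict[str, Any]:
    """Build the JSON body for a POST to the /chat-messages endpoint."""
    return {
        "inputs": {},                 # assumed: no app-level input variables
        "query": prompt,              # the translation prompt
        "response_mode": "blocking",  # wait for the full answer, no streaming
        "user": user_id,              # identifier used for tracking
    }


def parse_chat_response(body: dict[str, Any]) -> dict[str, Any]:
    """Extract the translated text and usage statistics from a blocking response."""
    usage = body.get("metadata", {}).get("usage", {})  # assumed location of usage data
    return {
        "text": body["answer"],
        "total_tokens": usage.get("total_tokens"),
        "latency": usage.get("latency"),
    }
```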

Scenario: API error handling

GIVEN DIFY API returns error or times out
WHEN handling the error
THEN the system retries up to 3 times with exponential backoff
AND returns appropriate error message if all retries fail
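
One way to read the retry rule is "one initial attempt plus up to 3 retries, doubling the delay each time". The sketch below implements that reading. The helper name, the 1-second base delay, and the choice to retry on any exception are assumptions, since the specification does not state them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying failed calls with delays of base_delay * 1, 2, 4, ..."""
    last_error: Exception | None = None
    for attempt in range(max_retries + 1):  # attempt 0 is the initial call
        try:
            return func()
        except Exception as exc:  # a real client would catch only timeouts and API errors
            last_error = exc
            if attempt < max_retries:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"translation failed after {max_retries} retries") from last_error
```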

Scenario: API rate limiting

GIVEN high volume of translation requests
WHEN requests approach rate limits
THEN the system queues requests appropriately
AND provides feedback about wait times

Requirement: Translation Prompt Format

The system SHALL use structured prompts for translation requests.

Scenario: Generate translation prompt

GIVEN source text to translate
WHEN preparing DIFY API request
THEN the system formats prompt as:

    Translate the following text to {language}.
    Return ONLY the translated text, no explanations.

    {text}

Scenario: Language name mapping

GIVEN language code like "zh-TW" or "ja"
WHEN constructing translation prompt
THEN the system maps to full language name (Traditional Chinese, Japanese)
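
The prompt template and the code-to-name mapping can be combined in one small helper. This is a sketch: the names `LANGUAGE_NAMES`, `PROMPT_TEMPLATE`, and `build_prompt` are hypothetical, the mapping entries follow the supported languages list in this specification, and raising `ValueError` for an unknown code is an assumption.

```python
# Language codes and names as given in the "Supported languages list" scenario.
LANGUAGE_NAMES = {
    "en": "English",
    "zh-TW": "Traditional Chinese",
    "zh-CN": "Simplified Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "it": "Italian",
    "ru": "Russian",
    "vi": "Vietnamese",
    "th": "Thai",
}

PROMPT_TEMPLATE = (
    "Translate the following text to {language}.\n"
    "Return ONLY the translated text, no explanations.\n"
    "\n"
    "{text}"
)


def build_prompt(text: str, lang_code: str) -> str:
    """Map the language code to its full name and fill in the prompt template."""
    language = LANGUAGE_NAMES.get(lang_code)
    if language is None:
        raise ValueError(f"unsupported target language: {lang_code}")
    return PROMPT_TEMPLATE.format(language=language, text=text)
```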

Requirement: Translation Progress Reporting

The system SHALL provide real-time progress feedback during translation.

Scenario: Progress during multi-element translation

GIVEN a document with 50 translatable elements
WHEN user queries status
THEN the system returns progress like {"status": "translating", "current_element": 25, "total_elements": 50}

Scenario: Translation starting status

GIVEN translation job just started
WHEN user queries status
THEN the system returns {"status": "pending"}
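
The two status payloads above could be produced from one small state object. The class name `TranslationProgress` and its method are hypothetical. The field names and the four status values are taken from this specification.

```python
from dataclasses import dataclass
from typing import Optional

VALID_STATES = ("pending", "translating", "completed", "failed")


@dataclass
class TranslationProgress:
    status: str = "pending"
    current_element: Optional[int] = None
    total_elements: Optional[int] = None

    def to_payload(self) -> dict:
        """Return the status body; progress counters appear only while translating."""
        if self.status not in VALID_STATES:
            raise ValueError(f"unknown status: {self.status}")
        payload: dict = {"status": self.status}
        if self.status == "translating":
            payload["current_element"] = self.current_element
            payload["total_elements"] = self.total_elements
        return payload
```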

Requirement: Translation Result Storage

The system SHALL store translation results as independent JSON files.

Scenario: Save translation result

GIVEN translation completes successfully
WHEN saving results
THEN the system creates {original_filename}_translated_{lang}.json
AND includes schema_version, metadata, and translations dict

Scenario: Multiple language translations

GIVEN a document translated to English and Japanese
WHEN checking result files
THEN both xxx_translated_en.json and xxx_translated_ja.json exist
AND original xxx_result.json is unchanged
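
Writing one file per target language keeps results independent of each other and of the original result file. The sketch below illustrates the naming rule and the three top-level keys. The function name, the `"1.0.0"` version string, and the metadata fields are placeholders, since the specification does not define them.

```python
import json
from pathlib import Path


def save_translation(
    output_dir: str,
    stem: str,
    lang: str,
    translations: dict,
    source_lang: str = "auto",
) -> Path:
    """Write {stem}_translated_{lang}.json and return its path."""
    path = Path(output_dir) / f"{stem}_translated_{lang}.json"
    document = {
        "schema_version": "1.0.0",  # placeholder value
        "metadata": {               # placeholder fields
            "source_lang": source_lang,
            "target_lang": lang,
            "total_elements": len(translations),
        },
        "translations": translations,
    }
    # ensure_ascii=False keeps CJK text readable in the saved file.
    path.write_text(json.dumps(document, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```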

Requirement: Language Support

The system SHALL support common languages through DIFY AI service.

Scenario: Common language translation

GIVEN target language is English, Chinese, Japanese, or Korean
WHEN translation is requested
THEN the system includes appropriate language name in prompt
AND executes translation successfully

Scenario: Automatic source language detection

GIVEN source_lang is set to "auto"
WHEN translation is executed
THEN the AI model automatically detects source language
AND translates to target language

Scenario: Supported languages list

GIVEN user queries supported languages
WHEN checking language support
THEN the system provides list including:
- English (en)
- Traditional Chinese (zh-TW)
- Simplified Chinese (zh-CN)
- Japanese (ja)
- Korean (ko)
- German (de)
- French (fr)
- Spanish (es)
- Portuguese (pt)
- Italian (it)
- Russian (ru)
- Vietnamese (vi)
- Thai (th)

Requirement: Translated PDF Generation

The system SHALL support generating PDF files with translated content while preserving the original document layout.

Scenario: Generate translated PDF from Direct track document

GIVEN a completed translation for a Direct track processed document
WHEN user requests translated PDF via POST /api/v2/translate/{task_id}/pdf?lang={target_lang}
THEN the system loads the translation JSON file
AND merges translations with UnifiedDocument by element_id
AND generates PDF with translated text at original positions
AND returns PDF file with Content-Type application/pdf

Scenario: Generate translated PDF from OCR track document

GIVEN a completed translation for an OCR track processed document
WHEN user requests translated PDF
THEN the system generates PDF preserving all OCR layout information
AND replaces original text with translated content
AND maintains table structure with translated cell content

Scenario: Handle missing translations gracefully

GIVEN a translation JSON missing some element_id entries
WHEN generating translated PDF
THEN the system uses original content for missing translations
AND logs warning for each fallback
AND completes PDF generation successfully

Scenario: Translated PDF for incomplete translation

GIVEN a task with translation status "pending" or "translating"
WHEN user requests translated PDF
THEN the system returns 400 Bad Request
AND includes error message indicating translation not complete

Scenario: Translated PDF for non-existent translation

GIVEN a task that has not been translated to requested language
WHEN user requests translated PDF with lang=fr
THEN the system returns 404 Not Found
AND includes error message indicating no translation for language

Requirement: Translation Merge Service

The system SHALL provide a service to merge translation data with UnifiedDocument.

Scenario: Merge text element translations

GIVEN a UnifiedDocument with text elements
AND a translation JSON with matching element_ids
WHEN applying translations
THEN the system replaces content field for each matched element
AND preserves all other element properties (bounding_box, style_info, etc.)

Scenario: Merge table cell translations

GIVEN a UnifiedDocument containing table elements
AND a translation JSON with table_cell translations like:

{
  "table_1_0": {
    "cells": [{"row": 0, "col": 0, "content": "Translated"}]
  }
}

WHEN applying translations
THEN the system updates cell content at matching row/col positions
AND preserves cell structure and styling

Scenario: Non-destructive merge operation

GIVEN a UnifiedDocument
WHEN applying translations
THEN the system creates a modified copy
AND original UnifiedDocument remains unchanged
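
The three merge scenarios above, together with the fallback rule for missing translations, can be sketched as one function operating on a deep copy. The dict-based document shape and the assumption that text translations are stored as plain strings keyed by element_id are illustrative. Only the table-cell shape (`cells` with `row`, `col`, `content`) is taken from this specification.

```python
import copy
import logging

logger = logging.getLogger(__name__)


def apply_translations(document: dict, translations: dict) -> dict:
    """Return a translated copy of document; the input is never modified."""
    merged = copy.deepcopy(document)
    for el in merged["elements"]:
        entry = translations.get(el["element_id"])
        if entry is None:
            # Fall back to the original content and record the gap.
            if el.get("content") or el.get("cells"):
                logger.warning("no translation for %s, keeping original", el["element_id"])
            continue
        if el["type"] == "table":
            by_position = {(c["row"], c["col"]): c["content"] for c in entry.get("cells", [])}
            for cell in el.get("cells", []):
                key = (cell["row"], cell["col"])
                if key in by_position:
                    cell["content"] = by_position[key]
        else:
            el["content"] = entry  # other properties (bounding_box, style_info) stay as-is
    return merged
```

Deep-copying up front is the simplest way to guarantee the non-destructive behavior. A production implementation might copy only the fields it changes if documents are large.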

Requirement: Translation Output as Reflow PDF

The system SHALL generate translated documents as reflow-layout PDFs with real visible text, separate from the Layout PDF which uses background images.

Scenario: Generate translated PDF with reflow layout

WHEN translation is completed for a document
THEN the system SHALL generate a new PDF with translated text
AND the translated PDF SHALL use reflow layout (not background image)
AND text SHALL be real visible text, not invisible overlay
AND page breaks SHALL correspond to original document pages

Scenario: Maintain page correspondence in translated output

WHEN generating translated PDF
THEN content from original page 1 SHALL appear in translated page 1
AND content from original page 2 SHALL appear in translated page 2
AND each page may have different content length but maintains page boundaries
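
Page correspondence can be kept by bucketing content by its source page before layout, so that an empty source page still yields a translated page. A minimal sketch, assuming each element carries a 1-based `page` number (the field name and the function name are hypothetical):

```python
def paginate(elements: list[dict], page_count: int) -> list[list[str]]:
    """Group translated content by original page, preserving element order."""
    pages: list[list[str]] = [[] for _ in range(page_count)]
    for el in elements:
        pages[el["page"] - 1].append(el["content"])  # page numbers start at 1
    return pages
```

A PDF writer would then start a new page for each inner list, regardless of how much text it contains.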

Scenario: Chart text excluded from translation

WHEN extracting text for translation from Direct Track documents
THEN text elements within chart regions SHALL NOT be included
AND chart labels, axis text, and legends SHALL remain untranslated
AND this is expected behavior documented for users
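
Excluding chart text amounts to dropping text elements that fall inside a chart region. The sketch below uses a center-point test on `(x0, y0, x1, y1)` boxes. The specification does not say how overlap is decided, so the test, the `bbox` field name, and the function names are all assumptions.

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def center_inside(inner: Box, outer: Box) -> bool:
    """True if the center point of inner lies within outer."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def exclude_chart_text(text_elements: list[dict], chart_regions: list[Box]) -> list[dict]:
    """Keep only text elements whose center is outside every chart region."""
    return [
        el for el in text_elements
        if not any(center_inside(el["bbox"], region) for region in chart_regions)
    ]
```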

Requirement: Dual PDF Output Concept

The system SHALL maintain clear separation between Layout PDF (preview) and Translated PDF (output).

Scenario: Layout PDF for preview

WHEN user views a processed document before translation
THEN the Layout PDF SHALL be displayed
AND Layout PDF preserves exact visual appearance of source
AND text is invisible overlay for extraction purposes only

Scenario: Translated PDF for final output

WHEN user requests translated document
THEN the Translated PDF SHALL be generated
AND Translated PDF uses reflow layout with visible translated text
AND original visual styling is not preserved (text-focused output)

Scenario: Both PDFs available after translation

WHEN translation is completed
THEN both Layout PDF and Translated PDF SHALL be available for download
AND user can choose which version to download
AND Layout PDF remains unchanged after translation
