feat: add translated PDF format selection (layout/reflow)

- Add generate_translated_layout_pdf() method for layout-preserving translated PDFs
- Add generate_translated_pdf() method for reflow translated PDFs
- Update translate router to accept format parameter (layout/reflow)
- Update frontend with dropdown to select translated PDF format
- Fix reflow PDF table cell extraction from content dict
- Add embedded images handling in reflow PDF tables
- Archive improve-translated-text-fitting openspec proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
167
openspec/changes/improve-translated-text-fitting/design.md
Normal file
@@ -0,0 +1,167 @@
## Context

The PDF generator currently uses layout preservation mode for all PDF output, placing text at original coordinates. This works for document reconstruction but:

1. Fails for translated content where text length differs significantly
2. May not provide the best reading experience for flowing documents

Two PDF generation modes are needed:

1. **Layout Preservation** (existing): Maintains original coordinates
2. **Reflow Layout** (new): Prioritizes readability with flowing content

## Goals / Non-Goals

**Goals:**
- Translated and non-translated documents can use reflow layout
- Both OCR and Direct tracks supported
- Proper reading order preserved using available data
- Consistent font sizes for readability
- Images and tables embedded inline

**Non-Goals:**
- Perfect visual matching with original document layout
- Complex multi-column reflow (simple single-column flow)
- Font style matching from original document

## Decisions

### Decision 1: Reading Order Strategy

| Track | Reading Order Source | Implementation |
|-------|---------------------|----------------|
| **OCR** | Explicit `reading_order` array in JSON | Use array indices to order elements |
| **Direct** | Implicit in element list order | Use list iteration order (PyMuPDF sort=True) |

**OCR Track - reading_order array:**
```json
{
  "pages": [{
    "reading_order": [0, 1, 2, 3, 6, 7, 8, ...],
    "elements": [...]
  }]
}
```

**Direct Track - implicit order:**
- PyMuPDF's `get_text("dict", sort=True)` provides spatial reading order
- Elements already sorted by extraction engine
- Optional: Enable `_sort_elements_for_reading_order()` for multi-column detection

### Decision 2: Separate API Endpoints

```
# Layout preservation (existing)
GET /api/v2/tasks/{task_id}/download/pdf

# Reflow layout (new)
GET /api/v2/tasks/{task_id}/download/pdf?format=reflow

# Translated PDF (reflow only)
POST /api/v2/translate/{task_id}/pdf?lang={lang}
```
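
A client could assemble these URLs roughly as in the sketch below. It is illustrative only: `BASE_URL`, `download_pdf_url` and `translated_pdf_url` are hypothetical names, and the host is a placeholder. Only the paths and the `format` / `lang` query parameters come from the endpoint list above.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumption: placeholder host


def download_pdf_url(task_id: str, fmt: Optional[str] = None) -> str:
    """Build the GET URL for a layout (default) or reflow PDF."""
    url = f"{BASE_URL}/api/v2/tasks/{quote(task_id, safe='')}/download/pdf"
    if fmt is not None:
        # Only append the query string when a format was requested
        url += "?" + urlencode({"format": fmt})
    return url


def translated_pdf_url(task_id: str, lang: str) -> str:
    """Build the POST URL for a translated PDF."""
    path = f"/api/v2/translate/{quote(task_id, safe='')}/pdf"
    return f"{BASE_URL}{path}?" + urlencode({"lang": lang})
```

Omitting `fmt` leaves the URL identical to the existing layout endpoint, which matches the intent that current callers are unaffected.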

### Decision 3: Unified Reflow Generation Method

```python
def generate_reflow_pdf(
    self,
    result_json_path: Path,
    output_path: Path,
    translation_json_path: Optional[Path] = None,  # None = no translation
    source_file_path: Optional[Path] = None,  # For embedded images
) -> bool:
    """
    Generate reflow layout PDF for either OCR or Direct track.
    Works with or without translation.
    """
```

### Decision 4: Reading Order Application

```python
def _get_elements_in_reading_order(self, page_data: dict) -> List[dict]:
    """Get elements sorted by reading order."""
    elements = page_data.get('elements', [])
    reading_order = page_data.get('reading_order')

    if reading_order:
        # OCR track: use explicit reading order
        ordered = []
        for idx in reading_order:
            if 0 <= idx < len(elements):
                ordered.append(elements[idx])
        return ordered
    else:
        # Direct track: elements already in reading order
        return elements
```
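
To see both branches in action, the same logic can be exercised as a free function on made-up page data. The function name `elements_in_reading_order` and the sample elements are assumptions for illustration; real pages carry richer element dicts.

```python
from typing import Dict, List


def elements_in_reading_order(page_data: Dict) -> List[Dict]:
    """Standalone copy of the ordering logic above, for experimentation."""
    elements = page_data.get("elements", [])
    reading_order = page_data.get("reading_order")
    if reading_order:
        # OCR track: keep only indices that point at a real element
        return [elements[i] for i in reading_order if 0 <= i < len(elements)]
    # Direct track: trust the extraction order
    return elements


ocr_page = {
    "elements": [{"id": "a"}, {"id": "b"}, {"id": "c"}],
    "reading_order": [2, 0, 7, 1],  # 7 is out of range and is skipped
}
direct_page = {"elements": [{"id": "x"}, {"id": "y"}]}

ocr_ids = [e["id"] for e in elements_in_reading_order(ocr_page)]
direct_ids = [e["id"] for e in elements_in_reading_order(direct_page)]
```

Note that an empty `reading_order` list is falsy, so such a page takes the Direct-track branch and keeps its list order.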

### Decision 5: Consistent Typography

| Element Type | Font Size | Style  |
|--------------|-----------|--------|
| Title/H1     | 18pt      | Bold   |
| H2           | 16pt      | Bold   |
| H3           | 14pt      | Bold   |
| Body text    | 12pt      | Normal |
| Table cell   | 10pt      | Normal |
| Caption      | 10pt      | Italic |
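
One way to carry this table into code is a plain lookup, sketched below. The sizes and styles are copied from the table; the key names, the `TextStyle` type and the fallback to body text for unknown element types are assumptions, not something the design specifies.

```python
from typing import NamedTuple


class TextStyle(NamedTuple):
    size_pt: int
    bold: bool = False
    italic: bool = False


# Values copied from the typography table; the keys are assumed type names.
REFLOW_STYLES = {
    "title": TextStyle(18, bold=True),
    "h1": TextStyle(18, bold=True),
    "h2": TextStyle(16, bold=True),
    "h3": TextStyle(14, bold=True),
    "body": TextStyle(12),
    "table_cell": TextStyle(10),
    "caption": TextStyle(10, italic=True),
}


def style_for(element_type: str) -> TextStyle:
    """Look up a style, falling back to body text for unknown types."""
    return REFLOW_STYLES.get(element_type.lower(), REFLOW_STYLES["body"])
```

Keeping the values in one mapping means every heading level and the body size can be adjusted in a single place.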

### Decision 6: Table Handling in Reflow

Tables use Platypus Table with auto-width columns:

```python
from reportlab.lib import colors
from reportlab.platypus import Paragraph, Table, TableStyle

def _create_reflow_table(self, table_data, translations=None):
    data = []
    for row in table_data['rows']:
        row_data = []
        for cell in row['cells']:
            text = cell.get('text', '')
            if translations:
                # Use the translated text when one exists for this cell
                text = translations.get(cell.get('id'), text)
            row_data.append(Paragraph(text, self.styles['TableCell']))
        data.append(row_data)

    # No colWidths are passed, so column widths are computed automatically
    table = Table(data)
    table.setStyle(TableStyle([
        ('GRID', (0, 0), (-1, -1), 0.5, colors.black),
        ('VALIGN', (0, 0), (-1, -1), 'TOP'),
        # Cell padding is set per side
        ('LEFTPADDING', (0, 0), (-1, -1), 6),
        ('RIGHTPADDING', (0, 0), (-1, -1), 6),
        ('TOPPADDING', (0, 0), (-1, -1), 6),
        ('BOTTOMPADDING', (0, 0), (-1, -1), 6),
    ]))
    return table
```
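
The translation lookup in the table method can be tried without ReportLab by isolating the data preparation step. The sketch below assumes the `rows` / `cells` / `id` / `text` shape used in the snippet above; the function name `build_cell_texts` and the sample data are hypothetical, and the real result JSON may be shaped differently.

```python
from typing import Dict, List, Optional


def build_cell_texts(
    table_data: Dict, translations: Optional[Dict[str, str]] = None
) -> List[List[str]]:
    """Return a row-major grid of cell strings, translated where possible."""
    grid = []
    for row in table_data.get("rows", []):
        texts = []
        for cell in row.get("cells", []):
            text = cell.get("text", "")
            if translations:
                # Fall back to the original text when no translation exists
                text = translations.get(cell.get("id"), text)
            texts.append(text)
        grid.append(texts)
    return grid


sample = {
    "rows": [
        {"cells": [{"id": "c1", "text": "报价单"}, {"id": "c2", "text": "100"}]},
    ]
}
grid = build_cell_texts(sample, {"c1": "Quotation"})
```

Cells without a matching translation keep their source text, so a partially translated table still renders every cell.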

### Decision 7: Image Embedding

```python
def _embed_image_reflow(self, element, max_width=450):
    img_path = self._resolve_image_path(element)
    if img_path and img_path.exists():
        img = Image(str(img_path))
        # Scale to fit page width
        if img.drawWidth > max_width:
            ratio = max_width / img.drawWidth
            img.drawWidth = max_width
            img.drawHeight *= ratio
        return img
    return Spacer(1, 0)
```
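
The scaling rule can be checked in isolation with a small helper. This is a sketch of the arithmetic only; `fit_to_width` is a hypothetical name, and the 450 default mirrors `max_width` in the method above.

```python
from typing import Tuple


def fit_to_width(
    width: float, height: float, max_width: float = 450
) -> Tuple[float, float]:
    """Shrink (width, height) to max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # images that already fit are left untouched
    ratio = max_width / width
    return max_width, height * ratio


scaled = fit_to_width(900, 300)
```

A 900×300 image is halved to 450×150, while anything narrower than `max_width` passes through unchanged, so small images are never enlarged.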

## Risks / Trade-offs

- **Risk**: OCR reading_order may not be accurate for complex layouts
  - **Mitigation**: Falls back to spatial sort if reading_order missing

- **Risk**: Direct track multi-column detection unused
  - **Mitigation**: PyMuPDF sort=True is generally reliable

- **Risk**: Loss of visual fidelity compared to original
  - **Mitigation**: This is acceptable; layout PDF still available

## Migration Plan

No migration needed - new functionality, existing behavior unchanged.

## Open Questions

None - design confirmed with user.
41
openspec/changes/improve-translated-text-fitting/proposal.md
Normal file
@@ -0,0 +1,41 @@
# Change: Reflow Layout PDF Export for All Tracks

## Why

When generating translated PDFs, text often doesn't fit within original bounding boxes due to language expansion/contraction differences. Additionally, users may want a readable flowing document format even without translation.

**Example from task c79df0ad-f9a6-4c04-8139-13eaef25fa83:**
- Original Chinese: "华天科技(宝鸡)有限公司设备版块报价单" (19 characters)
- Translated English: "Huatian Technology (Baoji) Co., Ltd. Equipment Division Quotation" (65+ characters)
- Same bounding box: 703×111 pixels
- Current result: Font reduced to minimum (3pt), text unreadable

## What Changes

- **NEW**: Add reflow layout PDF generation for both OCR and Direct tracks
- Preserve semantic structure (headings, tables, lists) in reflow mode
- Use consistent, readable font sizes (12pt body, 16pt headings)
- Embed images inline within flowing content
- **IMPORTANT**: Original layout preservation PDF generation remains unchanged
- Support both tracks with proper reading order:
  - **OCR track**: Use existing `reading_order` array from PP-StructureV3
  - **Direct track**: Use PyMuPDF's implicit order (with option for column detection)
- **FIX**: Remove outdated MADLAD-400 references from frontend (now uses Dify cloud translation)

## Download Options

| Scenario | Layout PDF | Reflow PDF |
|----------|------------|------------|
| **Without Translation** | Available | Available (NEW) |
| **With Translation** | - | Available (single option, unchanged) |

## Impact

- Affected specs: `specs/result-export/spec.md`
- Affected code:
  - `backend/app/services/pdf_generator_service.py` - add reflow generation method
  - `backend/app/routers/tasks.py` - add reflow PDF download endpoint
  - `backend/app/routers/translate.py` - use reflow mode for translated PDFs
  - `frontend/src/pages/TaskDetailPage.tsx`:
    - Add "Download Reflow PDF" button for original documents
    - Remove MADLAD-400 badge and outdated description text
@@ -0,0 +1,137 @@
## ADDED Requirements

### Requirement: Dual PDF Generation Modes

The system SHALL support two distinct PDF generation modes to serve different use cases for both OCR and Direct tracks.

#### Scenario: Download layout preservation PDF
- **WHEN** user requests PDF via `/api/v2/tasks/{task_id}/download/pdf`
- **THEN** PDF SHALL use layout preservation mode
- **AND** text positions SHALL match original document coordinates
- **AND** this option SHALL be available for both OCR and Direct tracks
- **AND** existing behavior SHALL remain unchanged

#### Scenario: Download reflow layout PDF without translation
- **WHEN** user requests PDF via `/api/v2/tasks/{task_id}/download/pdf?format=reflow`
- **THEN** PDF SHALL use reflow layout mode
- **AND** text SHALL flow naturally with consistent font sizes
- **AND** body text SHALL use approximately 12pt font size
- **AND** headings SHALL use larger font sizes (14-18pt)
- **AND** this option SHALL be available for both OCR and Direct tracks

#### Scenario: OCR track reading order in reflow mode
- **GIVEN** document processed via OCR track
- **WHEN** generating reflow PDF
- **THEN** system SHALL use explicit `reading_order` array from JSON
- **AND** elements SHALL appear in order specified by reading_order indices
- **AND** if reading_order is missing, fall back to spatial sort (y, x)
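
The spatial fallback in this scenario could look like the sketch below. It assumes each element carries a `bbox` of the form `[x0, y0, x1, y1]` with the origin at the top-left; that shape, and the names `spatial_sort` and `order_elements`, are assumptions rather than the actual data model.

```python
from typing import Dict, List


def spatial_sort(elements: List[Dict]) -> List[Dict]:
    """Order elements top-to-bottom, then left-to-right."""
    # Assumption: bbox is [x0, y0, x1, y1] with the origin at the top-left.
    return sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))


def order_elements(page_data: Dict) -> List[Dict]:
    """Prefer the explicit reading_order, else fall back to (y, x) sorting."""
    elements = page_data.get("elements", [])
    reading_order = page_data.get("reading_order")
    if reading_order:
        return [elements[i] for i in reading_order if 0 <= i < len(elements)]
    return spatial_sort(elements)


page = {
    "elements": [
        {"id": "footer", "bbox": [50, 700, 500, 720]},
        {"id": "right", "bbox": [300, 100, 500, 120]},
        {"id": "left", "bbox": [50, 100, 250, 120]},
    ]
}
fallback_ids = [e["id"] for e in order_elements(page)]
```

A plain (y, x) sort reads across columns line by line, so it suits single-column pages; multi-column layouts are a stated non-goal of the design.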

#### Scenario: Direct track reading order in reflow mode
- **GIVEN** document processed via Direct track
- **WHEN** generating reflow PDF
- **THEN** system SHALL use implicit element order from extraction
- **AND** elements SHALL appear in list iteration order
- **AND** PyMuPDF's sort=True ordering SHALL be trusted

---

### Requirement: Reflow PDF Semantic Structure

The reflow PDF generation SHALL preserve document semantic structure.

#### Scenario: Headings in reflow mode
- **WHEN** original document contains headings (title, h1, h2, etc.)
- **THEN** headings SHALL be rendered with larger font sizes
- **AND** headings SHALL be visually distinguished from body text
- **AND** heading hierarchy SHALL be preserved

#### Scenario: Tables in reflow mode
- **WHEN** original document contains tables
- **THEN** tables SHALL render with visible cell borders
- **AND** column widths SHALL auto-adjust to content
- **AND** table content SHALL be fully visible
- **AND** tables SHALL use appropriate cell padding

#### Scenario: Images in reflow mode
- **WHEN** original document contains images
- **THEN** images SHALL be embedded inline in flowing content
- **AND** images SHALL be scaled to fit page width if necessary
- **AND** images SHALL maintain aspect ratio

#### Scenario: Lists in reflow mode
- **WHEN** original document contains numbered or bulleted lists
- **THEN** lists SHALL preserve their formatting
- **AND** list items SHALL flow naturally

---

## MODIFIED Requirements

### Requirement: Translated PDF Export API

The system SHALL expose an API endpoint for downloading translated documents as PDF files using reflow layout mode only.

#### Scenario: Download translated PDF via API
- **GIVEN** a task with completed translation
- **WHEN** POST request to `/api/v2/translate/{task_id}/pdf?lang={lang}`
- **THEN** system returns PDF file with translated content
- **AND** PDF SHALL use reflow layout mode (not layout preservation)
- **AND** Content-Type is `application/pdf`
- **AND** Content-Disposition suggests filename like `{task_id}_translated_{lang}.pdf`
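
As a rough illustration of the two header requirements, a handler might build them as follows. The helper name `translated_pdf_headers` is hypothetical, and the `attachment` disposition type is an assumption: the scenario only says the header suggests a filename.

```python
from typing import Dict


def translated_pdf_headers(task_id: str, lang: str) -> Dict[str, str]:
    """Build response headers for a translated PDF download."""
    filename = f"{task_id}_translated_{lang}.pdf"
    return {
        "Content-Type": "application/pdf",
        # Assumption: "attachment" prompts a download with this filename
        "Content-Disposition": f'attachment; filename="{filename}"',
    }


headers = translated_pdf_headers("task123", "en")
```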

#### Scenario: Translated PDF uses reflow layout
- **WHEN** user downloads translated PDF
- **THEN** the PDF SHALL use reflow layout mode
- **AND** text SHALL flow naturally with consistent font sizes
- **AND** body text SHALL use approximately 12pt font size
- **AND** headings SHALL use larger font sizes (14-18pt)
- **AND** content SHALL be readable without magnification

#### Scenario: Translated PDF for OCR track
- **GIVEN** document processed via OCR track with translation
- **WHEN** generating translated PDF
- **THEN** reading order SHALL follow `reading_order` array
- **AND** translated text SHALL replace original in correct positions

#### Scenario: Translated PDF for Direct track
- **GIVEN** document processed via Direct track with translation
- **WHEN** generating translated PDF
- **THEN** reading order SHALL follow implicit element order
- **AND** translated text SHALL replace original in correct positions

#### Scenario: Invalid language parameter
- **GIVEN** a task with translation only to English
- **WHEN** user requests PDF with `lang=ja` (Japanese)
- **THEN** system returns 404 Not Found
- **AND** response includes available languages in error message
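
The language check in this scenario can be sketched as a pure function that returns a status code and a message. The name `check_translation_lang` and the message wording are assumptions; only the 404 status and the listing of available languages come from the scenario.

```python
from typing import Iterable, Tuple


def check_translation_lang(available: Iterable[str], lang: str) -> Tuple[int, str]:
    """Return an (HTTP status, message) pair for the requested language."""
    langs = sorted(set(available))
    if lang in langs:
        return 200, "ok"
    # List what does exist so the caller can retry with a valid language
    listed = ", ".join(langs) if langs else "none"
    return 404, f"No translation for '{lang}'. Available languages: {listed}"


status, message = check_translation_lang(["en"], "ja")
```

Including the available languages in the message lets a client recover without a separate lookup request.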

#### Scenario: Task not found
- **GIVEN** non-existent task_id
- **WHEN** user requests translated PDF
- **THEN** system returns 404 Not Found

---

### Requirement: Frontend Download Options

The frontend SHALL provide appropriate download options based on translation status.

#### Scenario: Download options without translation
- **GIVEN** a task without any completed translations
- **WHEN** user views TaskDetailPage
- **THEN** page SHALL display "Download Layout PDF" button (original coordinates)
- **AND** page SHALL display "Download Reflow PDF" button (flowing layout)
- **AND** both options SHALL be available in the download section

#### Scenario: Download options with translation
- **GIVEN** a task with completed translation
- **WHEN** user views TaskDetailPage
- **THEN** page SHALL display "Download Translated PDF" button for each language
- **AND** translated PDF button SHALL remain as single option (no Layout/Reflow choice)
- **AND** translated PDF SHALL automatically use reflow layout

#### Scenario: Remove outdated MADLAD-400 references
- **WHEN** displaying translation section
- **THEN** page SHALL NOT display "MADLAD-400" badge
- **AND** description text SHALL reflect cloud translation service (Dify)
- **AND** description SHALL NOT mention local model loading time
30
openspec/changes/improve-translated-text-fitting/tasks.md
Normal file
@@ -0,0 +1,30 @@
## 1. Backend Implementation

- [x] 1.1 Create `generate_reflow_pdf()` method in pdf_generator_service.py
- [x] 1.2 Implement `_get_elements_in_reading_order()` for both tracks
- [x] 1.3 Implement reflow text rendering with consistent font sizes
- [x] 1.4 Implement table rendering in reflow mode (Platypus Table)
- [x] 1.5 Implement inline image embedding
- [x] 1.6 Add `format=reflow` query parameter to tasks download endpoint
- [x] 1.7 Update `generate_translated_pdf()` to use reflow mode

## 2. Frontend Implementation

- [x] 2.1 Add "Download Reflow PDF" button for original documents
- [x] 2.2 Update download logic to support format parameter
- [x] 2.3 Remove MADLAD-400 badge (line 545)
- [x] 2.4 Update translation description text to reflect Dify cloud service (line 652)

## 3. Testing

- [x] 3.1 Test OCR track reflow PDF (with reading_order) - Basic smoke test passed
- [ ] 3.2 Test Direct track reflow PDF (implicit order) - No test data available
- [x] 3.3 Test translated PDF (reflow mode) - Basic smoke test passed
- [x] 3.4 Test documents with tables - SUCCESS (62294 bytes, 2 tables)
- [x] 3.5 Test documents with images - SUCCESS (embedded img_in_table)
- [x] 3.6 Test multi-page documents - SUCCESS (11451 bytes, 3 pages)
- [x] 3.7 Verify layout PDF still works correctly - SUCCESS (104543 bytes)

## 4. Documentation

- [x] 4.1 Update spec with reflow layout requirements