feat: add translated PDF format selection (layout/reflow)

- Add generate_translated_layout_pdf() method for layout-preserving translated PDFs
- Add generate_translated_pdf() method for reflow translated PDFs
- Update translate router to accept format parameter (layout/reflow)
- Update frontend with dropdown to select translated PDF format
- Fix reflow PDF table cell extraction from content dict
- Add embedded images handling in reflow PDF tables
- Archive improve-translated-text-fitting openspec proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-12-03 10:10:28 +08:00
parent 0dcea4a7e7
commit 08adf3d01d
15 changed files with 1384 additions and 1222 deletions

@@ -0,0 +1,41 @@
# Change: Reflow Layout PDF Export for All Tracks
## Why
When generating translated PDFs, text often doesn't fit within original bounding boxes due to language expansion/contraction differences. Additionally, users may want a readable flowing document format even without translation.
**Example from task c79df0ad-f9a6-4c04-8139-13eaef25fa83:**
- Original Chinese: "华天科技(宝鸡)有限公司设备版块报价单" (19 characters)
- Translated English: "Huatian Technology (Baoji) Co., Ltd. Equipment Division Quotation" (65+ characters)
- Same bounding box: 703×111 pixels
- Current result: Font reduced to minimum (3pt), text unreadable
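The proposal does not show how the current pipeline chooses a font size, so the following is only a minimal sketch of a generic shrink-to-fit loop, to illustrate why a fixed bounding box pushes longer translated text toward the minimum size. The function name `shrink_to_fit`, the 0.5pt step, and the width and line-height ratios are all assumptions made for illustration; box dimensions and font sizes are taken to be in the same unit.

```python
import math


def shrink_to_fit(
    text: str,
    box_width: float,
    box_height: float,
    max_size: float = 12.0,
    min_size: float = 3.0,
    char_width_ratio: float = 0.5,   # assumed average glyph width / font size
    line_height_ratio: float = 1.2,  # assumed line height / font size
) -> float:
    """Return the largest font size at which `text` fits the box.

    Hypothetical helper: steps down from max_size in 0.5 increments and
    falls back to min_size when nothing larger fits.
    """
    size = max_size
    while size > min_size:
        # Rough estimate of how many characters fit on one line at this size.
        chars_per_line = max(1, int(box_width // (size * char_width_ratio)))
        lines_needed = math.ceil(len(text) / chars_per_line)
        if lines_needed * size * line_height_ratio <= box_height:
            return size
        size -= 0.5
    # Nothing above the floor fits, so the text is rendered at the minimum.
    return min_size
```

With a fixed box, a longer string can never receive a larger size than a shorter one under this scheme, and sufficiently long text always ends up at `min_size`. Reflow mode avoids the problem by letting the text take as much vertical space as it needs at a fixed, readable size.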
## What Changes
- **NEW**: Add reflow layout PDF generation for both OCR and Direct tracks
  - Preserve semantic structure (headings, tables, lists) in reflow mode
  - Use consistent, readable font sizes (12pt body, 16pt headings)
  - Embed images inline within flowing content
- **IMPORTANT**: Original layout preservation PDF generation remains unchanged
- Support both tracks with proper reading order:
  - **OCR track**: Use existing `reading_order` array from PP-StructureV3
  - **Direct track**: Use PyMuPDF's implicit order (with option for column detection)
- **FIX**: Remove outdated MADLAD-400 references from frontend (now uses Dify cloud translation)
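
As a rough illustration of the reading-order and font-size rules listed above, the sketch below orders extracted elements and picks a reflow font size for each. It assumes that `reading_order` is a list of indices into the element list and that each element is a dict with a `type` key; neither shape is confirmed by this proposal, and `order_elements` and `font_size_for` are hypothetical names.

```python
from typing import Any, Optional

# Font sizes stated in the proposal: 12pt body, 16pt headings.
BODY_SIZE = 12
HEADING_SIZE = 16


def order_elements(
    elements: list[dict[str, Any]],
    reading_order: Optional[list[int]],
) -> list[dict[str, Any]]:
    """Return elements in reading order (assumed: a list of indices).

    Without a reading order (for example, on the Direct track) the
    extraction order is kept as-is.
    """
    if not reading_order:
        return list(elements)
    seen: set[int] = set()
    ordered: list[dict[str, Any]] = []
    for idx in reading_order:
        # Skip out-of-range or duplicate indices instead of failing.
        if 0 <= idx < len(elements) and idx not in seen:
            ordered.append(elements[idx])
            seen.add(idx)
    # Append elements the reading order did not mention, in original order.
    ordered.extend(el for i, el in enumerate(elements) if i not in seen)
    return ordered


def font_size_for(element: dict[str, Any]) -> int:
    """Pick the reflow font size from the element's (assumed) `type` key."""
    return HEADING_SIZE if element.get("type") == "heading" else BODY_SIZE
```

Appending unlisted elements at the end, rather than dropping them, is a design choice of this sketch: it keeps content from silently disappearing if the reading order is incomplete.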
## Download Options
| Scenario | Layout PDF | Reflow PDF |
|----------|------------|------------|
| **Without Translation** | Available | Available (NEW) |
| **With Translation** | - | Available (single option, unchanged) |
## Impact
- Affected specs: `specs/result-export/spec.md`
- Affected code:
  - `backend/app/services/pdf_generator_service.py` - add reflow generation method
  - `backend/app/routers/tasks.py` - add reflow PDF download endpoint
  - `backend/app/routers/translate.py` - use reflow mode for translated PDFs
  - `frontend/src/pages/TaskDetailPage.tsx`:
    - Add "Download Reflow PDF" button for original documents
    - Remove MADLAD-400 badge and outdated description text