OCR/openspec/changes/improve-translated-text-fitting/proposal.md
egg 08adf3d01d feat: add translated PDF format selection (layout/reflow)
- Add generate_translated_layout_pdf() method for layout-preserving translated PDFs
- Add generate_translated_pdf() method for reflow translated PDFs
- Update translate router to accept format parameter (layout/reflow)
- Update frontend with dropdown to select translated PDF format
- Fix reflow PDF table cell extraction from content dict
- Add embedded images handling in reflow PDF tables
- Archive improve-translated-text-fitting openspec proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 10:10:28 +08:00


Change: Reflow Layout PDF Export for All Tracks

Why

Translated text often does not fit within the original bounding boxes, because text length expands or contracts between languages. Users may also want a readable, flowing document even without translation.

Example from task c79df0ad-f9a6-4c04-8139-13eaef25fa83:

  • Original Chinese: "华天科技(宝鸡)有限公司设备版块报价单" (19 characters)
  • Translated English: "Huatian Technology (Baoji) Co., Ltd. Equipment Division Quotation" (65 characters)
  • Same bounding box: 703×111 pixels
  • Current result: Font reduced to minimum (3pt), text unreadable
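The size mismatch in this example can be checked with a short Python sketch. It compares character counts only, which is a rough proxy for rendered width (CJK glyphs are typically wider than Latin ones), so the ratio is illustrative rather than a layout measurement.

```python
# Strings copied from the example above.
original = "华天科技(宝鸡)有限公司设备版块报价单"
translated = "Huatian Technology (Baoji) Co., Ltd. Equipment Division Quotation"

# Character counts, not rendered widths: a rough proxy only.
expansion = len(translated) / len(original)

print(len(original), len(translated), round(expansion, 2))
# 19 65 3.42
```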

What Changes

  • NEW: Add reflow layout PDF generation for both OCR and Direct tracks
  • Preserve semantic structure (headings, tables, lists) in reflow mode
  • Use consistent, readable font sizes (12pt body, 16pt headings)
  • Embed images inline within flowing content
  • IMPORTANT: Original layout preservation PDF generation remains unchanged
  • Support both tracks with proper reading order:
    • OCR track: Use existing reading_order array from PP-StructureV3
    • Direct track: Use PyMuPDF's implicit order (with option for column detection)
  • FIX: Remove outdated MADLAD-400 references from frontend (now uses Dify cloud translation)
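The reading-order and font-size rules above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `Element`, `reflow_blocks`, and the assumption that `reading_order` is a list of indices into the element list are all hypothetical. Only the 12pt and 16pt sizes come from the proposal.

```python
from dataclasses import dataclass
from typing import Optional

# Font sizes taken from the proposal; everything else here is hypothetical.
BODY_PT = 12
HEADING_PT = 16


@dataclass
class Element:
    kind: str  # e.g. "heading" or "paragraph"
    text: str


def reflow_blocks(
    elements: list[Element],
    reading_order: Optional[list[int]] = None,
) -> list[tuple[int, str]]:
    """Return (font_size_pt, text) pairs in reading order."""
    if reading_order is None:
        # Direct track (assumed): keep the extraction order as-is.
        ordered = elements
    else:
        # OCR track (assumed): reading_order holds indices into elements.
        ordered = [elements[i] for i in reading_order]
    return [
        (HEADING_PT if e.kind == "heading" else BODY_PT, e.text)
        for e in ordered
    ]


elements = [Element("paragraph", "Body text"), Element("heading", "Title")]
print(reflow_blocks(elements, reading_order=[1, 0]))
# [(16, 'Title'), (12, 'Body text')]
print(reflow_blocks(elements))
# [(12, 'Body text'), (16, 'Title')]
```

Because the font size depends only on the element type and not on the bounding box, long translations wrap onto more lines instead of shrinking.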

Download Options

Scenario            | Layout PDF | Reflow PDF
Without Translation | Available  | Available (NEW)
With Translation    | -          | Available (single option, unchanged)

Impact

  • Affected specs: specs/result-export/spec.md
  • Affected code:
    • backend/app/services/pdf_generator_service.py - add reflow generation method
    • backend/app/routers/tasks.py - add reflow PDF download endpoint
    • backend/app/routers/translate.py - use reflow mode for translated PDFs
    • frontend/src/pages/TaskDetailPage.tsx:
      • Add "Download Reflow PDF" button for original documents
      • Remove MADLAD-400 badge and outdated description text