feat: add translated PDF format selection (layout/reflow)

- Add generate_translated_layout_pdf() method for layout-preserving translated PDFs
- Add generate_translated_pdf() method for reflow translated PDFs
- Update translate router to accept format parameter (layout/reflow)
- Update frontend with dropdown to select translated PDF format
- Fix reflow PDF table cell extraction from content dict
- Add embedded images handling in reflow PDF tables
- Archive improve-translated-text-fitting openspec proposal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
41
openspec/changes/improve-translated-text-fitting/proposal.md
Normal file
@@ -0,0 +1,41 @@
# Change: Reflow Layout PDF Export for All Tracks

## Why

When generating translated PDFs, text often doesn't fit within original bounding boxes due to language expansion/contraction differences. Additionally, users may want a readable flowing document format even without translation.

**Example from task c79df0ad-f9a6-4c04-8139-13eaef25fa83:**
- Original Chinese: "华天科技(宝鸡)有限公司设备版块报价单" (19 characters)
- Translated English: "Huatian Technology (Baoji) Co., Ltd. Equipment Division Quotation" (65+ characters)
- Same bounding box: 703×111 pixels
- Current result: Font reduced to minimum (3pt), text unreadable

## What Changes

- **NEW**: Add reflow layout PDF generation for both OCR and Direct tracks
- Preserve semantic structure (headings, tables, lists) in reflow mode
- Use consistent, readable font sizes (12pt body, 16pt headings)
- Embed images inline within flowing content
- **IMPORTANT**: Original layout preservation PDF generation remains unchanged
- Support both tracks with proper reading order:
  - **OCR track**: Use existing `reading_order` array from PP-StructureV3
  - **Direct track**: Use PyMuPDF's implicit order (with option for column detection)
- **FIX**: Remove outdated MADLAD-400 references from frontend (now uses Dify cloud translation)
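
The reading-order rule for the two tracks can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: it assumes `reading_order` is a list of integer indices into the page's element list, and the function name `order_elements` and the element dicts are hypothetical.

```python
from typing import List, Optional, Sequence


def order_elements(
    elements: Sequence[dict],
    reading_order: Optional[Sequence[int]] = None,
) -> List[dict]:
    """Return page elements in the order they should be flowed.

    Assumption: `reading_order` holds indices into `elements` (OCR track).
    When it is missing or empty (Direct track), the extraction order is kept.
    """
    if not reading_order:
        # Direct track: rely on the implicit order of the extracted elements.
        return list(elements)

    seen = set()
    ordered = []
    for idx in reading_order:
        # Skip out-of-range or repeated indices instead of failing.
        if 0 <= idx < len(elements) and idx not in seen:
            seen.add(idx)
            ordered.append(elements[idx])

    # Elements never mentioned in reading_order are appended in original order.
    ordered.extend(el for i, el in enumerate(elements) if i not in seen)
    return ordered


if __name__ == "__main__":
    page = [{"id": "title"}, {"id": "table"}, {"id": "footer"}]
    print([el["id"] for el in order_elements(page, [2, 0])])
    print([el["id"] for el in order_elements(page)])
```

Appending unlisted elements at the end, rather than dropping them, is a design choice of this sketch so that no content disappears from the reflowed output when the order array is incomplete.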

## Download Options

| Scenario | Layout PDF | Reflow PDF |
|----------|------------|------------|
| **Without Translation** | Available | Available (NEW) |
| **With Translation** | - | Available (single option, unchanged) |
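
The table above can be read as a small availability rule. The sketch below mirrors only the table as written in this proposal; the function name `available_pdf_formats` and the string values `"layout"` and `"reflow"` are assumptions for illustration.

```python
from typing import List


def available_pdf_formats(has_translation: bool) -> List[str]:
    """Return the PDF download options offered for a task, per the table above.

    Assumption: formats are identified by the strings "layout" and "reflow".
    """
    if has_translation:
        # With translation: a single reflow option.
        return ["reflow"]
    # Without translation: the existing layout PDF plus the new reflow PDF.
    return ["layout", "reflow"]


if __name__ == "__main__":
    print(available_pdf_formats(has_translation=False))
    print(available_pdf_formats(has_translation=True))
```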

## Impact

- Affected specs: `specs/result-export/spec.md`
- Affected code:
  - `backend/app/services/pdf_generator_service.py` - add reflow generation method
  - `backend/app/routers/tasks.py` - add reflow PDF download endpoint
  - `backend/app/routers/translate.py` - use reflow mode for translated PDFs
  - `frontend/src/pages/TaskDetailPage.tsx`:
    - Add "Download Reflow PDF" button for original documents
    - Remove MADLAD-400 badge and outdated description text
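
For the `translate.py` change, the commit message describes a `format` parameter with the values `layout` and `reflow`, backed by `generate_translated_layout_pdf()` and `generate_translated_pdf()`. One way the parameter might be validated and mapped to those methods is sketched below. The enum, the helper names, and the choice of `reflow` as the default are assumptions; the real router may differ.

```python
from enum import Enum
from typing import Optional


class TranslatedPdfFormat(str, Enum):
    """Hypothetical enum for the translate router's `format` parameter."""

    LAYOUT = "layout"
    REFLOW = "reflow"


def parse_translated_pdf_format(raw: Optional[str]) -> TranslatedPdfFormat:
    """Validate a raw query value; the reflow default is an assumption."""
    if raw is None:
        return TranslatedPdfFormat.REFLOW
    try:
        return TranslatedPdfFormat(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(f.value for f in TranslatedPdfFormat)
        raise ValueError(f"format must be one of: {allowed}") from None


def generator_method_name(fmt: TranslatedPdfFormat) -> str:
    """Map a format to the generator method named in the commit message."""
    return {
        TranslatedPdfFormat.LAYOUT: "generate_translated_layout_pdf",
        TranslatedPdfFormat.REFLOW: "generate_translated_pdf",
    }[fmt]


if __name__ == "__main__":
    fmt = parse_translated_pdf_format("layout")
    print(fmt.value, generator_method_name(fmt))
```

Rejecting unknown values up front keeps a mistyped format from silently falling back to one of the two generators.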