Backend changes:
- Apply background image + invisible text layer to all Direct Track PDFs
- Add CHART to regions_to_avoid for text extraction
- Improve visual fidelity for native PDFs and Office documents

Frontend changes:
- Remove JSON, UnifiedDocument, Markdown download buttons
- Simplify to 2-column layout with only Layout PDF and Reflow PDF
- Remove translation JSON download and Layout PDF option
- Keep only Reflow PDF for translated document downloads
- Clean up unused imports (FileJson, Database, FileOutput)

Archives two OpenSpec proposals:
- unify-direct-track-pdf-rendering
- simplify-frontend-export-options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tasks: Unify Direct Track PDF Rendering
1. Backend - PDF Generator Service
- 1.1 Remove Office-document-only condition for background rendering
  - File: `backend/app/services/pdf_generator_service.py`
  - Change: Apply background image rendering to ALL Direct Track documents
  - Remove: `is_office_document` detection logic
  - Done: Replaced `is_office_document` with `use_background_rendering`, which is now based on `ProcessingTrack.DIRECT`
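The new condition reduces to a check on the processing track alone. A minimal Python sketch, assuming a `ProcessingTrack` enum with `DIRECT` and `OCR` members; the enum values and the helper name `should_use_background_rendering` are hypothetical and not taken from `pdf_generator_service.py`:

```python
from enum import Enum


class ProcessingTrack(Enum):
    # Hypothetical enum mirroring the two tracks named in this task list.
    DIRECT = "direct"
    OCR = "ocr"


def should_use_background_rendering(track: ProcessingTrack) -> bool:
    """Return True when the page should be drawn as a background image.

    The old flag was effectively "is this an Office document?". Now every
    Direct Track document gets the background + invisible text layer,
    whether it started as a native PDF or as an Office file.
    """
    return track is ProcessingTrack.DIRECT
```

Because the check no longer looks at the file type, native PDFs and converted Office files take the same path, while OCR Track documents never enter it.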
- 1.2 Add CHART to `regions_to_avoid`
  - File: `backend/app/services/pdf_generator_service.py`
  - Change: Include `ElementType.CHART` in the exclusion regions for Direct Track
  - Effect: Chart-internal text is excluded from the invisible text layer
  - Done: Added CHART to `regions_to_avoid` when `is_direct` is True
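A sketch of how adding CHART to `regions_to_avoid` keeps chart labels out of the invisible layer. The `Element` dataclass, the `(x0, y0, x1, y1)` bounding-box format, treating IMAGE as a pre-existing exclusion, and the helper names are all assumptions for illustration:

```python
from dataclasses import dataclass
from enum import Enum, auto


class ElementType(Enum):
    # Hypothetical subset; only CHART is named in the task itself.
    TEXT = auto()
    IMAGE = auto()
    CHART = auto()


@dataclass(frozen=True)
class Element:
    type: ElementType
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1)
    text: str = ""


def overlaps(a, b) -> bool:
    """True if two (x0, y0, x1, y1) boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def text_for_invisible_layer(elements: list[Element], is_direct: bool) -> list[Element]:
    """Return the TEXT elements that should go into the invisible text layer."""
    avoid_types = {ElementType.IMAGE}  # assumed pre-existing exclusion
    if is_direct:
        # Chart labels are already visible in the background image; excluding
        # them keeps chart-internal text out of the selectable/translatable layer.
        avoid_types.add(ElementType.CHART)
    regions_to_avoid = [e.bbox for e in elements if e.type in avoid_types]
    return [
        e for e in elements
        if e.type is ElementType.TEXT
        and not any(overlaps(e.bbox, r) for r in regions_to_avoid)
    ]
```

A text box inside a chart is dropped only on the Direct Track path; with `is_direct=False` the same box is kept.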
- 1.3 Ensure the source PDF is available for background rendering
  - File: `backend/app/services/pdf_generator_service.py`
  - Change: Use `source_file_path`, or search `result_dir` for the source PDF
  - Fallback: If the source PDF is not found, log a warning and skip background rendering
  - Done: Existing logic already handles this; updated comments for clarity
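The lookup order described above might look like this. It is a sketch only: `resolve_source_pdf` and its `exclude` parameter (for skipping generated output PDFs that may also live in `result_dir`) are hypothetical, and taking the first PDF in sorted order is an assumed tie-break rule:

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_source_pdf(
    source_file_path: Optional[str],
    result_dir: Path,
    exclude: tuple[str, ...] = (),  # hypothetical: names of generated outputs to ignore
) -> Optional[Path]:
    """Find the original PDF to rasterize as the page background.

    Order: an explicit source_file_path that is an existing .pdf, then the
    first *.pdf in result_dir (sorted by name). Returns None and logs a
    warning if neither exists; the caller then skips background rendering.
    """
    if source_file_path:
        candidate = Path(source_file_path)
        if candidate.is_file() and candidate.suffix.lower() == ".pdf":
            return candidate
    if result_dir.is_dir():
        pdfs = sorted(p for p in result_dir.glob("*.pdf") if p.name not in exclude)
        if pdfs:
            return pdfs[0]
    logger.warning("Source PDF not found in %s; skipping background rendering", result_dir)
    return None
```

If `source_file_path` names an Office file rather than a PDF, the suffix check makes the function fall through to a converted PDF in `result_dir`.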
- 1.4 Verify the invisible text layer is correctly positioned
  - File: `backend/app/services/pdf_generator_service.py`
  - Verify: Text coordinates match the original PDF positions
  - Test: Text selection in the output PDF selects the correct content
  - Done: Existing invisible text rendering (PDF text rendering mode 3) already handles positioning
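Rendering mode 3 is the PDF text rendering mode that neither fills nor strokes glyphs, so the text is selectable and extractable but never painted over the background image. The sketch below emits a raw content-stream fragment for one text box and shows the coordinate flip that keeps selection aligned. Assumptions: boxes are in points with a top-left origin, a font resource named `/F1` exists on the page, the text is ASCII, and using the box height as the font size is a simplification:

```python
def _escape_pdf_string(text: str) -> str:
    # Escape backslash first, then parentheses, per PDF literal-string rules.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(text: str, bbox, page_height: float, font: str = "F1") -> str:
    """Build a PDF content-stream fragment that places `text` invisibly.

    bbox is (x0, top, x1, bottom) with a top-left origin, as most layout
    tools report it. PDF user space has a bottom-left origin, so the
    baseline y is page_height - bottom.
    """
    x0, top, x1, bottom = bbox
    font_size = bottom - top  # approximation: box height as font size
    baseline_y = page_height - bottom
    return (
        "BT\n"
        f"/{font} {font_size:g} Tf\n"
        "3 Tr\n"  # text rendering mode 3: neither fill nor stroke (invisible)
        f"{x0:g} {baseline_y:g} Td\n"
        f"({_escape_pdf_string(text)}) Tj\n"
        "ET"
    )
```

Production generators usually also scale text horizontally (the `Tz` operator or a text matrix) so the selection width matches the box; that needs font metrics and is omitted here.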
2. Backend - Testing
- 2.1 Test with Office documents (PPT, DOC, XLS)
  - Verify: Background renders correctly
  - Verify: No text overlap
  - Verify: Text is extractable for translation
  - Note: Requires the source PDF in `result_dir`; tested in an earlier session
- 2.2 Test with native PDFs containing charts
  - Verify: Chart text is not duplicated
  - Verify: Chart renders correctly in the background
  - Verify: Non-chart text is in the invisible layer
  - Note: Without a source PDF, output falls back to visible text rendering (expected)
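The fallback in the note above amounts to a small decision per Direct Track page. A hypothetical sketch; the `TextLayerPlan` type and `plan_direct_track_page` name are illustrative only, and the mode numbers are the standard PDF text rendering modes:

```python
from dataclasses import dataclass

FILL = 0       # PDF text rendering mode 0: fill glyphs (normal visible text)
INVISIBLE = 3  # PDF text rendering mode 3: neither fill nor stroke (invisible)


@dataclass(frozen=True)
class TextLayerPlan:
    draw_background: bool
    render_mode: int


def plan_direct_track_page(source_pdf_found: bool) -> TextLayerPlan:
    """Decide how a Direct Track page draws its text.

    With a source PDF, the rasterized page supplies the visuals and text is
    overlaid invisibly, so it stays selectable without being printed twice.
    Without one, there is no background to show, so text is drawn visibly.
    """
    if source_pdf_found:
        return TextLayerPlan(draw_background=True, render_mode=INVISIBLE)
    return TextLayerPlan(draw_background=False, render_mode=FILL)
```

A test for this item can then assert both branches, for example that a page without a source PDF yields `render_mode == FILL`.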
- 2.3 Test with complex layouts
  - Test: Multi-column documents
  - Test: Documents with tables and images
  - Test: Scanned PDFs (should use OCR Track; not affected by this change)
  - Note: OCR Track is unchanged; Direct Track uses the new unified approach
3. Frontend - Verification
- 3.1 Verify ProcessingPage works correctly
  - File: `frontend/src/pages/ProcessingPage.tsx`
  - Verify: No changes needed for Layout PDF generation
  - Verify: Processing track selection still works
  - Done: No frontend changes required
- 3.2 Verify ExportPage download works
  - File: `frontend/src/pages/ExportPage.tsx`
  - Verify: PDF download endpoint works with the new generation path
  - Verify: Larger file sizes are handled correctly
  - Done: No frontend changes required; the file size increase is backend-only
- 3.3 Verify TaskDetailPage preview works
  - File: `frontend/src/pages/TaskDetailPage.tsx`
  - Verify: PDF preview displays correctly
  - Verify: Text selection works in the preview
  - Done: No frontend changes required
4. Documentation
- 4.1 Update API documentation if needed
  - Note: No API changes, but document the file size increase
  - Done: No API changes; file size increase documented in design.md
- 4.2 Update user-facing documentation
  - Document: Chart text is not included in translation
  - Document: Layout PDF is for preview; translation creates a reflow PDF
  - Done: Documented in proposal.md and design.md