feat: unify Direct Track PDF rendering and simplify export options

Backend changes:
- Apply background image + invisible text layer to all Direct Track PDFs
- Add CHART to regions_to_avoid for text extraction
- Improve visual fidelity for native PDFs and Office documents

Frontend changes:
- Remove JSON, UnifiedDocument, Markdown download buttons
- Simplify to 2-column layout with only Layout PDF and Reflow PDF
- Remove translation JSON download and Layout PDF option
- Keep only Reflow PDF for translated document downloads
- Clean up unused imports (FileJson, Database, FileOutput)

Archives two OpenSpec proposals:
- unify-direct-track-pdf-rendering
- simplify-frontend-export-options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,130 @@
# Design: Unify Direct Track PDF Rendering

## Context

The Tool_OCR system generates "Layout PDF" files that preserve the original document appearance while maintaining extractable text. Currently, Direct Track (editable PDFs and Office documents) uses element-by-element rendering, which causes:
- Z-order conflicts (text behind images)
- Missing vector graphics (chart bars, gradients)
- White text becoming invisible on dark backgrounds

## Goals / Non-Goals

### Goals
- Visual fidelity: Layout PDF matches source document exactly
- Text extractability: All text remains searchable/selectable for translation
- Unified logic: Same rendering approach for all Direct Track documents
- Chart handling: Chart-internal text excluded from translation layer

### Non-Goals
- Editable text in Layout PDF (translation creates separate reflow PDF)
- Reducing file size (trade-off for visual fidelity)
- OCR Track changes (only affects Direct Track)

## Decisions

### Decision 1: Use Background Image + Invisible Text Layer

**What**: Render each source PDF page as a full-page background image, then overlay invisible text.

**Why**:
- Preserves ALL visual content (vector graphics, gradients, complex layouts)
- Invisible text (PDF text rendering mode 3) allows text selection without visual overlap
- Simplifies z-order handling (just one image layer + one text layer)

**Implementation**:
```python
import io

import fitz  # PyMuPDF
from reportlab.lib.utils import ImageReader

# Render source page as background
mat = fitz.Matrix(2.0, 2.0)  # 2x resolution
pix = source_page.get_pixmap(matrix=mat, alpha=False)
# Assumed conversion step (not shown in the original snippet): wrap the pixmap as a PNG image
bg_img = ImageReader(io.BytesIO(pix.tobytes("png")))
pdf_canvas.drawImage(bg_img, 0, 0, width=page_width, height=page_height)

# Set invisible text mode (appends a raw operator to ReportLab's private _code buffer)
pdf_canvas._code.append('3 Tr')  # Text render mode: invisible

# Draw text elements (invisible but selectable)
for elem in text_elements:
    if not is_inside_chart_region(elem):
        draw_text_element(elem)

pdf_canvas._code.append('0 Tr')  # Reset to normal
```
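The background is rasterized at 2x, but it is drawn at the page size in points, so the zoom factor affects sharpness and not placement. The sketch below illustrates the two conversions this implies. It assumes that text bounding boxes come from PyMuPDF (origin at the top-left, y increasing downward) and are drawn on a ReportLab canvas (origin at the bottom-left). The helper names `pixmap_size`, `effective_dpi`, and `to_bottom_left_origin` are hypothetical and not taken from the codebase.

```python
from typing import Tuple

# (x0, y0, x1, y1) in PDF points
BBox = Tuple[float, float, float, float]

POINTS_PER_INCH = 72.0


def pixmap_size(page_width_pt: float, page_height_pt: float, zoom: float = 2.0) -> Tuple[int, int]:
    """Approximate pixel dimensions of a page rendered with Matrix(zoom, zoom)."""
    return round(page_width_pt * zoom), round(page_height_pt * zoom)


def effective_dpi(zoom: float = 2.0) -> float:
    """A zoom of 1.0 corresponds to one pixel per point, i.e. 72 dpi."""
    return POINTS_PER_INCH * zoom


def to_bottom_left_origin(bbox: BBox, page_height_pt: float) -> BBox:
    """Flip a top-left-origin bbox into bottom-left-origin coordinates.

    Assumption: the input bbox uses a top-left origin (as PyMuPDF reports it)
    and the target canvas uses a bottom-left origin (as ReportLab does).
    """
    x0, y0, x1, y1 = bbox
    return (x0, page_height_pt - y1, x1, page_height_pt - y0)


# US Letter page (612 x 792 pt) at the 2x zoom used above
width_px, height_px = pixmap_size(612, 792)
flipped = to_bottom_left_origin((72, 100, 200, 120), 792)
```

For a US Letter page this gives a 1224 x 1584 pixel background at 144 dpi, and a text box whose top edge sits 100 pt below the top of the page ends up with its top edge at y = 692 on the canvas.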

### Decision 2: Add CHART to regions_to_avoid

**What**: Chart-internal text elements are excluded from the invisible text layer.

**Why**:
- Chart axis labels and legends are already visible in the background image
- This text typically does not need translation
- Prevents duplicate text extraction for translation

**Implementation**:
```python
# In element classification loop
if element.type == ElementType.CHART:
    image_elements.append(element)
    regions_to_avoid.append(element)  # Exclude chart region from text layer
```
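The check that decides whether a text element falls inside an avoided region is not shown in this design. One plausible way to implement it is an overlap test on bounding boxes, sketched below. The `BBox` class, the `overlap_ratio` helper, and the 0.5 threshold are assumptions for illustration; only the names `is_inside_chart_region` and `regions_to_avoid` appear in the snippets above, and their real signatures may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlap_ratio(inner: BBox, outer: BBox) -> float:
    """Fraction of inner's area that lies inside outer (0.0 to 1.0)."""
    width = min(inner.x1, outer.x1) - max(inner.x0, outer.x0)
    height = min(inner.y1, outer.y1) - max(inner.y0, outer.y0)
    if width <= 0 or height <= 0 or inner.area == 0:
        return 0.0
    return (width * height) / inner.area


def is_inside_chart_region(text_box: BBox, regions_to_avoid: Iterable[BBox],
                           threshold: float = 0.5) -> bool:
    """True if at least `threshold` of the text box is covered by one avoided region."""
    return any(overlap_ratio(text_box, region) >= threshold for region in regions_to_avoid)


chart = BBox(100, 100, 400, 300)
axis_label = BBox(120, 280, 180, 295)   # fully inside the chart
body_text = BBox(72, 320, 500, 340)     # below the chart

skipped = is_inside_chart_region(axis_label, [chart])
kept = not is_inside_chart_region(body_text, [chart])
```

Using an area ratio instead of strict containment tolerates labels that poke slightly outside the detected chart box; the right threshold would need tuning against real documents.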

### Decision 3: Apply to ALL Direct Track Documents

**What**: Use background image rendering for both Office documents and native PDFs.

**Why**:
- Consistent handling eliminates edge cases
- Chart text overlap affects both document types
- Office detection (LibreOffice producer) is unreliable for some PDFs

**Detection logic removed**:
```python
# OLD: Only for Office documents
is_office_document = 'LibreOffice' in producer or filename.endswith('.pptx')

# NEW: All Direct Track uses background rendering
if self.current_processing_track == ProcessingTrack.DIRECT:
    render_background_image()
```
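To illustrate why the producer-based check is fragile, the sketch below places the old heuristic next to a track-based rule. The enum values, the function names, and the sample producer string are assumptions made for this example, not values taken from the codebase.

```python
from enum import Enum


class ProcessingTrack(Enum):
    # Member values are placeholders; only the DIRECT name appears in the design.
    DIRECT = "direct"
    OCR = "ocr"


def old_is_office_document(producer: str, filename: str) -> bool:
    """The removed heuristic: depends on PDF metadata and the file extension."""
    return "LibreOffice" in producer or filename.endswith(".pptx")


def uses_background_rendering(track: ProcessingTrack) -> bool:
    """The unified rule: every Direct Track document gets a background image."""
    return track is ProcessingTrack.DIRECT


# A slide deck exported to PDF by a different tool (hypothetical producer string)
# slips past the old check even though it has the same rendering problems.
missed_by_old_check = not old_is_office_document("Some Other Exporter 1.0", "slides.pdf")
covered_by_new_rule = uses_background_rendering(ProcessingTrack.DIRECT)
```

Keying the decision on the processing track removes the dependency on metadata that the document producer controls.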

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     PDF Generation Flow                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Source PDF ──► PyMuPDF ──► Page Pixmap (2x) ──► Background │
│                    │                                        │
│                    ▼                                        │
│              Extract Text ──► Filter Chart Regions          │
│                    │                                        │
│                    ▼                                        │
│      Invisible Text Layer (Mode 3) ──► Overlay              │
│                                                             │
│  Result: Background Image + Invisible Searchable Text       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

## Risks / Trade-offs

| Risk | Impact | Mitigation |
|------|--------|------------|
| Larger file size (~2 MB/page) | Storage, download time | Accept trade-off for visual fidelity |
| Slightly slower generation | User wait time | Acceptable for quality improvement |
| Chart text not translatable | Feature limitation | Document as expected behavior |
| Source PDF required | Can't regenerate without source | Store source PDF reference in task |

## File Size Estimation

| Document | Pages | Current Size | New Size (est.) |
|----------|-------|--------------|-----------------|
| PPT (25 pages) | 25 | ~1.5 MB | ~43 MB |
| PDF (3 pages) | 3 | ~68 KB | ~6 MB |
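
As a rough cross-check, the per-page averages implied by the table can be compared against the ~2 MB/page figure quoted in the risk table. The sketch below only does arithmetic on the estimates above; the function names are hypothetical, and real sizes will depend on page content and image compression.

```python
def per_page_mb(pages: int, total_mb: float) -> float:
    """Average size per page implied by a total estimate."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return total_mb / pages


def project_total_mb(pages: int, mb_per_page: float = 2.0) -> float:
    """Upper-end projection using the ~2 MB/page figure from the risk table."""
    return pages * mb_per_page


ppt_avg = per_page_mb(25, 43.0)   # about 1.72 MB per slide
pdf_avg = per_page_mb(3, 6.0)     # 2.0 MB per page
ppt_projection = project_total_mb(25)
```

Both rows come out close to the ~2 MB/page figure: the 25-page deck averages about 1.72 MB per page, slightly below the 50 MB that a flat 2 MB/page projection would give.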

## Open Questions

1. Should we provide a "lightweight" option that skips background rendering for simple PDFs?
   - **Decision**: No, keep unified approach for consistency

2. Should chart text be optionally included in translation?
   - **Decision**: No, chart labels rarely need translation and would require complex masking