OCR/openspec/changes/archive/2025-12-11-unify-direct-track-pdf-rendering/design.md

# Design: Unify Direct Track PDF Rendering

## Context

The Tool_OCR system generates "Layout PDF" files that preserve the original document appearance while maintaining extractable text. Currently, Direct Track (editable PDFs and Office documents) uses element-by-element rendering, which causes:
- Z-order conflicts (text behind images)
- Missing vector graphics (chart bars, gradients)
- White text becoming invisible on dark backgrounds

## Goals / Non-Goals

### Goals
- Visual fidelity: Layout PDF reproduces the source document's appearance, including vector graphics and gradients
- Text extractability: All text remains searchable/selectable for translation
- Unified logic: Same rendering approach for all Direct Track documents
- Chart handling: Chart-internal text excluded from translation layer

### Non-Goals
- Editable text in Layout PDF (translation creates separate reflow PDF)
- Reducing file size (trade-off for visual fidelity)
- OCR Track changes (only affects Direct Track)

## Decisions

### Decision 1: Use Background Image + Invisible Text Layer

**What**: Render each source PDF page as a full-page background image, then overlay invisible text.

**Why**:
- Preserves ALL visual content (vector graphics, gradients, complex layouts)
- Invisible text (PDF text rendering mode 3) allows text selection without visual overlap
- Simplifies z-order handling (just one image layer + one text layer)

**Implementation**:
```python
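import io

import fitz  # PyMuPDF
from reportlab.lib.utils import ImageReader  # assumes pdf_canvas is a ReportLab canvas

# Render the source page as a 2x-resolution background image
mat = fitz.Matrix(2.0, 2.0)
pix = source_page.get_pixmap(matrix=mat, alpha=False)
bg_img = ImageReader(io.BytesIO(pix.tobytes("png")))  # wrap pixmap bytes for drawImage
pdf_canvas.drawImage(bg_img, 0, 0, width=page_width, height=page_height)

# Switch to invisible text (PDF text rendering mode 3)
pdf_canvas._code.append('3 Tr')

# Draw text elements (invisible but selectable), skipping chart-internal text
for elem in text_elements:
    if not is_inside_chart_region(elem):
        draw_text_element(elem)

# Restore normal (fill) text rendering
pdf_canvas._code.append('0 Tr')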
```
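
To make the `3 Tr` / `0 Tr` pair above concrete, the following sketch builds the raw PDF content-stream operators for a single invisible text run using only the standard library. The names `invisible_text_ops` and `escape_pdf_string`, and the font resource name `F1`, are hypothetical and chosen for illustration; the actual code emits these operators through the PDF canvas rather than by hand, and this sketch only handles text that fits in a plain PDF literal string.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(text: str, x: float, y: float,
                       font: str = "F1", size: float = 12) -> str:
    """Return content-stream operators that place `text` invisibly at (x, y).

    `font` is a hypothetical resource name; in a real PDF it must exist in
    the page's resource dictionary.
    """
    return "\n".join([
        "BT",                               # begin text object
        f"/{font} {size:g} Tf",             # select font and size
        "3 Tr",                             # mode 3: neither fill nor stroke
        f"1 0 0 1 {x:g} {y:g} Tm",          # position the text
        f"({escape_pdf_string(text)}) Tj",  # show the string
        "0 Tr",                             # restore normal fill mode
        "ET",                               # end text object
    ])
```

Because the glyphs are neither filled nor stroked, nothing is painted over the background image, yet the text run remains in the page content for selection and extraction.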

### Decision 2: Add CHART to regions_to_avoid

**What**: Chart-internal text elements are excluded from the invisible text layer.

**Why**:
- Chart axis labels and legends are already visible in the background image
- Chart text typically does not need translation
- Excluding it prevents duplicate text extraction for translation

**Implementation**:
```python
# In element classification loop
if element.type == ElementType.CHART:
    image_elements.append(element)
    regions_to_avoid.append(element)  # Exclude chart region from text layer
```
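
The design does not spell out how `is_inside_chart_region` decides membership. One possible approach, shown here as a minimal sketch, is to treat a text element as chart-internal when the center of its bounding box falls inside any avoided region. The `BBox` type and the two-argument signature are assumptions made for this example; the real helper takes an element and may use a different overlap rule.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box in page coordinates (hypothetical layout)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def is_inside_chart_region(text_bbox: BBox,
                           regions_to_avoid: Sequence[BBox]) -> bool:
    """Return True if the text box's center lies in any avoided region."""
    cx = (text_bbox.x0 + text_bbox.x1) / 2
    cy = (text_bbox.y0 + text_bbox.y1) / 2
    return any(region.contains_point(cx, cy) for region in regions_to_avoid)
```

A center-point test keeps labels that merely touch the chart border in the text layer, while labels drawn inside the chart are skipped.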

### Decision 3: Apply to ALL Direct Track Documents

**What**: Use background image rendering for both Office documents and native PDFs.

**Why**:
- Consistent handling eliminates edge cases
- Chart text overlap affects both document types
- Office detection (LibreOffice producer) is unreliable for some PDFs

**Detection logic removed**:
```python
# OLD: Only for Office documents
is_office_document = 'LibreOffice' in producer or filename.endswith('.pptx')

# NEW: All Direct Track uses background rendering
if self.current_processing_track == ProcessingTrack.DIRECT:
    render_background_image()
```
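
As a rough illustration of why the detection logic was dropped, the sketch below contrasts the old heuristic with a track-only check. `ProcessingTrack.DIRECT` comes from the snippet above; the `OCR` member, the enum string values, the function names, and the sample producer strings are assumptions for this example.

```python
from enum import Enum


class ProcessingTrack(Enum):
    # Member values are assumed; only the DIRECT name appears in the design.
    DIRECT = "direct"
    OCR = "ocr"


def legacy_is_office_document(producer: str, filename: str) -> bool:
    """The removed heuristic, reproduced from the OLD line for contrast."""
    return "LibreOffice" in producer or filename.endswith(".pptx")


def uses_background_rendering(track: ProcessingTrack) -> bool:
    """Background rendering now depends only on the processing track."""
    return track is ProcessingTrack.DIRECT
```

With the old check, an Office-derived PDF whose producer string does not mention LibreOffice would fall through to element-by-element rendering; the track-only check has no such dependency on metadata.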

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    PDF Generation Flow                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Source PDF ──► PyMuPDF ──► Page Pixmap (2x) ──► Background │
│                    │                                         │
│                    ▼                                         │
│              Extract Text ──► Filter Chart Regions           │
│                    │                                         │
│                    ▼                                         │
│         Invisible Text Layer (Mode 3) ──► Overlay            │
│                                                              │
│  Result: Background Image + Invisible Searchable Text        │
│                                                              │
└─────────────────────────────────────────────────────────────┘
```

## Risks / Trade-offs

| Risk | Impact | Mitigation |
|------|--------|------------|
| Larger file size (~2MB/page) | Storage, download time | Accept trade-off for visual fidelity |
| Slightly slower generation | User wait time | Acceptable for quality improvement |
| Chart text not translatable | Feature limitation | Document as expected behavior |
| Source PDF required | Can't regenerate without source | Store source PDF reference in task |

## File Size Estimation

| Document | Pages | Current Size | New Size (est.) |
|----------|-------|--------------|-----------------|
| PPT | 25 | ~1.5 MB | ~43 MB |
| PDF | 3 | ~68 KB | ~6 MB |
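
The arithmetic behind these estimates can be sketched as follows. The US Letter page size (612 x 792 pt) is an assumed example, and the helper names are hypothetical; the per-page figures are simply derived from the table rows above, not measured.

```python
def pixmap_size(width_pt: float, height_pt: float,
                zoom: float = 2.0) -> tuple[int, int]:
    """Pixel dimensions of a page rendered with a uniform zoom factor.

    PDF user space is 72 points per inch, so zoom 2.0 corresponds to 144 DPI.
    """
    return round(width_pt * zoom), round(height_pt * zoom)


def raw_rgb_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of an RGB pixmap without alpha (3 bytes per pixel)."""
    return width_px * height_px * 3


def implied_mb_per_page(total_mb: float, pages: int) -> float:
    """Average per-page size implied by a total estimate."""
    return total_mb / pages


# Assumed example: a US Letter page rendered at 2x.
width_px, height_px = pixmap_size(612, 792)            # 1224 x 1584 px
raw_mb = raw_rgb_bytes(width_px, height_px) / 1e6      # about 5.8 MB uncompressed

ppt_per_page = implied_mb_per_page(43, 25)  # 1.72 MB per page
pdf_per_page = implied_mb_per_page(6, 3)    # 2.0 MB per page
```

Both rows imply roughly 1.7 to 2 MB per page after image compression, in line with the ~2MB/page figure in the risk table.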

## Open Questions

1. Should we provide a "lightweight" option that skips background rendering for simple PDFs?
   - **Decision**: No, keep unified approach for consistency

2. Should chart text be optionally included in translation?
   - **Decision**: No, chart labels rarely need translation and would require complex masking