egg 24253ac15e feat: unify Direct Track PDF rendering and simplify export options
Backend changes:
- Apply background image + invisible text layer to all Direct Track PDFs
- Add CHART to regions_to_avoid for text extraction
- Improve visual fidelity for native PDFs and Office documents

Frontend changes:
- Remove JSON, UnifiedDocument, Markdown download buttons
- Simplify to 2-column layout with only Layout PDF and Reflow PDF
- Remove translation JSON download and Layout PDF option
- Keep only Reflow PDF for translated document downloads
- Clean up unused imports (FileJson, Database, FileOutput)

Archives two OpenSpec proposals:
- unify-direct-track-pdf-rendering
- simplify-frontend-export-options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 07:50:43 +08:00

MODIFIED Requirements

Requirement: Enhanced PDF Export with Layout Preservation

The PDF export SHALL accurately preserve document layout from both OCR and direct extraction tracks with correct coordinate transformation and multi-page support. For Direct Track, a background image rendering approach SHALL be used for visual fidelity.

Scenario: Export PDF from direct extraction track

  • WHEN exporting PDF from a direct-extraction processed document
  • THEN the system SHALL render source PDF pages as full-page background images at 2x resolution
  • AND overlay invisible text elements using PDF Text Rendering Mode 3
  • AND text SHALL remain selectable and searchable despite being invisible
  • AND the visual output SHALL match the source document exactly
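
As an illustration of the invisible text layer, the sketch below builds the raw PDF content-stream operators for one text run. Text Rendering Mode 3 is selected with the `Tr` operator, which tells the viewer to neither fill nor stroke the glyphs while still keeping them in the page's text content. The function names, the `F1` font resource name, and the use of a plain literal string are assumptions made for this example; they are not taken from the project's code, and real documents with non-Latin text would also need a suitable font encoding.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(text: str, x: float, y: float, font_size: float,
                       font_name: str = "F1") -> str:
    """Return content-stream operators that place `text` invisibly at (x, y).

    (x, y) is in PDF user space (bottom-left origin). `font_name` is a
    hypothetical resource name that must exist in the page's /Font dictionary.
    """
    return "\n".join([
        "BT",                                  # begin text object
        f"/{font_name} {font_size:g} Tf",      # select font and size
        "3 Tr",                                # rendering mode 3: invisible
        f"{x:g} {y:g} Td",                     # move to the text position
        f"({escape_pdf_string(text)}) Tj",     # show the text string
        "ET",                                  # end text object
    ])


if __name__ == "__main__":
    print(invisible_text_ops("Total (USD)", 72, 700, 11))
```

In a full export these operators would be drawn after the full-page background image, so the image supplies the visual output and the text run supplies selection and search.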

Scenario: Export PDF from OCR track with full structure

  • WHEN exporting PDF from an OCR-processed document
  • THEN the PDF SHALL use all 23 PP-StructureV3 element types
  • AND render tables with proper cell boundaries
  • AND maintain reading order from parsing_res_list

Scenario: Handle coordinate transformations correctly

  • WHEN generating PDF from UnifiedDocument
  • THEN the system SHALL use explicit page dimensions from OCR results (not inferred from bounding boxes)
  • AND correctly transform Y-axis coordinates from top-left (OCR) to bottom-left (PDF/ReportLab) origin
  • AND prevent vertical flipping or position misalignment errors
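
The Y-axis transformation in this scenario can be sketched as a small pure function. It assumes bounding boxes are given as `(x0, y0, x1, y1)` with a top-left origin, that `page_height` is the explicit page height in PDF points, and that `scale` converts OCR pixels to points; the function name and parameters are hypothetical and not taken from the project's code.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def to_pdf_bbox(bbox: BBox, page_height: float, scale: float = 1.0) -> BBox:
    """Convert a top-left-origin box to a bottom-left-origin PDF box.

    Returns (left, bottom, right, top) in PDF points. The top edge of the
    OCR box (y0) becomes the higher Y value in PDF space, so the box is
    repositioned rather than flipped.
    """
    x0, y0, x1, y1 = bbox
    left = x0 * scale
    right = x1 * scale
    bottom = page_height - y1 * scale   # OCR bottom edge -> PDF lower Y
    top = page_height - y0 * scale      # OCR top edge -> PDF upper Y
    return (left, bottom, right, top)


if __name__ == "__main__":
    # A box 50..80 units below the top of an A4-height page (842 pt).
    print(to_pdf_bbox((100, 50, 300, 80), page_height=842))
```

Passing the explicit page height, rather than the largest Y value seen in the bounding boxes, matters because an inferred height that is too small shifts every element downward by the difference.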

Scenario: Direct Track PDF file size increase

  • WHEN generating Layout PDF for Direct Track documents
  • THEN the system SHALL accept increased file size due to embedded page images
  • AND approximately 1-2 MB per page at 2x resolution is expected
  • AND this trade-off is accepted for improved visual fidelity
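
To see where the per-page size comes from, the arithmetic below computes the pixel dimensions and the uncompressed RGB size of one rendered page. It assumes that "2x resolution" means two pixels per PDF point and that an A4 page is 595 x 842 pt; the helper name is hypothetical. The embedded image is normally stored compressed, so the on-disk size is smaller than this raw figure.

```python
from typing import Tuple


def raster_size(width_pt: float, height_pt: float, zoom: float = 2.0,
                bytes_per_pixel: int = 3) -> Tuple[int, int, int]:
    """Return (width_px, height_px, raw_bytes) for a page rendered at `zoom`.

    Assumes 1 pixel per point at zoom 1.0 and 3 bytes per pixel (8-bit RGB).
    """
    width_px = round(width_pt * zoom)
    height_px = round(height_pt * zoom)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    w, h, raw = raster_size(595, 842)
    print(f"{w} x {h} px, {raw / 1_000_000:.1f} MB uncompressed")
```

Because the pixel count grows with the square of the zoom factor, doubling the zoom again would roughly quadruple the raw size per page.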

Scenario: Chart elements excluded from text layer

  • WHEN generating Layout PDF containing charts
  • THEN the system SHALL NOT include chart-internal text in the invisible text layer
  • AND chart visuals SHALL be preserved in the background image
  • AND chart text SHALL NOT be available for text selection or translation
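
One way to picture the chart exclusion is a filter that drops text items lying mostly inside a chart region before the invisible text layer is written. The sketch below is an assumption about how such a filter could look: the `Box` type, the function names, and the 0.5 overlap threshold are all hypothetical and not taken from the project's `regions_to_avoid` handling.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_ratio(inner: Box, outer: Box) -> float:
    """Fraction of `inner`'s area that lies inside `outer` (0.0 to 1.0)."""
    area = (inner.x1 - inner.x0) * (inner.y1 - inner.y0)
    if area <= 0:
        return 0.0
    ix = max(0.0, min(inner.x1, outer.x1) - max(inner.x0, outer.x0))
    iy = max(0.0, min(inner.y1, outer.y1) - max(inner.y0, outer.y0))
    return (ix * iy) / area


def filter_text_layer(items: List[Tuple[str, Box]], avoid: List[Box],
                      threshold: float = 0.5) -> List[Tuple[str, Box]]:
    """Keep only text items that are not mostly covered by an avoided region."""
    return [
        (text, box) for text, box in items
        if all(overlap_ratio(box, region) < threshold for region in avoid)
    ]


if __name__ == "__main__":
    chart = Box(100, 100, 300, 300)
    items = [("Revenue", Box(120, 120, 180, 140)),
             ("Intro paragraph", Box(50, 400, 500, 420))]
    print(filter_text_layer(items, [chart]))
```

The chart label is dropped from the text layer while the body text is kept; the chart itself still appears because it is part of the background image.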