Create new OpenSpec change proposal to fix critical PDF generation issues:

**Problems Identified**:
1. Images never saved (empty _save_image implementation)
2. Image path mismatch (saved_path vs path lookup)
3. Tables never render (fake image dependency)
4. Text style completely lost (no font/color application)

**Solution Design**:
- Phase 1: Critical fixes (images, tables)
- Phase 2: Basic style preservation
- Phase 3: Advanced layout features
- Phase 4: Testing and optimization

**Key Improvements**:
- Implement actual image saving in pp_structure_enhanced
- Fix path resolution with fallback logic
- Use table's own bbox instead of fake images
- Track-specific rendering (rich for Direct, simple for OCR)
- Preserve StyleInfo (fonts, sizes, colors)

**Implementation Tasks**:
- 10 major task groups
- 4-week timeline
- No breaking changes
- Performance target: <10% overhead

Proposal validated: openspec validate pdf-layout-restoration ✓

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Result Export Specification
ADDED Requirements
Requirement: Layout-Preserving PDF Generation
The system MUST generate PDF files that preserve the original document layout including images, tables, and text formatting.
Scenario: Generate PDF with images
GIVEN a document processed through the OCR or Direct track
WHEN images are detected and extracted
THEN the generated PDF MUST include all images at their original positions
AND images MUST maintain their aspect ratios
AND images MUST be saved to an imgs/ subdirectory
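The aspect-ratio requirement can be illustrated with a small helper. This is only a sketch: `fit_within` is a hypothetical name, and it assumes the image is drawn inside a target box whose size comes from the element's bbox.

```python
def fit_within(img_w, img_h, box_w, box_h):
    """Scale an image to fit inside a box without distorting it.

    Returns (x_offset, y_offset, width, height), where the offsets
    center the scaled image inside the box.
    """
    if min(img_w, img_h, box_w, box_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # The smaller of the two ratios is the largest scale that still fits.
    scale = min(box_w / img_w, box_h / img_h)
    width, height = img_w * scale, img_h * scale
    return ((box_w - width) / 2, (box_h - height) / 2, width, height)
```

A 400x200 image placed in a 100x100 box is scaled by 0.25 and centered vertically, so its 2:1 ratio is kept.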
Scenario: Generate PDF with tables
GIVEN a document containing tables
WHEN tables are detected and extracted
THEN the generated PDF MUST render tables with proper structure
AND tables MUST use their own bbox coordinates for positioning
AND tables MUST NOT depend on fake image references
Scenario: Generate PDF with styled text
GIVEN a document processed through the Direct track with StyleInfo
WHEN text elements have style information
THEN the generated PDF MUST apply font families (with mapping)
AND the PDF MUST apply font sizes
AND the PDF MUST apply text colors
AND the PDF MUST apply bold/italic formatting
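One possible shape for the font mapping is sketched below. It assumes the renderer falls back to the standard PDF base-14 fonts; `pick_pdf_font`, `hex_to_rgb`, the hint lists, and the '#RRGGBB' color format are all illustrative assumptions, not part of the proposal.

```python
# Standard PDF base-14 font names, ordered as
# (regular, bold, italic, bold-italic).
_BASE14 = {
    "sans": ("Helvetica", "Helvetica-Bold",
             "Helvetica-Oblique", "Helvetica-BoldOblique"),
    "serif": ("Times-Roman", "Times-Bold",
              "Times-Italic", "Times-BoldItalic"),
    "mono": ("Courier", "Courier-Bold",
             "Courier-Oblique", "Courier-BoldOblique"),
}
# Hypothetical substrings used to classify a source font family.
_MONO_HINTS = ("courier", "mono", "consolas")
_SERIF_HINTS = ("times", "georgia", "garamond", "serif")


def pick_pdf_font(family, bold=False, italic=False):
    """Map a source font family plus style flags to a base-14 font name."""
    name = (family or "").lower()
    if any(hint in name for hint in _MONO_HINTS):
        group = "mono"
    elif "sans" in name:  # checked before "serif" so "sans-serif" stays sans
        group = "sans"
    elif any(hint in name for hint in _SERIF_HINTS):
        group = "serif"
    else:
        group = "sans"  # unknown families fall back to Helvetica
    return _BASE14[group][int(bool(bold)) + 2 * int(bool(italic))]


def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of floats in 0..1."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {color!r}")
    return tuple(int(value[i:i + 2], 16) / 255 for i in (0, 2, 4))
```

Falling back to the base-14 set avoids embedding fonts, at the cost of approximating the original typeface.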
Requirement: Track-Specific Rendering
The system MUST provide different rendering approaches based on the processing track.
Scenario: Direct track rendering
GIVEN a document processed through Direct extraction
WHEN generating a PDF
THEN the system MUST use rich formatting preservation
AND maintain precise positioning from the original
AND apply all available StyleInfo
Scenario: OCR track rendering
GIVEN a document processed through OCR
WHEN generating a PDF
THEN the system MUST use simplified rendering
AND apply best-effort positioning based on bbox
AND use estimated font sizes
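The spec does not say how OCR font sizes are estimated. One plausible heuristic derives the size from the bbox height, as sketched here. The function name, the 0.8 fill factor, and the clamping bounds are assumptions chosen for illustration.

```python
def estimate_font_size(bbox, scale=1.0, line_count=1, fill=0.8,
                       min_size=4.0, max_size=72.0):
    """Estimate a font size in points from a text bbox.

    bbox is (x0, y0, x1, y1) in source pixels; scale converts pixels to
    PDF points. fill is the assumed fraction of the line height that the
    glyphs occupy.
    """
    _, y0, _, y1 = bbox
    height_pt = abs(y1 - y0) * scale
    size = height_pt / max(line_count, 1) * fill
    # Clamp so noisy bboxes cannot produce unreadable or oversized text.
    return max(min_size, min(max_size, size))
```

Clamping keeps a noisy OCR box from producing text that is invisible or that overflows the page.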
Requirement: Image Path Resolution
The system MUST correctly resolve image paths with fallback logic.
Scenario: Resolve saved image paths
GIVEN an element with image content
WHEN looking for the image path
THEN the system MUST check content["saved_path"] first
AND fall back to content["path"] if not found
AND fall back to content["image_path"] if not found
AND finally check metadata["path"]
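The lookup order in this scenario can be sketched as a single function. `resolve_image_path` is a hypothetical name, and the sketch assumes `content` and `metadata` are plain dicts.

```python
def resolve_image_path(content, metadata=None):
    """Return the first non-empty image path, following the fallback order."""
    if isinstance(content, dict):
        # Order matters: saved_path wins over path, then image_path.
        for key in ("saved_path", "path", "image_path"):
            value = content.get(key)
            if value:
                return value
    if isinstance(metadata, dict) and metadata.get("path"):
        return metadata["path"]
    return None  # caller decides how to handle a missing image
```

Returning `None` instead of raising lets the renderer skip a missing image and keep building the rest of the page.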
MODIFIED Requirements
Requirement: PDF Generation Pipeline
The PDF generation pipeline MUST support layout preservation for documents from either processing track.
Scenario: Enhanced PDF generation
GIVEN a UnifiedDocument from either track
WHEN generating a PDF
THEN the system MUST detect the processing track
AND route to the appropriate rendering method
AND preserve as much layout information as available
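Track detection and routing could look roughly like the sketch below. The `Track` enum, the `Document` stand-in for UnifiedDocument, and the renderer functions are hypothetical; the real class and its field names may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class Track(Enum):
    DIRECT = "direct"
    OCR = "ocr"


@dataclass
class Document:
    """Hypothetical stand-in for UnifiedDocument."""
    track: Track
    elements: list = field(default_factory=list)


def render_direct(doc):
    # Placeholder: would apply StyleInfo and precise positioning.
    return f"rich:{len(doc.elements)}"


def render_ocr(doc):
    # Placeholder: would use bbox positioning and estimated font sizes.
    return f"simple:{len(doc.elements)}"


_RENDERERS = {Track.DIRECT: render_direct, Track.OCR: render_ocr}


def generate_pdf(doc):
    """Route the document to the renderer registered for its track."""
    try:
        renderer = _RENDERERS[doc.track]
    except KeyError:
        raise ValueError(f"unsupported track: {doc.track!r}") from None
    return renderer(doc)
```

A dispatch table keeps the routing in one place, so supporting another track means adding one entry.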
Requirement: Image Handling in PP-Structure
The PP-Structure enhanced module MUST actually save extracted images.
Scenario: Save PP-Structure images
GIVEN PP-Structure extracts an image with img_path
WHEN processing the image element
THEN the _save_image method MUST save the image to disk
AND return a relative path for reference
AND handle both file paths and numpy arrays
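A minimal sketch of the saving behavior follows. It is not the project's `_save_image`. The function name and signature are assumptions, and the array branch assumes Pillow is installed and that the array is already in RGB order. An array in OpenCV's BGR order would need its channels reversed first.

```python
import shutil
from pathlib import Path


def save_image(source, output_dir, name):
    """Save an image under output_dir/imgs and return its relative path.

    source may be a file path (str or Path) or an array-like image.
    Returns None when a source file path does not exist.
    """
    output_dir = Path(output_dir)
    target = output_dir / "imgs" / name
    target.parent.mkdir(parents=True, exist_ok=True)

    if isinstance(source, (str, Path)):
        src = Path(source)
        if not src.is_file():
            return None
        if src.resolve() != target.resolve():  # skip a copy onto itself
            shutil.copyfile(src, target)
    else:
        # Assumption: anything else is a numpy array that Pillow can encode.
        from PIL import Image
        Image.fromarray(source).save(target)

    # A POSIX-style relative path stays valid if output_dir is moved.
    return target.relative_to(output_dir).as_posix()
```

The returned path is relative to the output directory, so it can be stored directly in `content["saved_path"]`.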
Requirement: Table Rendering Logic
Table rendering MUST position each table from the element's own bbox instead of looking up a table image.
Scenario: Render table with direct bbox
GIVEN a table element with bbox coordinates
WHEN rendering the table in PDF
THEN the system MUST use the element's own bbox
AND MUST NOT look for non-existent table image files
AND position the table accurately based on coordinates
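Positioning a table from its own bbox mostly comes down to a coordinate conversion, because PDF user space puts its origin at the bottom-left of the page by default. The sketch below assumes the bbox is (x0, y0, x1, y1) with a top-left origin and that `scale` converts source units to PDF points. `bbox_to_pdf_rect` is a hypothetical name.

```python
def bbox_to_pdf_rect(bbox, page_height, scale=1.0):
    """Convert a top-left-origin bbox to a PDF rectangle.

    Returns (x, y, width, height) in points, where (x, y) is the
    bottom-left corner of the rectangle in PDF coordinates.
    """
    x0, y0, x1, y1 = bbox
    # Sort each pair so swapped corners still give a positive size.
    left, right = sorted((x0 * scale, x1 * scale))
    top, bottom = sorted((y0 * scale, y1 * scale))
    # Flip the y axis: the lower edge sits page_height - bottom above the origin.
    return (left, page_height - bottom, right - left, bottom - top)
```

With this rectangle the table can be drawn at (x, y) without depending on any image file.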