# Implementation Tasks: PDF Layout Restoration

## Phase 1: Critical Fixes (P0 - Immediate)

### 1. Fix Image Handling
- [x] 1.1 Implement `_save_image()` in pp_structure_enhanced.py
  - [x] 1.1.1 Create imgs subdirectory in result_dir
  - [x] 1.1.2 Handle both file path and numpy array inputs
  - [x] 1.1.3 Save with element_id as filename
  - [x] 1.1.4 Return relative path for reference
  - [x] 1.1.5 Add error handling and logging
- [x] 1.2 Fix path resolution in pdf_generator_service.py
  - [x] 1.2.1 Create `_get_image_path()` helper with fallback logic
  - [x] 1.2.2 Check saved_path, path, image_path keys
  - [x] 1.2.3 Check metadata for path
  - [x] 1.2.4 Update convert_unified_document_to_ocr_data to use helper
- [ ] 1.3 Test image rendering
  - [ ] 1.3.1 Test with OCR track document
  - [ ] 1.3.2 Test with Direct track document
  - [ ] 1.3.3 Verify images appear in PDF output

### 2. Fix Table Rendering
- [x] 2.1 Remove dependency on fake image references
  - [x] 2.1.1 Stop creating fake table_*.png references (changed to None)
  - [x] 2.1.2 Remove image lookup fallback in draw_table_region
- [x] 2.2 Use direct bbox from table element
  - [x] 2.2.1 Get bbox from table_element.get("bbox")
  - [x] 2.2.2 Fallback to bbox_polygon if needed
  - [x] 2.2.3 Implement _polygon_to_bbox converter (inline conversion implemented)
- [x] 2.3 Fix table HTML rendering
  - [x] 2.3.1 Parse HTML content from table element
  - [x] 2.3.2 Position table using normalized bbox
  - [x] 2.3.3 Render with proper dimensions
- [ ] 2.4 Test table rendering
  - [ ] 2.4.1 Test simple tables
  - [ ] 2.4.2 Test complex multi-column tables
  - [ ] 2.4.3 Test with both tracks

## Phase 2: Basic Style Preservation (P1 - Week 1)

### 3. Implement Style Application System
- [ ] 3.1 Create font mapping system
  - [ ] 3.1.1 Define FONT_MAPPING dictionary
  - [ ] 3.1.2 Map common fonts to PDF standard fonts
  - [ ] 3.1.3 Add fallback to Helvetica for unknown fonts
- [ ] 3.2 Implement _apply_text_style() method
  - [ ] 3.2.1 Extract font family from StyleInfo
  - [ ] 3.2.2 Handle bold/italic flags
  - [ ] 3.2.3 Apply font size
  - [ ] 3.2.4 Apply text color
  - [ ] 3.2.5 Handle errors gracefully
- [ ] 3.3 Create color parsing utilities
  - [ ] 3.3.1 Parse hex colors (#RRGGBB)
  - [ ] 3.3.2 Parse RGB tuples
  - [ ] 3.3.3 Convert to PDF color space

### 4. Track-Specific Rendering
- [ ] 4.1 Add track detection in generate_from_unified_document
  - [ ] 4.1.1 Check unified_doc.metadata.processing_track
  - [ ] 4.1.2 Route to appropriate rendering method
- [ ] 4.2 Implement _generate_direct_track_pdf
  - [ ] 4.2.1 Process each page with style preservation
  - [ ] 4.2.2 Apply StyleInfo to text elements
  - [ ] 4.2.3 Use precise positioning
  - [ ] 4.2.4 Preserve line breaks
- [ ] 4.3 Implement _generate_ocr_track_pdf
  - [ ] 4.3.1 Use simplified rendering
  - [ ] 4.3.2 Best-effort positioning
  - [ ] 4.3.3 Estimated font sizes
- [ ] 4.4 Test track-specific rendering
  - [ ] 4.4.1 Compare Direct track with original
  - [ ] 4.4.2 Verify OCR track maintains quality

## Phase 3: Advanced Layout (P2 - Week 2)

### 5. Enhanced Text Rendering
- [ ] 5.1 Implement line-by-line rendering
  - [ ] 5.1.1 Split text content by newlines
  - [ ] 5.1.2 Calculate line height from font size
  - [ ] 5.1.3 Render each line with proper spacing
- [ ] 5.2 Add paragraph handling
  - [ ] 5.2.1 Detect paragraph boundaries
  - [ ] 5.2.2 Apply paragraph spacing
  - [ ] 5.2.3 Handle indentation
- [ ] 5.3 Implement text alignment
  - [ ] 5.3.1 Support left/right/center/justify
  - [ ] 5.3.2 Calculate positioning based on alignment
  - [ ] 5.3.3 Apply to each text block

### 6. List Formatting
- [ ] 6.1 Detect list elements
  - [ ] 6.1.1 Identify list items from metadata
  - [ ] 6.1.2 Determine list type (ordered/unordered)
  - [ ] 6.1.3 Extract indent level
- [ ] 6.2 Render lists with proper formatting
  - [ ] 6.2.1 Add bullets/numbers
  - [ ] 6.2.2 Apply indentation
  - [ ] 6.2.3 Maintain list spacing

### 7. Span-Level Rendering (Advanced)
- [ ] 7.1 Extract span information from Direct track
  - [ ] 7.1.1 Parse children elements for spans
  - [ ] 7.1.2 Get per-span styling
  - [ ] 7.1.3 Track position within line
- [ ] 7.2 Render mixed-style lines
  - [ ] 7.2.1 Switch styles mid-line
  - [ ] 7.2.2 Handle inline formatting
  - [ ] 7.2.3 Preserve exact positioning

## Phase 4: Testing and Optimization (P2 - Week 3)

### 8. Comprehensive Testing
- [ ] 8.1 Create test suite for layout preservation
  - [ ] 8.1.1 Unit tests for each component
  - [ ] 8.1.2 Integration tests for full pipeline
  - [ ] 8.1.3 Visual regression tests
- [ ] 8.2 Test with various document types
  - [ ] 8.2.1 Scientific papers (complex layout)
  - [ ] 8.2.2 Business documents (tables/charts)
  - [ ] 8.2.3 Books (chapters/paragraphs)
  - [ ] 8.2.4 Forms (precise positioning)
- [ ] 8.3 Performance testing
  - [ ] 8.3.1 Measure generation time
  - [ ] 8.3.2 Profile memory usage
  - [ ] 8.3.3 Identify bottlenecks

### 9. Performance Optimization
- [ ] 9.1 Implement caching
  - [ ] 9.1.1 Cache font metrics
  - [ ] 9.1.2 Cache parsed styles
  - [ ] 9.1.3 Reuse computed layouts
- [ ] 9.2 Optimize image handling
  - [ ] 9.2.1 Lazy load images
  - [ ] 9.2.2 Compress when appropriate
  - [ ] 9.2.3 Stream large images
- [ ] 9.3 Batch operations
  - [ ] 9.3.1 Group similar rendering ops
  - [ ] 9.3.2 Minimize context switches
  - [ ] 9.3.3 Use efficient data structures

### 10. Documentation and Deployment
- [ ] 10.1 Update API documentation
  - [ ] 10.1.1 Document new rendering capabilities
  - [ ] 10.1.2 Add examples of improved output
  - [ ] 10.1.3 Note performance characteristics
- [ ] 10.2 Create migration guide
  - [ ] 10.2.1 Explain improvements
  - [ ] 10.2.2 Note any breaking changes
  - [ ] 10.2.3 Provide rollback instructions
- [ ] 10.3 Deployment preparation
  - [ ] 10.3.1 Feature flag setup
  - [ ] 10.3.2 Monitoring metrics
  - [ ] 10.3.3 Rollback plan

## Success Criteria

### Must Have (Phase 1)
- [x] Images appear in generated PDFs
- [x] Tables render with correct layout
- [x] No regression in existing functionality

### Should Have (Phase 2)
- [ ] Text styling preserved in Direct track
- [ ] Font sizes and colors applied
- [ ] Line breaks maintained

### Nice to Have (Phase 3-4)
- [ ] Paragraph formatting
- [ ] List rendering
- [ ] Span-level styling
- [ ] <10% performance overhead

## Timeline

- **Week 0**: Phase 1 - Critical fixes (images, tables)
- **Week 1**: Phase 2 - Basic style preservation
- **Week 2**: Phase 3 - Advanced layout features
- **Week 3**: Phase 4 - Testing and optimization
- **Week 4**: Review, documentation, and deployment
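
## Implementation Notes

Several of the smaller helpers named in tasks 1.2, 2.2.3, 3.1, and 3.3 can be sketched as pure functions. The block below is a minimal illustration, not the project's actual code. It makes these assumptions: elements are plain dicts (as `table_element.get("bbox")` suggests), the metadata lookup reuses the same three keys as the top-level lookup, the entries in `FONT_MAPPING` are illustrative, and RGB tuples with any component above 1 are treated as 0-255 values. The function names drop the leading underscore and are hypothetical stand-ins for the methods listed in the tasks.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# Assumed key order, taken from task 1.2.2.
IMAGE_PATH_KEYS = ("saved_path", "path", "image_path")

# Illustrative entries only (task 3.1.1); values are PDF standard font names.
FONT_MAPPING: Dict[str, str] = {
    "arial": "Helvetica",
    "helvetica": "Helvetica",
    "times new roman": "Times-Roman",
    "times": "Times-Roman",
    "courier new": "Courier",
    "courier": "Courier",
}
DEFAULT_FONT = "Helvetica"  # fallback named in task 3.1.3


def get_image_path(element: Dict[str, Any]) -> Optional[str]:
    """Return the first non-empty path on the element, then in its metadata."""
    for key in IMAGE_PATH_KEYS:
        value = element.get(key)
        if value:
            return str(value)
    # Assumption: metadata is a dict that may carry the same keys.
    metadata = element.get("metadata") or {}
    for key in IMAGE_PATH_KEYS:
        value = metadata.get(key)
        if value:
            return str(value)
    return None


def polygon_to_bbox(
    polygon: Sequence[Sequence[float]],
) -> Tuple[float, float, float, float]:
    """Axis-aligned (x0, y0, x1, y1) box enclosing a list of (x, y) points."""
    xs = [float(point[0]) for point in polygon]
    ys = [float(point[1]) for point in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def map_font(family: Optional[str]) -> str:
    """Map a source font family to a PDF standard font, else the default."""
    if not family:
        return DEFAULT_FONT
    return FONT_MAPPING.get(family.strip().lower(), DEFAULT_FONT)


def parse_color(value: Any) -> Tuple[float, float, float]:
    """Convert '#RRGGBB' or an RGB tuple to three floats in the range 0-1."""
    if isinstance(value, str):
        text = value.strip().lstrip("#")
        if len(text) != 6:
            raise ValueError(f"expected #RRGGBB, got {value!r}")
        r, g, b = (int(text[i:i + 2], 16) for i in (0, 2, 4))
        return (r / 255.0, g / 255.0, b / 255.0)
    if isinstance(value, (tuple, list)) and len(value) == 3:
        # Assumption: any component above 1 means the tuple is on a 0-255 scale.
        scale = 255.0 if any(c > 1 for c in value) else 1.0
        r, g, b = (float(c) / scale for c in value)
        return (r, g, b)
    raise ValueError(f"unsupported color value: {value!r}")
```

For example, `get_image_path({"metadata": {"path": "imgs/b.png"}})` returns `"imgs/b.png"`, and `map_font("Comic Sans")` falls back to `"Helvetica"`. Keeping these helpers free of rendering state makes them straightforward to cover in the unit tests planned under 8.1.1.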