CRITICAL BUG FIXES (Based on expert analysis):
Bug A - Y-axis Starting Position Error:
- Previous code used bbox.y1 (the bottom edge) as the starting point for multi-line text
- This put the first line where the last line belongs, so the text overflowed downward
- FIX: Span-based rendering now uses `page_height - span.bbox.y1 + (font_size * 0.2)`
to approximate baseline position for each span individually
- FIX: Block-level fallback starts from bbox.y0 (top), draws lines downward:
`pdf_y_top = page_height - bbox.y0`, then `line_y = pdf_y_top - ((i + 1) * line_height)`
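A minimal sketch of the two Y-axis conversions above, assuming bboxes use a top-left origin (y grows downward) and the output canvas uses a bottom-left origin. The `BBox` class and both function names are hypothetical illustrations, not the code in pdf_generator_service.py.

```python
from dataclasses import dataclass


@dataclass
class BBox:
    """Hypothetical bbox in top-left-origin coordinates (y grows downward)."""
    x0: float
    y0: float  # top edge
    x1: float
    y1: float  # bottom edge


def span_baseline_y(page_height: float, bbox: BBox, font_size: float) -> float:
    """Approximate a span's baseline in bottom-left-origin coordinates."""
    # Flip the bottom edge, then lift it by 20% of the font size
    # to leave room for descenders.
    return page_height - bbox.y1 + (font_size * 0.2)


def fallback_line_ys(
    page_height: float, bbox: BBox, line_count: int, line_height: float
) -> list[float]:
    """Y position of each line when a block is drawn from its top edge down."""
    pdf_y_top = page_height - bbox.y0
    return [pdf_y_top - ((i + 1) * line_height) for i in range(line_count)]
```

With a page height of 800 and a block whose top edge is at y0 = 100, three lines at a line height of 12 land at 688, 676, and 664, so the first line sits highest on the page and later lines step downward.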
Bug B - Spans Compressed to First Line:
- Previous code rendered every span on the first line only (the `if i == 0` check)
- This compressed paragraphs onto one line and destroyed multi-line and multi-column layouts
- FIX: Prioritize span-based rendering - each span uses its own precise bbox
- FIX: Removed line iteration for spans - they already have correct coordinates
- FIX: Return immediately after drawing spans to prevent block text overlap
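The span-first control flow described above can be sketched as follows. This is an illustration under assumptions: `Span`, `TextElement`, and `plan_draw_calls` are hypothetical names, and the function returns (x, y, text) tuples instead of drawing on a canvas so that the ordering logic is easy to inspect.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span: text plus its own top-left-origin coordinates."""
    text: str
    x0: float
    y1: float  # bottom edge
    font_size: float


@dataclass
class TextElement:
    """Hypothetical element: block text, block origin, optional child spans."""
    text: str
    x0: float
    y0: float  # top edge
    children: list[Span] = field(default_factory=list)


def plan_draw_calls(element: TextElement, page_height: float, line_height: float):
    """Return (x, y, text) draw calls, preferring spans over block text."""
    if element.children:
        # Each span carries its own coordinates, so there is no line index
        # and no `i == 0` gate: every span is placed independently.
        calls = [
            (s.x0, page_height - s.y1 + (s.font_size * 0.2), s.text)
            for s in element.children
        ]
        return calls  # return early so block text is not drawn on top
    # Fallback: no spans, so lay out block lines from the top edge downward.
    pdf_y_top = page_height - element.y0
    return [
        (element.x0, pdf_y_top - ((i + 1) * line_height), line)
        for i, line in enumerate(element.text.split("\n"))
    ]
```

Returning early from the span branch is what keeps the block-level text from being drawn a second time over the spans.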
Implementation Changes:
1. Span-Based Rendering (Priority Path):
- Iterate through element.children (spans) with precise bbox from PyMuPDF
- Each span positioned independently using its own coordinates
- Apply per-span StyleInfo (font_name, font_size, font_weight, font_style)
- Transform coordinates: `span_pdf_y = page_height - s_bbox.y1 + (font_size * 0.2)`
- Used for 84% of text elements (16/19 elements in test)
2. Block-Level Fallback (Corrected Y-Axis):
- Used when no spans are available (filtered or modified text)
- Start from the top: `pdf_y_top = page_height - bbox.y0`
- Draw lines downward: `line_y = pdf_y_top - ((i + 1) * line_height)`
- Maintains proper line spacing and paragraph flow
3. Testing:
- Added comprehensive E2E test suite (test_pdf_layout_restoration.py)
- Quick visual verification test (quick_visual_test.py)
- Test results documented in TEST_RESULTS_SPAN_FIX.md
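For the per-span styling in item 1, one possible way to turn `font_weight` and `font_style` into a drawable font is sketched below. The `StyleInfo` stand-in, its field values ("normal", "bold", "italic"), and `resolve_font` are assumptions for illustration; the sketch also ignores `font_name` and always selects from the standard PDF Helvetica family, which the real service may not do.

```python
from dataclasses import dataclass


@dataclass
class StyleInfo:
    """Hypothetical stand-in carrying the four fields named in item 1."""
    font_name: str = "Helvetica"
    font_size: float = 10.0
    font_weight: str = "normal"  # assumed values: "normal" or "bold"
    font_style: str = "normal"   # assumed values: "normal" or "italic"


# Standard PDF Helvetica variants, keyed by (is_bold, is_italic).
_HELVETICA = {
    (False, False): "Helvetica",
    (True, False): "Helvetica-Bold",
    (False, True): "Helvetica-Oblique",
    (True, True): "Helvetica-BoldOblique",
}


def resolve_font(style: StyleInfo) -> tuple[str, float]:
    """Map a span style to a (font name, size) pair for the drawing call."""
    # font_name is deliberately ignored in this sketch (see lead-in).
    key = (style.font_weight == "bold", style.font_style == "italic")
    return _HELVETICA[key], style.font_size
```

Taking the size from the span's style, not from the bbox height, matches the 10pt result reported in the test results.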
Test Results:
✅ PDF generation: 14,172 bytes, 3 pages with content
✅ Span rendering: 84% of elements (16/19) using precise bbox
✅ Font sizes: Correct 10pt (not 35pt from bbox_height)
✅ Line count: 152 lines (proper spacing, no compression)
✅ Reading order: Correct left-right, top-bottom pattern
✅ First line: "Technical Data Sheet" (verified correct)
Files Changed:
- backend/app/services/pdf_generator_service.py: Complete rewrite of
_draw_text_element_direct() method (lines 1796-2024)
- backend/tests/e2e/test_pdf_layout_restoration.py: New E2E test suite
- backend/tests/e2e/TEST_RESULTS_SPAN_FIX.md: Comprehensive test results
References:
- Expert analysis identified Y-axis and span compression bugs
- Solution prioritizes PyMuPDF's precise span-level bbox data
- Maintains backward compatibility with block-level fallback
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>