refine: add OCR track line break support and spacing_after handling

Complete Phase 3 text rendering refinements for both tracks:

**OCR Track Line Break Support (Task 5.1.4)**
- Modified draw_text_region to split text on newlines
- Calculate line height as font_size * 1.2 (same as Direct track)
- Render each line with proper vertical spacing
- Apply per-line font scaling when text exceeds bbox width
- Lines 1191-1218 in pdf_generator_service.py
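
The line break steps above can be sketched roughly as follows. This is a minimal illustration, not the actual `draw_text_region` code: `layout_text_lines` and the `measure_width` callback are hypothetical names, and the proportional shrink used for per-line scaling is an assumption (the commit only says lines are scaled when they exceed the bbox width).

```python
from typing import Callable, List, Tuple

LINE_HEIGHT_FACTOR = 1.2  # same factor the commit cites for both tracks


def layout_text_lines(
    text: str,
    font_size: float,
    pdf_y: float,
    bbox_width: float,
    measure_width: Callable[[str, float], float],
) -> List[Tuple[str, float, float]]:
    """Return (line, line_y, line_font_size) for each newline-separated line.

    measure_width(line, size) is a hypothetical callback that returns the
    rendered width of `line` at `size`; it is assumed to scale linearly
    with the font size.
    """
    line_height = font_size * LINE_HEIGHT_FACTOR
    placed = []
    for i, line in enumerate(text.split("\n")):
        # PDF y grows upward, so each later line sits lower on the page.
        line_y = pdf_y - i * line_height
        line_size = font_size
        width = measure_width(line, font_size)
        if width > bbox_width:
            # Assumed scaling rule: shrink this line only, so it just fits.
            line_size = font_size * bbox_width / width
        placed.append((line, line_y, line_size))
    return placed
```

Keeping the width measurement behind a callback lets the layout be checked without a PDF canvas; in a real renderer the callback would wrap whatever string-width function the PDF library provides.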

**spacing_after Handling (Task 5.2.4)**
- Extract spacing_after from element metadata
- Add explanatory comments about spacing_after usage
- Include spacing_after in debug logs for visibility
- Note: In the Direct track, element positions come from fixed bboxes that
  already reflect spacing_after, so the value is not applied again; it is
  only recorded for structural analysis

**Technical Details**
- OCR track now has feature parity with Direct track for line breaks
- Both tracks use identical line_height calculation (1.2x font size)
- spacing_before applied via Y position adjustment
- spacing_after recorded but not actively applied (bbox-based layout)
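
A small sketch of the spacing behaviour described above, under stated assumptions: `apply_paragraph_spacing` is a hypothetical helper, the metadata is assumed to be a plain dict, and subtracting `spacing_before` from Y assumes PDF coordinates where Y grows upward. The real service may structure this differently.

```python
import logging
from typing import Any, Dict, Tuple

logger = logging.getLogger(__name__)


def apply_paragraph_spacing(
    pdf_y: float, metadata: Dict[str, Any]
) -> Tuple[float, float]:
    """Return (adjusted_y, spacing_after) for one text element.

    spacing_before shifts the starting Y position. spacing_after is only
    extracted and logged, because bbox-based layout already encodes the
    gap to the next element.
    """
    # Missing or None values are treated as zero spacing (an assumption).
    spacing_before = float(metadata.get("spacing_before") or 0)
    spacing_after = float(metadata.get("spacing_after") or 0)

    # Assumed direction: moving down the page means a smaller Y.
    adjusted_y = pdf_y - spacing_before

    logger.debug(
        "paragraph spacing: before=%.1f after=%.1f (after not applied)",
        spacing_before,
        spacing_after,
    )
    return adjusted_y, spacing_after
```

Returning `spacing_after` without using it mirrors the commit's choice to record the value for analysis rather than shift the next element a second time.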

**Modified Files**
- backend/app/services/pdf_generator_service.py
  - Lines 1191-1218: OCR track line break handling
  - Lines 1567-1572: spacing_after comments and extraction
  - Lines 1641-1643: Enhanced debug logging
- openspec/changes/pdf-layout-restoration/tasks.md
  - Added 5.1.4 and 5.2.4 completion markers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-11-24 08:12:32 +08:00
parent 77fe4ccb8b
commit 93bd9f5fee
2 changed files with 33 additions and 11 deletions


@@ -81,10 +81,12 @@
 - [x] 5.1.1 Split text content by newlines (text.split('\n'))
 - [x] 5.1.2 Calculate line height from font size (font_size * 1.2)
 - [x] 5.1.3 Render each line with proper spacing (line_y = pdf_y - i * line_height)
+- [x] 5.1.4 Add OCR track support in draw_text_region (lines 1191-1218)
 - [x] 5.2 Add paragraph handling
 - [x] 5.2.1 Detect paragraph boundaries (via element.type PARAGRAPH)
 - [x] 5.2.2 Apply paragraph spacing (spacing_before/spacing_after from metadata)
 - [x] 5.2.3 Handle indentation (indent/first_line_indent from metadata)
+- [x] 5.2.4 Record spacing_after for debugging (included in logs)
 - [x] 5.3 Implement text alignment
 - [x] 5.3.1 Support left/right/center/justify (from StyleInfo.alignment)
 - [x] 5.3.2 Calculate positioning based on alignment (line_x calculation)