feat: add OCR track alignment support and spacing_after analysis
Complete text alignment parity between OCR and Direct tracks:

**OCR Track Alignment Support (Task 5.3.5)**
- Extract alignment from region style (StyleInfo or dict)
- Support left/right/center/justify alignment in draw_text_region
- Calculate line_x position based on alignment setting:
  - Left: line_x = pdf_x (default)
  - Center: line_x = pdf_x + (bbox_width - text_width) / 2
  - Right: line_x = pdf_x + bbox_width - text_width
  - Justify: word spacing distribution (except last line)
- Lines 1179-1247 in pdf_generator_service.py
- OCR track now has feature parity with Direct track for alignment

**Enhanced spacing_after Handling (Task 5.2.4-5.2.5)**
- Calculate actual text height: len(lines) * line_height
- Compute bbox_bottom_margin to show implicit spacing
- Add detailed logging with actual_height and bbox_bottom_margin
- Document that spacing_after is inherent in bbox-based layout
- If the text is shorter than its bbox, the remaining space acts as spacing
- Lines 1680-1689 in pdf_generator_service.py

**Technical Details**
- Both tracks now support identical alignment modes
- spacing_after is implicitly present in element positioning
- bbox_bottom_margin = bbox_height - actual_text_height - spacing_before
- This shows how much space remains below the text (implicit spacing_after)

**Modified Files**
- backend/app/services/pdf_generator_service.py
  - Lines 1179-1185: Alignment extraction for OCR track
  - Lines 1222-1247: OCR track alignment calculation and rendering
  - Lines 1680-1689: spacing_after analysis with bbox_bottom_margin
- openspec/changes/pdf-layout-restoration/tasks.md
  - Added 5.2.5: bbox_bottom_margin calculation
  - Added 5.3.5: OCR track alignment support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
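The line_x rules listed under OCR Track Alignment Support can be sketched as one standalone function. This is a hypothetical helper, not the code in pdf_generator_service.py: the names `place_line` and `LinePlacement` are illustrative, it assumes the caller already knows `text_width` for the line at normal spacing, and it assumes x grows to the right. For justify it returns the extra space to add at each gap between words (clamped at zero, which is an assumption of this sketch) and leaves the last line left-aligned. Like the formulas above, center and right are not clamped, so a line wider than its bbox starts left of pdf_x.

```python
from dataclasses import dataclass


@dataclass
class LinePlacement:
    line_x: float        # x coordinate where the line starts
    word_spacing: float  # extra space added at each gap between words


def place_line(
    alignment: str,
    pdf_x: float,
    bbox_width: float,
    text_width: float,
    word_count: int,
    is_last_line: bool,
) -> LinePlacement:
    """Hypothetical sketch of per-line horizontal alignment inside a bbox."""
    alignment = (alignment or "left").lower()
    if alignment == "center":
        return LinePlacement(pdf_x + (bbox_width - text_width) / 2, 0.0)
    if alignment == "right":
        return LinePlacement(pdf_x + bbox_width - text_width, 0.0)
    if alignment == "justify" and not is_last_line and word_count > 1:
        # Spread the leftover width evenly across the gaps between words.
        gaps = word_count - 1
        extra = max(bbox_width - text_width, 0.0) / gaps
        return LinePlacement(pdf_x, extra)
    # Left alignment, plus the last line of a justified paragraph.
    return LinePlacement(pdf_x, 0.0)
```

For a 120 pt wide, three-word line in a 200 pt wide bbox starting at x = 100, center gives line_x = 140, right gives line_x = 180, and justify keeps line_x = 100 while adding 40 pt at each of the two word gaps.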
@@ -81,17 +81,19 @@
 - [x] 5.1.1 Split text content by newlines (text.split('\n'))
 - [x] 5.1.2 Calculate line height from font size (font_size * 1.2)
 - [x] 5.1.3 Render each line with proper spacing (line_y = pdf_y - i * line_height)
-- [x] 5.1.4 Add OCR track support in draw_text_region (lines 1191-1218)
+- [x] 5.1.4 Add OCR track support in draw_text_region (lines 1199-1254)
 - [x] 5.2 Add paragraph handling
 - [x] 5.2.1 Detect paragraph boundaries (via element.type PARAGRAPH)
 - [x] 5.2.2 Apply paragraph spacing (spacing_before/spacing_after from metadata)
 - [x] 5.2.3 Handle indentation (indent/first_line_indent from metadata)
-- [x] 5.2.4 Record spacing_after for debugging (included in logs)
+- [x] 5.2.4 Record spacing_after with actual text height analysis (lines 1680-1689)
+- [x] 5.2.5 Calculate bbox_bottom_margin to show implicit spacing
 - [x] 5.3 Implement text alignment
 - [x] 5.3.1 Support left/right/center/justify (from StyleInfo.alignment)
 - [x] 5.3.2 Calculate positioning based on alignment (line_x calculation)
 - [x] 5.3.3 Apply to each text block (per-line alignment in _draw_text_element_direct)
 - [x] 5.3.4 Justify alignment with word spacing distribution
+- [x] 5.3.5 Add OCR track alignment support (extract from style, lines 1179-1247)
 
 ### 6. List Formatting
 - [ ] 6.1 Detect list elements
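Tasks 5.1.1 to 5.1.3 and 5.2.4 to 5.2.5 combine into a small piece of arithmetic. The sketch below is hypothetical: `layout_text_block` and `TextLayout` are illustrative names, not taken from pdf_generator_service.py. It assumes PDF coordinates with y increasing upward, so each following line sits one line_height below pdf_y, and it applies the bbox_bottom_margin formula from the commit message.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextLayout:
    line_positions: List[Tuple[str, float]]  # (line text, baseline y)
    line_height: float
    actual_height: float
    bbox_bottom_margin: float  # implicit spacing_after left inside the bbox


def layout_text_block(
    text: str,
    pdf_y: float,
    font_size: float,
    bbox_height: float,
    spacing_before: float = 0.0,
) -> TextLayout:
    """Hypothetical sketch: stack lines downward and measure leftover bbox space."""
    lines = text.split("\n")            # 5.1.1: one entry per source line
    line_height = font_size * 1.2       # 5.1.2: 120% leading
    positions = [
        (line, pdf_y - i * line_height)  # 5.1.3: each line one step lower
        for i, line in enumerate(lines)
    ]
    actual_height = len(lines) * line_height
    # 5.2.5: space remaining below the text acts as the implicit spacing_after
    bbox_bottom_margin = bbox_height - actual_height - spacing_before
    return TextLayout(positions, line_height, actual_height, bbox_bottom_margin)
```

With a 10 pt font, three lines in a 50 pt tall bbox with spacing_before = 2 occupy 36 pt, leaving a bbox_bottom_margin of 12 pt. A negative value would mean the text overflows its bbox.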