refine: add OCR track line break support and spacing_after handling

Complete Phase 3 text rendering refinements for both tracks:

**OCR Track Line Break Support (Task 5.1.4)**
- Modified draw_text_region to split text on newlines
- Calculate line height as font_size * 1.2 (same as Direct track)
- Render each line with proper vertical spacing
- Apply per-line font scaling when text exceeds bbox width
- Lines 1191-1218 in pdf_generator_service.py

**spacing_after Handling (Task 5.2.4)**
- Extract spacing_after from element metadata
- Add explanatory comments about spacing_after usage
- Include spacing_after in debug logs for visibility
- Note: In Direct track with fixed bbox, spacing_after is already
  reflected in element positions; recorded for structural analysis

**Technical Details**
- OCR track now has feature parity with Direct track for line breaks
- Both tracks use identical line_height calculation (1.2x font size)
- spacing_before applied via Y position adjustment
- spacing_after recorded but not actively applied (bbox-based layout)

**Modified Files**
- backend/app/services/pdf_generator_service.py
  - Lines 1191-1218: OCR track line break handling
  - Lines 1567-1572: spacing_after comments and extraction
  - Lines 1641-1643: Enhanced debug logging
- openspec/changes/pdf-layout-restoration/tasks.md
  - Added 5.1.4 and 5.2.4 completion markers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
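The per-line layout described in the message can be sketched without a PDF canvas. This is a minimal illustration, not the code from this commit: `layout_lines`, its parameters, and the `measure` callback (standing in for a string-width call such as the canvas's `stringWidth`) are hypothetical names introduced here. It shows the same three steps: split on newlines, step down by 1.2x the font size per line, and shrink only the lines that overflow the bbox width.

```python
from typing import Callable, List, Tuple

# (text, x, y, font_size) for each line that should be drawn
PlacedLine = Tuple[str, float, float, float]


def layout_lines(
    text: str,
    x: float,
    y: float,
    font_size: float,
    bbox_width: float,
    measure: Callable[[str, float], float],  # hypothetical width function
    min_font_size: float = 3.0,
    margin: float = 0.95,
    leading: float = 1.2,
) -> List[PlacedLine]:
    """Compute where each non-empty line goes and at what font size."""
    line_height = font_size * leading
    placed: List[PlacedLine] = []
    for i, line in enumerate(text.split("\n")):
        if not line.strip():
            # Blank lines are not drawn, but i still advances, so the
            # vertical gap they occupy is preserved.
            continue
        line_y = y - i * line_height
        width = measure(line, font_size)
        size = font_size
        if width > bbox_width:
            # Scale this line only; later lines start again from font_size.
            size = max(font_size * (bbox_width / width) * margin, min_font_size)
        placed.append((line, x, line_y, size))
    return placed


if __name__ == "__main__":
    # Toy metric (an assumption): every glyph is half the font size wide.
    toy_measure = lambda s, size: len(s) * size * 0.5
    for item in layout_lines("ab\n\nabcdefghij", 50.0, 100.0, 10.0, 40.0, toy_measure):
        print(item)
```

Returning the placements instead of drawing them keeps the arithmetic testable on its own; a caller would then set the font and draw each tuple.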
--- a/backend/app/services/pdf_generator_service.py
+++ b/backend/app/services/pdf_generator_service.py
@@ -1188,18 +1188,34 @@ class PDFGeneratorService:
         font_name = self.font_name if self.font_registered else 'Helvetica'
         pdf_canvas.setFont(font_name, font_size)
 
-        # Calculate text width to prevent overflow
-        text_width = pdf_canvas.stringWidth(text, font_name, font_size)
+        # Handle line breaks (split text by newlines)
+        lines = text.split('\n')
+        line_height = font_size * 1.2  # 120% of font size for line spacing
 
-        # If text is too wide for bbox, scale down font
-        if text_width > bbox_width:
-            scale_factor = bbox_width / text_width
-            font_size = font_size * scale_factor * 0.95  # 95% to add small margin
-            font_size = max(font_size, 3)  # Minimum 3pt
-            pdf_canvas.setFont(font_name, font_size)
+        # Draw each line
+        for i, line in enumerate(lines):
+            if not line.strip():
+                continue  # Skip empty lines
 
-        # Draw text at calculated position
-        pdf_canvas.drawString(pdf_x, pdf_y, text)
+            line_y = pdf_y - (i * line_height)
+
+            # Calculate text width to prevent overflow
+            text_width = pdf_canvas.stringWidth(line, font_name, font_size)
+
+            # If text is too wide for bbox, scale down font for this line
+            current_font_size = font_size
+            if text_width > bbox_width:
+                scale_factor = bbox_width / text_width
+                current_font_size = font_size * scale_factor * 0.95  # 95% to add small margin
+                current_font_size = max(current_font_size, 3)  # Minimum 3pt
+                pdf_canvas.setFont(font_name, current_font_size)
+
+            # Draw text at calculated position
+            pdf_canvas.drawString(pdf_x, line_y, line)
+
+            # Reset font size for next line
+            if text_width > bbox_width:
+                pdf_canvas.setFont(font_name, font_size)
 
         # Debug: Draw bounding box (optional)
         if settings.pdf_enable_bbox_debug:
@@ -1549,6 +1565,9 @@ class PDFGeneratorService:
             first_line_indent = element.metadata.get('first_line_indent', indent) if element.metadata else indent
 
             # Get paragraph spacing
+            # spacing_before: Applied by adjusting starting Y position (pdf_y)
+            # spacing_after: Recorded for debugging; in Direct track with fixed bbox,
+            # actual spacing is already reflected in element positions
             paragraph_spacing_before = element.metadata.get('spacing_before', 0) if element.metadata else 0
             paragraph_spacing_after = element.metadata.get('spacing_after', 0) if element.metadata else 0
 
@@ -1623,7 +1642,8 @@ class PDFGeneratorService:
             pdf_canvas.setFont(font_name, font_size)
 
             logger.debug(f"Drew text element: {text_content[:30]}... "
-                         f"({len(lines)} lines, align={alignment}, indent={indent})")
+                         f"({len(lines)} lines, align={alignment}, indent={indent}, "
+                         f"spacing_before={paragraph_spacing_before}, spacing_after={paragraph_spacing_after})")
 
         except Exception as e:
             logger.error(f"Failed to draw text element {element.element_id}: {e}")
--- a/openspec/changes/pdf-layout-restoration/tasks.md
+++ b/openspec/changes/pdf-layout-restoration/tasks.md
@@ -81,10 +81,12 @@
   - [x] 5.1.1 Split text content by newlines (text.split('\n'))
   - [x] 5.1.2 Calculate line height from font size (font_size * 1.2)
   - [x] 5.1.3 Render each line with proper spacing (line_y = pdf_y - i * line_height)
+  - [x] 5.1.4 Add OCR track support in draw_text_region (lines 1191-1218)
 - [x] 5.2 Add paragraph handling
   - [x] 5.2.1 Detect paragraph boundaries (via element.type PARAGRAPH)
   - [x] 5.2.2 Apply paragraph spacing (spacing_before/spacing_after from metadata)
   - [x] 5.2.3 Handle indentation (indent/first_line_indent from metadata)
+  - [x] 5.2.4 Record spacing_after for debugging (included in logs)
 - [x] 5.3 Implement text alignment
   - [x] 5.3.1 Support left/right/center/justify (from StyleInfo.alignment)
   - [x] 5.3.2 Calculate positioning based on alignment (line_x calculation)
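
The spacing behaviour recorded in tasks 5.2.2 and 5.2.4 can be illustrated with a small helper. This is a sketch under stated assumptions, not code from this commit: `resolve_paragraph_spacing` is a hypothetical name, and subtracting `spacing_before` from the Y coordinate assumes a PDF-style coordinate system with the origin at the bottom-left, so that a smaller Y means lower on the page.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_paragraph_spacing(
    metadata: Optional[Dict[str, Any]], pdf_y: float
) -> Tuple[float, float, float]:
    """Return (start_y, spacing_before, spacing_after) for one element.

    spacing_before shifts the starting Y position downward (assumed
    bottom-left origin). spacing_after is only returned so it can be
    logged; with a fixed bbox the gap is already part of the next
    element's position, so it is not applied here.
    """
    meta = metadata or {}  # tolerate elements with no metadata at all
    spacing_before = meta.get("spacing_before", 0)
    spacing_after = meta.get("spacing_after", 0)
    start_y = pdf_y - spacing_before
    return start_y, spacing_before, spacing_after


if __name__ == "__main__":
    print(resolve_paragraph_spacing({"spacing_before": 6, "spacing_after": 12}, 700.0))
    print(resolve_paragraph_spacing(None, 700.0))
```

Keeping `spacing_after` in the return value mirrors the commit's choice to surface it in debug logs without letting it move anything.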