feat: implement Task 7 span-level rendering for inline styling

Added support for preserving and rendering inline style variations
within text elements (e.g., bold/italic/color changes mid-line).

Span Extraction (direct_extraction_engine.py):
1. Parse PyMuPDF span data with font, size, flags, color per span
2. Create DocumentElement children for each span with StyleInfo
3. Store spans in element.children for downstream rendering
4. Extract span-specific bbox from PyMuPDF (lines 434-453)
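The extraction steps above can be sketched as follows. This is a minimal illustration, not the code in direct_extraction_engine.py. `StyleInfo` and `DocumentElement` here are simplified hypothetical stand-ins with assumed field names, and `build_text_element` is a hypothetical helper. The input is assumed to be one line dict shaped like PyMuPDF's `page.get_text("dict")` output, where each span carries `text`, `font`, `size`, `flags`, `color` and `bbox`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional

BBox = tuple[float, float, float, float]


@dataclass
class StyleInfo:
    """Hypothetical stand-in for the project's StyleInfo (field names assumed)."""
    font_name: str
    font_size: float
    flags: int  # PyMuPDF font flags (bold, italic, ...)
    color: int  # PyMuPDF sRGB integer, e.g. 0xFF0000 for red


@dataclass
class DocumentElement:
    """Hypothetical stand-in for the project's DocumentElement."""
    text: str
    bbox: BBox
    style: Optional[StyleInfo] = None
    children: list[DocumentElement] = field(default_factory=list)


def build_text_element(line: dict[str, Any]) -> DocumentElement:
    """Build one text element from a PyMuPDF line dict, with one child per span."""
    children = [
        DocumentElement(
            text=span["text"],
            bbox=tuple(span["bbox"]),  # span-specific bbox, not the line's
            style=StyleInfo(
                font_name=span["font"],
                font_size=span["size"],
                flags=span["flags"],
                color=span["color"],
            ),
        )
        for span in line["spans"]
    ]
    # The parent keeps the full line text so single-style code paths still work.
    return DocumentElement(
        text="".join(child.text for child in children),
        bbox=tuple(line["bbox"]),
        children=children,
    )


# With PyMuPDF this would be fed from:
#   for block in page.get_text("dict")["blocks"]:
#       for line in block.get("lines", []):  # image blocks have no "lines"
line = {
    "bbox": (72.0, 700.0, 150.0, 714.0),
    "spans": [
        {"text": "Hello ", "font": "Helvetica", "size": 12.0, "flags": 0,
         "color": 0x000000, "bbox": (72.0, 700.0, 110.0, 714.0)},
        {"text": "world", "font": "Helvetica-Bold", "size": 12.0, "flags": 16,
         "color": 0xFF0000, "bbox": (110.0, 700.0, 150.0, 714.0)},
    ],
}
element = build_text_element(line)
```

Keeping the concatenated text on the parent means renderers that ignore `children` still see the whole line.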

Span Rendering (pdf_generator_service.py):
1. Implement _draw_text_with_spans() method (lines 1685-1734)
   - Iterate through span children
   - Apply per-span styling via _apply_text_style
   - Track X position and calculate widths
   - Return total rendered width
2. Integrate into _draw_text_element_direct() (lines 1822-1823, 1905-1914)
   - Check for element.children (has_spans flag)
   - Use span rendering for the first line
   - Fall back to normal rendering for list items
3. Add span count to debug logging
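A rough sketch of the single-line span loop described above. It assumes a hypothetical canvas interface (`set_style`, `string_width`, `draw_string`) in which `set_style` stands in for the service's `_apply_text_style`. `Span`, `draw_text_with_spans` and `RecordingCanvas` are illustrative names, not the actual method or classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol

RGB = tuple[float, float, float]


@dataclass
class Span:
    """Hypothetical span: the text plus the style fields the renderer needs."""
    text: str
    font_name: str
    font_size: float
    rgb: RGB = (0.0, 0.0, 0.0)


class Canvas(Protocol):
    """Assumed drawing interface; set_style stands in for _apply_text_style."""

    def set_style(self, font_name: str, font_size: float, rgb: RGB) -> None: ...

    def string_width(self, text: str, font_name: str, font_size: float) -> float: ...

    def draw_string(self, x: float, y: float, text: str) -> None: ...


def draw_text_with_spans(canvas: Canvas, spans: list[Span], x: float, y: float) -> float:
    """Draw spans left to right on one baseline and return the total width drawn."""
    cursor_x = x
    for span in spans:
        # Switch font, size and color before each span (the mid-line style change).
        canvas.set_style(span.font_name, span.font_size, span.rgb)
        canvas.draw_string(cursor_x, y, span.text)
        # Advance by this span's width, measured in its own font and size.
        cursor_x += canvas.string_width(span.text, span.font_name, span.font_size)
    return cursor_x - x


@dataclass
class RecordingCanvas:
    """Test double: records calls and uses a fake metric of 0.5 * size per character."""
    calls: list = field(default_factory=list)

    def set_style(self, font_name: str, font_size: float, rgb: RGB) -> None:
        self.calls.append(("style", font_name, font_size, rgb))

    def string_width(self, text: str, font_name: str, font_size: float) -> float:
        return 0.5 * font_size * len(text)

    def draw_string(self, x: float, y: float, text: str) -> None:
        self.calls.append(("draw", x, y, text))


canvas = RecordingCanvas()
spans = [Span("Hi ", "Helvetica", 12), Span("there", "Helvetica-Bold", 14)]
width = draw_text_with_spans(canvas, spans, 72, 700)  # width == 53.0
```

Measuring each span in its own font before advancing is what keeps a larger or bolder span from overlapping the next one.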

Features:
- Inline font changes (Arial → Times → Courier)
- Inline size changes (12pt → 14pt → 10pt)
- Inline style changes (normal → bold → italic)
- Inline color changes (black → red → blue)
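The style and color cases depend on decoding each span's `flags` and `color` values. Below is a minimal sketch, assuming PyMuPDF's documented encoding: flag bit 1 means italic, bit 4 means bold, and color is a packed sRGB integer. `decode_style` is a hypothetical helper. `helvetica_variant` is a hypothetical fallback onto the standard-14 Helvetica family for when the original font is unavailable.

```python
from __future__ import annotations

# PyMuPDF span "flags" bits: 2 = italic, 16 = bold.
FLAG_ITALIC = 1 << 1
FLAG_BOLD = 1 << 4


def decode_style(flags: int, color: int) -> tuple[bool, bool, tuple[float, float, float]]:
    """Return (bold, italic, (r, g, b)) with RGB components scaled to 0..1."""
    bold = bool(flags & FLAG_BOLD)
    italic = bool(flags & FLAG_ITALIC)
    r = (color >> 16) & 0xFF
    g = (color >> 8) & 0xFF
    b = color & 0xFF
    return bold, italic, (r / 255, g / 255, b / 255)


def helvetica_variant(bold: bool, italic: bool) -> str:
    """Hypothetical fallback: pick a standard-14 Helvetica face for the style."""
    if bold and italic:
        return "Helvetica-BoldOblique"
    if bold:
        return "Helvetica-Bold"
    if italic:
        return "Helvetica-Oblique"
    return "Helvetica"
```

For example, `decode_style(16, 0xFF0000)` describes a bold red span.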

Limitations (future work):
- All spans are currently rendered on the first line only
- Multi-line span support requires line-breaking logic
- List items still use single-style rendering (for compatibility)
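Greedy (first-fit) line breaking across styled runs is one possible approach to the multi-line limitation. Each span is split into word tokens, every token keeps its style, and a new line starts whenever the next word would overflow. This is a sketch only. `wrap_runs`, the `(text, style)` run tuples and the `width_of` callback are hypothetical, and in practice `width_of` would measure text in the run's actual font and size.

```python
from __future__ import annotations

import re
from typing import Callable

Run = tuple[str, str]  # (text, style key); the style key is a hypothetical label


def wrap_runs(
    runs: list[Run], max_width: float, width_of: Callable[[str, str], float]
) -> list[list[Run]]:
    """Greedy (first-fit) line breaking that keeps each word's style attached."""
    lines: list[list[Run]] = [[]]
    used = 0.0
    for text, style in runs:
        # Tokens are words with their trailing spaces, or bare whitespace runs.
        for token in re.findall(r"\S+\s*|\s+", text):
            # Trailing spaces may hang past the margin, so test the fit without them.
            fit_width = width_of(token.rstrip(), style)
            if lines[-1] and used + fit_width > max_width:
                lines.append([])
                used = 0.0
            lines[-1].append((token, style))
            used += width_of(token, style)
    return lines


def monospace(text: str, style: str) -> float:
    """Toy metric for illustration: one unit per character, style ignored."""
    return float(len(text))


wrapped = wrap_runs([("aaa ", "normal"), ("bbb ccc", "bold")], 7, monospace)
```

Each resulting line is again a list of styled runs, so it could be handed to the existing single-line span renderer one line at a time. A word wider than `max_width` simply gets a line of its own.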

Direct track only (OCR track has no span information).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-11-24 11:44:05 +08:00
parent b1de7616e4
commit 75c194fe2a
3 changed files with 113 additions and 13 deletions


@@ -122,15 +122,22 @@
 - [x] Pass y_offset to _draw_text_element_direct (line 1668, 1690, 1716)
 - [x] 6.2.7 Maintain list grouping via proximity (max_gap=30pt, lines 1597-1607)
-### 7. Span-Level Rendering (Advanced)
-- [ ] 7.1 Extract span information from Direct track
-- [ ] 7.1.1 Parse children elements for spans
-- [ ] 7.1.2 Get per-span styling
-- [ ] 7.1.3 Track position within line
-- [ ] 7.2 Render mixed-style lines
-- [ ] 7.2.1 Switch styles mid-line
-- [ ] 7.2.2 Handle inline formatting
-- [ ] 7.2.3 Preserve exact positioning
+### 7. Span-Level Rendering (Advanced, Direct track only)
+- [x] 7.1 Extract span information from Direct track
+- [x] 7.1.1 Parse PyMuPDF span data in _process_text_block (direct_extraction_engine.py:418-453)
+- [x] 7.1.2 Create span DocumentElements with per-span StyleInfo (lines 434-453)
+- [x] 7.1.3 Store spans in element.children for inline styling (line 476)
+- [x] 7.1.4 Extract span bbox, font, size, flags, color from PyMuPDF (lines 435-450)
+- [x] 7.2 Render mixed-style lines
+- [x] 7.2.1 Implement _draw_text_with_spans method (pdf_generator_service.py:1685-1734)
+- [x] 7.2.2 Switch styles mid-line by iterating spans (lines 1709-1732)
+- [x] 7.2.3 Apply span-specific style via _apply_text_style (lines 1715-1716)
+- [x] 7.2.4 Track X position and calculate span widths (lines 1706, 1730-1732)
+- [x] 7.2.5 Integrate span rendering in _draw_text_element_direct (lines 1822-1823, 1905-1914)
+- [x] 7.2.6 Handle inline formatting with per-span fonts, sizes, colors, bold/italic
+- [ ] 7.3 Future enhancements
+- [ ] 7.3.1 Multi-line span support with line breaking logic
+- [ ] 7.3.2 Preserve exact span positioning from PyMuPDF bbox
 ## Phase 4: Testing and Optimization (P2 - Week 3)
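For 7.3.2, one way to preserve exact positioning is to draw each span at its own PyMuPDF coordinates instead of at an accumulated X cursor. PyMuPDF measures y downward from the top of the page, while PDF user space (and ReportLab) measures y upward from the bottom, so the y value has to be flipped. The sketch below assumes an unrotated page whose mediabox starts at (0, 0). `span_draw_position` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def span_draw_position(span: dict[str, Any], page_height: float) -> tuple[float, float]:
    """Return (x, y) in bottom-left PDF space for drawing a PyMuPDF span."""
    if "origin" in span:
        # PyMuPDF's span origin is the baseline start point, with y measured downward.
        x, y_down = span["origin"]
    else:
        # Fallback approximation: the bbox bottom sits below the baseline by the descent.
        x, _, _, y_down = span["bbox"]
    return x, page_height - y_down
```

Drawing from the recorded origin keeps original kerning and inter-span gaps, whereas width accumulation depends on how closely the substitute font's metrics match the source font.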