feat: implement Phase 3 list formatting for Direct track

Add comprehensive list rendering with automatic detection and formatting:

**Task 6.1: List Element Detection**
- Detect LIST_ITEM elements by type (element.type == ElementType.LIST_ITEM)
- Extract list_level from element metadata (lines 1566-1567)
- Determine list type via regex pattern matching:
  - Ordered lists: ^\d+[\.\)]\s (e.g., "1. ", "2) ")
  - Unordered lists: ^[•·▪▫◦‣⁃]\s (various bullet symbols)
- Parse and extract list markers from text content (lines 1571-1588)

**Task 6.2: List Rendering**
- Add list markers to first line of each item:
  - Ordered: Preserve original numbering (e.g., "1. ")
  - Unordered: Standardize to bullet "• "
- Remove original markers from text content
- Apply list indentation: 20pt per nesting level (lines 1594-1598)
- Combine list indent with existing paragraph indent
- List spacing: Inherited from bbox-based layout (spacing_before/after)

**Implementation Details**
- Lines 1565-1598: List detection and indentation logic
- Lines 1629-1632: Prepend list marker to first line (rendered_line)
- Lines 1635-1676: Update all text width calculations to use rendered_line
- Lines 1688-1692: Enhanced logging with list type and level

**Technical Notes**
- Direct track only (OCR track has no list metadata)
- Integrates with existing alignment and indentation system
- Preserves line breaks and multi-line list items
- Works with all text alignment modes (left/center/right/justify)

**Modified Files**
- backend/app/services/pdf_generator_service.py
  - Added import re for regex pattern matching
  - Lines 1565-1598: List detection and indentation
  - Lines 1629-1676: List marker rendering
  - Lines 1688-1692: Enhanced debug logging
- openspec/changes/pdf-layout-restoration/tasks.md
  - Marked Task 6.1 (all subtasks) as completed
  - Marked Task 6.2 (all subtasks) as completed
  - Added implementation line references

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
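
The detection step in Task 6.1 can be illustrated with a minimal sketch. The two regex patterns are quoted from the message above; the function name `split_list_marker` and its return shape are hypothetical and are not taken from `pdf_generator_service.py`, whose actual code is not shown here.

```python
import re
from typing import Optional, Tuple

# Patterns quoted from the commit message.
ORDERED_RE = re.compile(r"^\d+[\.\)]\s")
UNORDERED_RE = re.compile(r"^[•·▪▫◦‣⁃]\s")


def split_list_marker(text: str) -> Tuple[Optional[str], str, str]:
    """Return (list_type, marker, remaining_text) for one list item.

    Hypothetical helper: list_type is "ordered", "unordered", or None
    when the text does not start with a recognized marker.
    """
    ordered = ORDERED_RE.match(text)
    if ordered:
        # Ordered items keep their original numbering, e.g. "1. " or "2) ".
        return "ordered", ordered.group(0), text[ordered.end():]
    unordered = UNORDERED_RE.match(text)
    if unordered:
        # Every recognized bullet glyph is standardized to "• ".
        return "unordered", "• ", text[unordered.end():]
    # No marker found: leave the text untouched.
    return None, "", text
```

Returning the marker separately from the remaining text mirrors the message's description of removing the original marker from the content and prepending a marker at render time.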
@@ -97,15 +97,15 @@
 - [x] 5.3.4 Justify alignment with word spacing distribution
 - [x] 5.3.5 OCR track: left-aligned only (no StyleInfo available)
 
-### 6. List Formatting
-- [ ] 6.1 Detect list elements
-- [ ] 6.1.1 Identify list items from metadata
-- [ ] 6.1.2 Determine list type (ordered/unordered)
-- [ ] 6.1.3 Extract indent level
-- [ ] 6.2 Render lists with proper formatting
-- [ ] 6.2.1 Add bullets/numbers
-- [ ] 6.2.2 Apply indentation
-- [ ] 6.2.3 Maintain list spacing
+### 6. List Formatting (Direct track only)
+- [x] 6.1 Detect list elements from Direct track
+- [x] 6.1.1 Identify LIST_ITEM elements (element.type == ElementType.LIST_ITEM)
+- [x] 6.1.2 Determine list type via regex (ordered: ^\d+[\.\)], unordered: ^[•·▪▫◦‣⁃])
+- [x] 6.1.3 Extract indent level from metadata (list_level, lines 1567-1598)
+- [x] 6.2 Render lists with proper formatting
+- [x] 6.2.1 Add bullets/numbers as list markers (lines 1571-1588, prepended to first line)
+- [x] 6.2.2 Apply indentation (20pt per level, lines 1594-1598)
+- [x] 6.2.3 Maintain list spacing (inherent in bbox-based layout, spacing_before/after)
 
 ### 7. Span-Level Rendering (Advanced)
 - [ ] 7.1 Extract span information from Direct track
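
The rendering tasks 6.2.1 and 6.2.2 can be sketched as follows. The 20pt step comes from the task list; the names `list_indent` and `render_lines` are hypothetical. The sketch also assumes that `list_level` counts from 0, so top-level items get no extra indent, which the source does not confirm.

```python
from typing import List

LIST_INDENT_PT = 20.0  # per nesting level, as stated in the task list


def list_indent(list_level: int, paragraph_indent: float = 0.0) -> float:
    """Combine list indentation with an existing paragraph indent (points).

    Assumption: list_level is 0-based; negative levels are treated as 0.
    """
    return paragraph_indent + LIST_INDENT_PT * max(list_level, 0)


def render_lines(lines: List[str], marker: str) -> List[str]:
    """Prepend the marker to the first line only; later lines are unchanged."""
    if not lines:
        return []
    return [marker + lines[0]] + lines[1:]
```

For example, `list_indent(2, 10.0)` gives 50.0 points, and `render_lines(["First", "continued"], "1. ")` marks only the first line. Any text-width calculation would then need to measure the rendered first line rather than the raw text, as the commit message notes.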