fix: properly implement list formatting with sequential numbering and grouping

Fix critical issues in Task 6 list formatting implementation:

**Issue 1: LIST_ITEM Elements Not Rendered**
- Problem: LIST_ITEM type not included in is_text property
- Fix: Separate list_elements from text_elements (lines 626, 636-637)
- Impact: List items were completely ignored in rendering

**Issue 2: Missing Sequential Numbering**
- Problem: Each list item independently parsed its own number
- Fix: Implement _draw_list_elements_direct method (lines 1523-1610)
  - Groups list items by proximity (max_gap=30pt) and level
  - Maintains list_counter across items for sequential numbering
  - Starts from original number in first item

**Issue 3: Unreliable List Type Detection**
- Problem: Regex-based detection per item, not per list
- Fix: Detect type from first item in group, apply to all items
  - Store computed marker in metadata (_list_marker, _list_type)
  - Ensures consistency across entire list

**Issue 4: Insufficient List Spacing Control**
- Problem: No grouping logic, relied solely on bbox positions
- Fix: Proximity-based grouping with 30pt max gap threshold
  - Groups consecutive items into lists
  - Separates lists when gap exceeds threshold or level changes

**Technical Implementation**

New method: _draw_list_elements_direct (lines 1523-1610)
- Sort items by position (y0, x0)
- Group by proximity and level
- Detect list type from first item
- Assign sequential markers
- Store in metadata for _draw_text_element_direct

Updated: _draw_text_element_direct (lines 1662-1677)
- Use pre-computed _list_marker from metadata
- Simplified marker removal (just clean original markers)
- No longer needs to maintain counter per-item

Updated: _generate_direct_track_pdf (lines 622-663)
- Separate list_elements collection
- Call _draw_list_elements_direct before text rendering
- Updated logging to show list item count

**Modified Files**
- backend/app/services/pdf_generator_service.py
  - Lines 626, 636-637: Separate list_elements
  - Lines 644-646: Updated logging
  - Lines 658-659: Add list rendering layer
  - Lines 1523-1610: New _draw_list_elements_direct method
  - Lines 1662-1677: Simplified list detection in _draw_text_element_direct
- openspec/changes/pdf-layout-restoration/tasks.md
  - Updated Task 6.1 subtasks with accurate implementation details
  - Updated Task 6.2 subtasks with grouping and numbering logic

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -99,13 +99,16 @@

 ### 6. List Formatting (Direct track only)
 - [x] 6.1 Detect list elements from Direct track
-- [x] 6.1.1 Identify LIST_ITEM elements (element.type == ElementType.LIST_ITEM)
-- [x] 6.1.2 Determine list type via regex (ordered: ^\d+[\.\)], unordered: ^[•·▪▫◦‣⁃])
-- [x] 6.1.3 Extract indent level from metadata (list_level, lines 1567-1598)
+- [x] 6.1.1 Identify LIST_ITEM elements (separate from text_elements, lines 636-637)
+- [x] 6.1.2 Group list items by proximity and level (_draw_list_elements_direct, lines 1543-1570)
+- [x] 6.1.3 Determine list type via regex on first item (ordered/unordered, lines 1582-1590)
+- [x] 6.1.4 Extract indent level from metadata (list_level)
 - [x] 6.2 Render lists with proper formatting
-- [x] 6.2.1 Add bullets/numbers as list markers (lines 1571-1588, prepended to first line)
-- [x] 6.2.2 Apply indentation (20pt per level, lines 1594-1598)
-- [x] 6.2.3 Maintain list spacing (inherent in bbox-based layout, spacing_before/after)
+- [x] 6.2.1 Sequential numbering across list items (list_counter, lines 1593-1602)
+- [x] 6.2.2 Add bullets/numbers as list markers (stored in _list_marker metadata, lines 1603-1607)
+- [x] 6.2.3 Apply indentation (20pt per level, lines 1683-1687)
+- [x] 6.2.4 Remove original markers from text content (lines 1671-1677)
+- [x] 6.2.5 Maintain list spacing via proximity-based grouping (max_gap=30pt, lines 1551-1563)

 ### 7. Span-Level Rendering (Advanced)
 - [ ] 7.1 Extract span information from Direct track
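
The grouping and numbering rules described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in pdf_generator_service.py. The `ListItem` class and the functions `group_items`, `assign_markers`, and `strip_marker` are hypothetical names. The coordinate convention (y grows downward, gap measured from the previous item's `y1` to the next item's `y0`), the `N.` format for ordered markers, and the `•` marker for unordered lists are assumptions. The regexes, the 30pt `max_gap`, and the `_list_marker` / `_list_type` metadata keys are taken from the commit.

```python
import re
from dataclasses import dataclass, field

# Patterns from the task list: ordered "1." / "1)", unordered bullet glyphs.
ORDERED_RE = re.compile(r"^(\d+)[\.\)]\s*")
UNORDERED_RE = re.compile(r"^[•·▪▫◦‣⁃]\s*")


@dataclass
class ListItem:
    """Hypothetical stand-in for a LIST_ITEM element (not the project's class)."""
    text: str
    y0: float  # top edge; assumes y grows downward
    y1: float  # bottom edge
    x0: float = 0.0
    level: int = 0
    metadata: dict = field(default_factory=dict)


def group_items(items, max_gap=30.0):
    """Split items into lists. A new list starts when the vertical gap to the
    previous item exceeds max_gap or when the level changes."""
    groups = []
    for item in sorted(items, key=lambda i: (i.y0, i.x0)):
        prev = groups[-1][-1] if groups else None
        same_list = (
            prev is not None
            and item.level == prev.level
            and item.y0 - prev.y1 <= max_gap
        )
        if same_list:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


def strip_marker(text):
    """Remove one leading original marker (ordered or unordered), if present."""
    for pattern in (ORDERED_RE, UNORDERED_RE):
        stripped, count = pattern.subn("", text, count=1)
        if count:
            return stripped
    return text


def assign_markers(group):
    """Detect the list type from the first item and apply it to every item,
    numbering ordered lists sequentially from the first item's own number."""
    first = ORDERED_RE.match(group[0].text)
    list_type = "ordered" if first else "unordered"
    counter = int(first.group(1)) if first else 0
    for item in group:
        if list_type == "ordered":
            marker = f"{counter}."
            counter += 1
        else:
            marker = "•"  # assumed output bullet
        item.metadata["_list_type"] = list_type
        item.metadata["_list_marker"] = marker
        item.text = strip_marker(item.text)
```

Under these assumptions, three closely spaced items whose text starts with `3.`, `7.`, and `1.` form one group and receive the markers `3.`, `4.`, and `5.`, because only the first item's number is trusted. An item more than 30pt below the previous one, or at a different level, starts a new group with its own type detection.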