Files
OCR/backend/app/services
egg 1ec186f680 fix: properly implement list formatting with sequential numbering and grouping
Fix critical issues in Task 6 list formatting implementation:

**Issue 1: LIST_ITEM Elements Not Rendered**
- Problem: LIST_ITEM type not included in is_text property
- Fix: Separate list_elements from text_elements (lines 626, 636-637)
- Impact: List items were completely ignored in rendering

**Issue 2: Missing Sequential Numbering**
- Problem: Each list item independently parsed its own number
- Fix: Implement _draw_list_elements_direct method (lines 1523-1610)
- Groups list items by proximity (max_gap=30pt) and level
- Maintains list_counter across items for sequential numbering
- Starts from original number in first item

**Issue 3: Unreliable List Type Detection**
- Problem: Regex-based detection per item, not per list
- Fix: Detect type from first item in group, apply to all items
- Store computed marker in metadata (_list_marker, _list_type)
- Ensures consistency across entire list

**Issue 4: Insufficient List Spacing Control**
- Problem: No grouping logic, relied solely on bbox positions
- Fix: Proximity-based grouping with 30pt max gap threshold
- Groups consecutive items into lists
- Separates lists when gap exceeds threshold or level changes

**Technical Implementation**

New method: _draw_list_elements_direct (lines 1523-1610)
- Sort items by position (y0, x0)
- Group by proximity and level
- Detect list type from first item
- Assign sequential markers
- Store in metadata for _draw_text_element_direct

Updated: _draw_text_element_direct (lines 1662-1677)
- Use pre-computed _list_marker from metadata
- Simplified marker removal (just clean original markers)
- No longer needs to maintain counter per-item

Updated: _generate_direct_track_pdf (lines 622-663)
- Separate list_elements collection
- Call _draw_list_elements_direct before text rendering
- Updated logging to show list item count

**Modified Files**
- backend/app/services/pdf_generator_service.py
  - Lines 626, 636-637: Separate list_elements
  - Lines 644-646: Updated logging
  - Lines 658-659: Add list rendering layer
  - Lines 1523-1610: New _draw_list_elements_direct method
  - Lines 1662-1677: Simplified list detection in _draw_text_element_direct
- openspec/changes/pdf-layout-restoration/tasks.md
  - Updated Task 6.1 subtasks with accurate implementation details
  - Updated Task 6.2 subtasks with grouping and numbering logic

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 09:59:00 +08:00
..
2025-11-12 22:53:17 +08:00
2025-11-12 22:53:17 +08:00