OCR/tasks.md at ad879d48e56e45ab201f2783c053825e759a1fdf

egg ad879d48e5 feat: implement Phase 3 list formatting for Direct track

Add comprehensive list rendering with automatic detection and formatting:

**Task 6.1: List Element Detection**
- Detect LIST_ITEM elements by type (element.type == ElementType.LIST_ITEM)
- Extract list_level from element metadata (lines 1566-1567)
- Determine list type via regex pattern matching:
  - Ordered lists: ^\d+[\.\)]\s (e.g., "1. ", "2) ")
  - Unordered lists: ^[•·▪▫◦‣⁃]\s (various bullet symbols)
- Parse and extract list markers from text content (lines 1571-1588)
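
The detection step above can be sketched as follows. This is a minimal illustration that uses the two regex patterns quoted in this commit message; the function name `detect_list_marker` and its return shape are hypothetical and are not taken from `pdf_generator_service.py`.

```python
import re
from typing import Optional, Tuple

# Patterns as quoted in the commit message, with a capture group added
# around the marker so it can be extracted (the group is an assumption).
ORDERED_RE = re.compile(r"^(\d+[\.\)])\s")
UNORDERED_RE = re.compile(r"^([•·▪▫◦‣⁃])\s")


def detect_list_marker(text: str) -> Tuple[Optional[str], Optional[str], str]:
    """Return (list_type, marker, remaining_text) for one list item's text.

    list_type is "ordered", "unordered", or None when no marker is found.
    Hypothetical helper for illustration only.
    """
    match = ORDERED_RE.match(text)
    if match:
        # Drop the marker and the single whitespace character after it.
        return "ordered", match.group(1), text[match.end():]
    match = UNORDERED_RE.match(text)
    if match:
        return "unordered", match.group(1), text[match.end():]
    # No recognizable marker: leave the text untouched.
    return None, None, text
```

Ordered patterns are checked first so that text such as "1. " is never mistaken for a bullet; text without a recognized marker passes through unchanged.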

**Task 6.2: List Rendering**
- Add list markers to first line of each item:
  - Ordered: Preserve original numbering (e.g., "1. ")
  - Unordered: Standardize to bullet "• "
- Remove original markers from text content
- Apply list indentation: 20pt per nesting level (lines 1594-1598)
- Combine list indent with existing paragraph indent
- List spacing: Inherited from bbox-based layout (spacing_before/after)
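
A minimal sketch of the rendering rules above, assuming the marker has already been split from the item text. The names `render_first_line`, `total_indent`, and `LIST_INDENT_PT` are hypothetical, and treating `list_level` as a zero-based multiplier (level 0 adds no indent) is an assumption that the commit message does not confirm.

```python
from typing import Optional

LIST_INDENT_PT = 20.0  # 20pt per nesting level, as stated in the commit message


def render_first_line(list_type: Optional[str], marker: Optional[str], body: str) -> str:
    """Build the first rendered line of a list item (hypothetical helper)."""
    if list_type == "ordered":
        # Ordered items keep their original numbering, e.g. "1." or "2)".
        return f"{marker} {body}"
    if list_type == "unordered":
        # Every bullet symbol is standardized to "•".
        return f"• {body}"
    # Not a list item: render the text unchanged.
    return body


def total_indent(list_level: int, paragraph_indent: float = 0.0) -> float:
    """Combine the list indent with any existing paragraph indent, in points.

    Assumes list_level is a zero-based multiplier.
    """
    return paragraph_indent + LIST_INDENT_PT * list_level
```

For example, an item at level 2 inside a paragraph that is already indented by 10pt would start 50pt from the left edge under these assumptions.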

**Implementation Details**
- Lines 1565-1598: List detection and indentation logic
- Lines 1629-1632: Prepend list marker to first line (rendered_line)
- Lines 1635-1676: Update all text width calculations to use rendered_line
- Lines 1688-1692: Enhanced logging with list type and level

**Technical Notes**
- Direct track only (OCR track has no list metadata)
- Integrates with existing alignment and indentation system
- Preserves line breaks and multi-line list items
- Works with all text alignment modes (left/center/right/justify)

**Modified Files**
- backend/app/services/pdf_generator_service.py
  - Added import re for regex pattern matching
  - Lines 1565-1598: List detection and indentation
  - Lines 1629-1676: List marker rendering
  - Lines 1688-1692: Enhanced debug logging
- openspec/changes/pdf-layout-restoration/tasks.md
  - Marked Task 6.1 (all subtasks) as completed
  - Marked Task 6.2 (all subtasks) as completed
  - Added implementation line references

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Implementation Tasks: PDF Layout Restoration

Phase 1: Critical Fixes (P0 - Immediate)

1. Fix Image Handling

2. Fix Table Rendering

Phase 2: Basic Style Preservation (P1 - Week 1)

3. Implement Style Application System

4. Track-Specific Rendering

Phase 3: Advanced Layout (P2 - Week 2)

5. Enhanced Text Rendering

6. List Formatting (Direct track only)

7. Span-Level Rendering (Advanced)

Phase 4: Testing and Optimization (P2 - Week 3)

8. Comprehensive Testing

9. Performance Optimization

10. Documentation and Deployment

Success Criteria

Must Have (Phase 1)

Should Have (Phase 2)

Nice to Have (Phase 3-4)

Timeline
