fix: complete layout analysis features for DirectExtractionEngine

Implements missing layout analysis capabilities:
- Add footer detection based on page position (bottom 10%)
- Build hierarchical section structure from font sizes
- Create nested list structure from indentation levels

All elements now have proper metadata for:
- section_level, parent_section, child_sections (headers)
- list_level, parent_item, children (list items)
- is_page_header, is_page_footer flags

Updates tasks.md to reflect accurate completion status.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
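
For readers without the engine source at hand, the three heuristics named in the message can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the `Element` class, the function names, the top-origin `y_top` coordinate, and the choice to store parents and children by their text are all assumptions made for the example. Only the metadata key names and the bottom-10% threshold come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a layout element (not the project's DocumentElement)."""
    text: str
    y_top: float = 0.0       # assumed: distance from the top of the page, in points
    font_size: float = 0.0
    indent: float = 0.0      # assumed: left offset, in points
    metadata: dict = field(default_factory=dict)


def flag_footer(el: Element, page_height: float, band: float = 0.10) -> None:
    """Flag an element as a page footer if it starts in the bottom `band` of the page."""
    el.metadata["is_page_footer"] = el.y_top >= page_height * (1.0 - band)


def build_sections(headers: list[Element]) -> None:
    """Derive section_level, parent_section and child_sections from font sizes."""
    # Larger fonts are treated as shallower levels: the largest size is level 1.
    sizes = sorted({h.font_size for h in headers}, reverse=True)
    level_of = {size: i + 1 for i, size in enumerate(sizes)}
    stack: list[Element] = []
    for h in headers:
        level = level_of[h.font_size]
        h.metadata.update(section_level=level, parent_section=None, child_sections=[])
        # Pop until the top of the stack is a strictly shallower header.
        while stack and stack[-1].metadata["section_level"] >= level:
            stack.pop()
        if stack:
            h.metadata["parent_section"] = stack[-1].text
            stack[-1].metadata["child_sections"].append(h.text)
        stack.append(h)


def build_list_tree(items: list[Element]) -> None:
    """Derive list_level, parent_item and children from indentation."""
    stack: list[Element] = []
    for it in items:
        # Pop siblings and deeper items; what remains is the chain of ancestors.
        while stack and stack[-1].indent >= it.indent:
            stack.pop()
        it.metadata.update(list_level=len(stack) + 1, parent_item=None, children=[])
        if stack:
            it.metadata["parent_item"] = stack[-1].text
            stack[-1].metadata["children"].append(it.text)
        stack.append(it)
```

Both tree builders use the same stack pattern: elements are visited in reading order, and the stack holds the current chain of ancestors, so each element's parent is whatever remains on top after shallower-or-equal entries are popped. A real implementation would likely need a tolerance when comparing font sizes and indents, since extracted values are rarely exact.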
2025-11-19 08:15:11 +08:00
parent a3a6fbe58b
commit 5bcf3dfd42
2 changed files with 134 additions and 0 deletions
--- a/openspec/changes/dual-track-document-processing/tasks.md
+++ b/openspec/changes/dual-track-document-processing/tasks.md
@@ -11,6 +11,7 @@
  - [x] 1.2.2 Add DocumentElement model
  - [x] 1.2.3 Add DocumentMetadata model
  - [x] 1.2.4 Create converters for both OCR and direct extraction outputs
+    - Note: OCR converter complete; DirectExtractionEngine returns UnifiedDocument directly
 - [x] 1.3 Create DocumentTypeDetector service
  - [x] 1.3.1 Implement file type detection using python-magic
  - [x] 1.3.2 Add PDF editability checking logic