2 Commits

Author SHA1 Message Date
egg
0608017a02 chore: update tasks.md with completed infrastructure work
Progress update:
- Core Infrastructure: 13/14 tasks completed
- Direct Extraction Track: 18/18 tasks completed
- Total progress: 30/147 tasks (20.4%)

Completed major components:
- UnifiedDocument model with all structures
- DocumentTypeDetector service
- DirectExtractionEngine with PyMuPDF
- Dependencies added to requirements.txt

Next priorities:
- Update OCR service for dual-track integration
- Enhance PP-StructureV3 usage
- Update PDF generator for UnifiedDocument
2025-11-18 20:37:30 +08:00
egg
cd3cbea49d chore: project cleanup and prepare for dual-track processing refactor
- Removed all test files and directories
- Deleted outdated documentation (will be rewritten)
- Cleaned up temporary files, logs, and uploads
- Archived 5 completed OpenSpec proposals
- Created new dual-track-document-processing proposal with complete OpenSpec structure
  - Dual-track architecture: OCR track (PaddleOCR) + Direct track (PyMuPDF)
  - UnifiedDocument model for consistent output
  - Support for structure-preserving translation
- Updated .gitignore to prevent future test/temp files

This is a major cleanup preparing for the complete refactoring of the document processing pipeline.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 20:02:31 +08:00
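
The dual-track architecture named in commit cd3cbea49d (OCR track with PaddleOCR, Direct track with PyMuPDF, both producing a UnifiedDocument) can be illustrated with a minimal routing sketch. This is not the project's code: `Track`, `choose_track`, `process`, the `IMAGE_SUFFIXES` set, the `min_chars` threshold, and the fields of this stand-in `UnifiedDocument` are all hypothetical, and the sketch assumes the caller has already measured how much text can be extracted from the file without OCR.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import List


class Track(Enum):
    OCR = "ocr"        # image-based input (the log names PaddleOCR for this track)
    DIRECT = "direct"  # input with a usable text layer (the log names PyMuPDF)


@dataclass
class UnifiedDocument:
    """Hypothetical stand-in for the project's UnifiedDocument model."""
    source: str
    track: Track
    pages: List[str] = field(default_factory=list)


# Assumption: these file types never carry an embedded text layer.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def choose_track(path: str, extractable_chars: int, min_chars: int = 50) -> Track:
    """Route a file by suffix and by the size of its embedded text layer.

    extractable_chars is assumed to be counted by the caller; min_chars is an
    arbitrary illustrative threshold, not a value taken from the project.
    """
    if Path(path).suffix.lower() in IMAGE_SUFFIXES:
        return Track.OCR
    return Track.DIRECT if extractable_chars >= min_chars else Track.OCR


def process(path: str, page_texts: List[str]) -> UnifiedDocument:
    """Wrap either track's result in the same output type."""
    chars = sum(len(text.strip()) for text in page_texts)
    track = choose_track(path, chars)
    # In this sketch the OCR track returns no pages; a real OCR engine would fill them.
    pages = list(page_texts) if track is Track.DIRECT else []
    return UnifiedDocument(source=path, track=track, pages=pages)
```

Whichever track is chosen, the caller receives the same `UnifiedDocument` type, which is the point of the "consistent output" goal stated in the proposal; downstream steps such as translation or PDF generation would then not need to know which track ran.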