Backup commit before executing remove-unused-code proposal.
This includes all pending changes and new features.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Force Office documents (PPTX, DOCX, XLSX) to use Direct track after
LibreOffice conversion, since converted PDFs always have extractable text
- Fix PDF generator to not exclude text in image regions for Direct track,
allowing text to render on top of background images (critical for PPT)
- Increase file_type column from VARCHAR(50) to VARCHAR(100) to support
long MIME types like PPTX
- Remove reference to non-existent total_images metadata attribute
This significantly improves processing time for Office documents
(from ~170s OCR to ~10s Direct) while preserving text quality.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Removed all test files and directories
- Deleted outdated documentation (will be rewritten)
- Cleaned up temporary files, logs, and uploads
- Archived 5 completed OpenSpec proposals
- Created new dual-track-document-processing proposal with complete OpenSpec structure
- Dual-track architecture: OCR track (PaddleOCR) + Direct track (PyMuPDF)
- UnifiedDocument model for consistent output
- Support for structure-preserving translation
- Updated .gitignore to prevent future test/temp files
This is a major cleanup preparing for the complete refactoring of the document processing pipeline.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove all V1 architecture components and promote V2 to primary:
- Delete all paddle_ocr_* table models (export, ocr, translation, user)
- Delete legacy routers (auth, export, ocr, translation)
- Delete legacy schemas and services
- Promote user_v2.py to user.py as primary user model
- Update all imports and dependencies to use V2 models only
- Update main.py version to 2.0.0
Database changes:
- Fix SQLAlchemy reserved word: rename audit_log.metadata to extra_data
- Add migration to drop all paddle_ocr_* tables
- Update alembic env to only import V2 models
Frontend fixes:
- Fix Select component exports in TaskHistoryPage.tsx
- Update to use simplified Select API with options prop
- Fix AxiosInstance TypeScript import syntax
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>