Files
OCR/backend/app
egg 87dc97d951 fix: improve Office document processing with Direct track
- Force Office documents (PPTX, DOCX, XLSX) to use Direct track after
  LibreOffice conversion, since converted PDFs always have extractable text
- Fix PDF generator to not exclude text in image regions for Direct track,
  allowing text to render on top of background images (critical for PPT)
- Increase file_type column from VARCHAR(50) to VARCHAR(100) to support
  long MIME types like PPTX
- Remove reference to non-existent total_images metadata attribute

This significantly improves processing time for Office documents
(from ~170s OCR to ~10s Direct) while preserving text quality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-30 16:22:04 +08:00
..
2025-11-12 22:53:17 +08:00