OCR/openspec/changes/archive/2025-11-18-add-office-document-support/proposal.md

# Add Office Document Support

**Status**: ✅ IMPLEMENTED & TESTED

## Summary
Add support for Microsoft Office document formats (DOC, DOCX, PPT, PPTX) to the OCR processing pipeline, and extend the JWT access token validity period from 30 minutes to 24 hours.

## Motivation
Currently, the system only supports image formats (PNG, JPG, JPEG) and PDF files. Many users have documents in Microsoft Office formats that require OCR processing. This change will:
1. Enable processing of Word and PowerPoint documents
2. Improve user experience by extending token validity
3. Leverage existing PDF-to-image conversion infrastructure

## Proposed Solution

### 1. Office Document Support
- Add Python libraries for Office document conversion:
  - `docx2pdf`, or `python-docx` + `pypandoc`, for Word documents
  - `python-pptx` for PowerPoint documents
- Implement conversion pipeline:
  - Option A: Office → PDF → Images → OCR
  - Option B: Office → Images → OCR (direct conversion)
- Extend file validation to accept `.doc`, `.docx`, `.ppt`, `.pptx` formats
- Add conversion methods to `OCRService` class
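
The validation and conversion steps above can be sketched as follows. This is a minimal illustration, not the implemented `OCRService` code: the names `OFFICE_MIME_TYPES`, `SUPPORTED_EXTENSIONS`, `is_supported`, `needs_office_conversion`, and `build_pdf_conversion_command` are hypothetical. The sketch assumes Option A and, as a substitute for the Python libraries listed above, uses LibreOffice's headless `soffice` CLI for the Office → PDF step.

```python
from pathlib import Path

# Formats the proposal describes as already supported.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}
PDF_EXTENSIONS = {".pdf"}

# Newly accepted Office formats with their registered MIME types.
OFFICE_MIME_TYPES = {
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".ppt": "application/vnd.ms-powerpoint",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}

SUPPORTED_EXTENSIONS = IMAGE_EXTENSIONS | PDF_EXTENSIONS | set(OFFICE_MIME_TYPES)


def is_supported(filename: str) -> bool:
    """Return True if the file extension is accepted by the pipeline."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def needs_office_conversion(filename: str) -> bool:
    """Return True if the file must be converted to PDF before OCR."""
    return Path(filename).suffix.lower() in OFFICE_MIME_TYPES


def build_pdf_conversion_command(source: Path, output_dir: Path) -> list[str]:
    """Build the Office -> PDF command (assumes `soffice` is on PATH)."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(output_dir),
        str(source),
    ]
```

The returned command could be run with `subprocess.run(cmd, check=True)`. The resulting PDF would then go through the existing PDF-to-image path, which is what makes Option A the smaller change.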

### 2. Token Validity Extension
- Update `ACCESS_TOKEN_EXPIRE_MINUTES` from 30 minutes to 1440 minutes (24 hours)
- Ensure security measures are in place for longer-lived tokens
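
As a rough illustration of what the new setting means for issued tokens, the sketch below derives the `iat` and `exp` claims from `ACCESS_TOKEN_EXPIRE_MINUTES`. The helper `build_claims` is hypothetical, and signing is left to whichever JWT library the project already uses.

```python
from datetime import datetime, timedelta, timezone

ACCESS_TOKEN_EXPIRE_MINUTES = 1440  # 24 hours; previously 30


def build_claims(subject: str, issued_at: datetime) -> dict:
    """Return JWT claims whose `exp` lies one validity window after `iat`."""
    expires_at = issued_at + timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
    return {
        "sub": subject,
        "iat": int(issued_at.timestamp()),
        "exp": int(expires_at.timestamp()),
    }


claims = build_claims("user-123", datetime.now(timezone.utc))
```

With this setting, `exp - iat` is 86,400 seconds, so a leaked token stays usable for up to a full day unless it is revoked by other means.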

## Impact Analysis
- **Backend Services**: Minimal changes to existing OCR processing flow
- **Dependencies**: New Python packages for Office document handling
- **Performance**: Slight increase in processing time for document conversion
- **Security**: Longer token validity requires careful consideration
- **Storage**: Temporary files during conversion process
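
One way to keep the storage impact bounded is to give each conversion its own temporary directory that is removed when processing ends. The sketch below is an assumption about how that could look, not the implemented behavior; `conversion_workspace` is a hypothetical helper.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def conversion_workspace(prefix: str = "office-convert-"):
    """Yield a scratch directory that is deleted on exit, even on errors."""
    with tempfile.TemporaryDirectory(prefix=prefix) as tmp:
        yield Path(tmp)


# Usage: intermediate PDFs and page images live only inside the block.
with conversion_workspace() as workdir:
    intermediate_pdf = workdir / "document.pdf"
    intermediate_pdf.write_bytes(b"%PDF-1.4\n")  # placeholder content
```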

## Success Criteria
1. Successfully process Word documents (.doc, .docx) with OCR
2. Successfully process PowerPoint documents (.ppt, .pptx) with OCR
3. JWT tokens remain valid for 24 hours
4. All existing functionality continues to work
5. Conversion quality maintains text readability for OCR

## Estimated Timeline
- Implementation: 2-3 hours ✅
- Testing: 1 hour ✅
- Documentation: 30 minutes ✅
- Total: ~4 hours ✅ COMPLETED

## Actual Time
- Total development time: ~6 hours (including debugging and testing)
- Primary issues resolved: configuration loading, MIME type mapping, validation logic, and API endpoints