chore: project cleanup and prepare for dual-track processing refactor
- Removed all test files and directories
- Deleted outdated documentation (will be rewritten)
- Cleaned up temporary files, logs, and uploads
- Archived 5 completed OpenSpec proposals
- Created new dual-track-document-processing proposal with complete OpenSpec structure
  - Dual-track architecture: OCR track (PaddleOCR) + Direct track (PyMuPDF)
  - UnifiedDocument model for consistent output
  - Support for structure-preserving translation
- Updated .gitignore to prevent future test/temp files

This is a major cleanup preparing for the complete refactoring of the document processing pipeline.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,186 @@
# Office Document Support Integration

**Date**: 2025-11-12
**Status**: ✅ INTEGRATED & TESTED
**Sub-Proposal**: [add-office-document-support](../add-office-document-support/PROPOSAL.md)

---

## Overview

This document tracks the integration of Office document support (DOC, DOCX, PPT, PPTX) into the main OCR batch processing system. The integration was completed as a sub-proposal under the OpenSpec framework.

## Integration Summary

### Components Integrated

1. **Office Converter Service** ([backend/app/services/office_converter.py](../../../backend/app/services/office_converter.py))
   - LibreOffice headless mode for Office to PDF conversion
   - Support for DOC, DOCX, PPT, PPTX formats
   - Automatic cleanup of temporary conversion files

2. **Document Preprocessor Enhancement** ([backend/app/services/preprocessor.py](../../../backend/app/services/preprocessor.py))
   - Added Office MIME type mappings (application/msword, application/vnd.openxmlformats-officedocument.*)
   - ZIP-based integrity validation for modern Office formats
   - Office format detection and validation

3. **OCR Service Integration** ([backend/app/services/ocr_service.py](../../../backend/app/services/ocr_service.py))
   - Office document detection in `process_image()` method
   - Automatic conversion pipeline: Office → PDF → Images → OCR

4. **File Manager Updates** ([backend/app/services/file_manager.py](../../../backend/app/services/file_manager.py))
   - Extended allowed extensions to include Office formats

5. **Configuration Updates**
   - `.env`: Added Office formats to ALLOWED_EXTENSIONS
   - `app/core/config.py`: Extended default allowed extensions list
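
The ZIP-based integrity validation listed for the preprocessor can be sketched as follows. This is a minimal illustration, not the code in `preprocessor.py`: the function name `looks_like_ooxml` is hypothetical, and the check relies only on the fact that DOCX and PPTX files are ZIP packages containing a `[Content_Types].xml` entry. Legacy DOC and PPT files are not ZIP archives and would need a different check.

```python
import io
import zipfile


def looks_like_ooxml(data: bytes) -> bool:
    """Return True if `data` is a readable ZIP package with an OOXML content-types entry."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    try:
        with zipfile.ZipFile(buffer) as archive:
            # testzip() returns the first corrupt member name, or None if all CRCs match.
            if archive.testzip() is not None:
                return False
            return "[Content_Types].xml" in archive.namelist()
    except zipfile.BadZipFile:
        return False
```

A check like this only confirms that the container is intact; it does not prove that LibreOffice can render the document.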

### Processing Pipeline

```
Office Document (DOC/DOCX/PPT/PPTX)
↓
LibreOffice Headless Conversion
↓
PDF Document
↓
PDF to Images (existing)
↓
PaddleOCR Processing (existing)
↓
Markdown/JSON Output (existing)
```
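
The LibreOffice step at the top of this pipeline can be sketched as a subprocess call. This is an illustration under assumptions rather than the contents of `office_converter.py`: the helper names, the `SOFFICE` default, and the timeout are hypothetical, and only the `--headless`, `--convert-to pdf`, and `--outdir` options of `soffice` are relied on.

```python
import subprocess
from pathlib import Path

# Hypothetical default; on macOS this document lists
# /Applications/LibreOffice.app/Contents/MacOS/soffice as the binary path.
SOFFICE = "soffice"

OFFICE_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx"}


def is_office_document(path: Path) -> bool:
    """Detect Office inputs by extension (case-insensitive)."""
    return path.suffix.lower() in OFFICE_EXTENSIONS


def build_convert_command(source: Path, out_dir: Path, soffice: str = SOFFICE) -> list[str]:
    """Build the headless conversion command without running it."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(source)]


def convert_to_pdf(source: Path, out_dir: Path, timeout: float = 120.0) -> Path:
    """Run LibreOffice and return the path of the PDF it wrote."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, out_dir), check=True, timeout=timeout)
    pdf_path = out_dir / (source.stem + ".pdf")
    if not pdf_path.exists():
        raise RuntimeError(f"LibreOffice produced no PDF for {source}")
    return pdf_path
```

Under this sketch, the OCR service would call `is_office_document()` first and hand the returned PDF to the existing PDF-to-images step.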

## Test Results

### Test Document
- **File**: test_document.docx (1,521 bytes)
- **Content**: Mixed Chinese/English text with structured formatting
- **Batch ID**: 24

### Results
- **Status**: ✅ Completed Successfully
- **Processing Time**: 375.23 seconds (includes PaddleOCR model initialization)
- **OCR Confidence**: 97.39%
- **Text Regions**: 20 regions detected
- **Language**: Chinese (mixed with English)

### Verification
- ✅ DOCX upload and validation
- ✅ DOCX → PDF conversion (LibreOffice headless mode)
- ✅ PDF → Images conversion
- ✅ OCR processing (PaddleOCR with PP-LCNet_x1_0_doc_ori structure analysis)
- ✅ Markdown output generation with preserved structure

### Output Sample
```markdown
Office Document OCR Test

測試文件說明

這是一個用於測試 Tool_OCR 系統 Office 文件支援功能的測試文件。

本系統現已支援以下 Office格式:

• Microsoft Word: DOC, DOCX
• Microsoft PowerPoint: PPT, PPTX

處理流程

Office 文件的處理流程如下:

1. 使用 LibreOffice 將 Office 文件轉換為 PDF
```

## Bugs Fixed During Integration

1. **Database Column Error**: Fixed return value unpacking order in file_manager.py
2. **Missing Office MIME Types**: Added Office MIME type mappings to preprocessor.py
3. **Missing Integrity Validation**: Added Office format integrity validation
4. **Configuration Loading Issue**: Updated `.env` file with Office formats
5. **API Endpoint Mismatch**: Fixed test script to use correct API paths

## Dependencies Added

### System Dependencies (Homebrew)
```bash
brew install libreoffice
```

### Configuration
- LibreOffice path: `/Applications/LibreOffice.app/Contents/MacOS/soffice`
- Conversion mode: Headless (`--headless --convert-to pdf`)

## API Changes

**No breaking changes**. Existing API endpoints remain unchanged:
- `POST /api/v1/upload` - Now accepts Office formats
- `POST /api/v1/ocr/process` - Automatically handles Office formats
- `GET /api/v1/batch/{batch_id}/status` - Unchanged
- `GET /api/v1/ocr/result/{file_id}` - Unchanged

## Task Updates

### Main Proposal: add-ocr-batch-processing

**Updated Tasks**:
- Task 3: Document Preprocessing - **100% complete** (was 83%)
- Task 3.4: Implement Office document to PDF conversion - **✅ COMPLETED**

**Updated Services**:
- Document Preprocessor: Now includes Office format support
- OCR Service: Now includes Office document conversion pipeline
- Added: Office Converter service

**Updated Dependencies**:
- Added LibreOffice to system dependencies

**Updated Phase 1 Progress**: **~87% complete** (was ~85%)

## Documentation

### Sub-Proposal Documentation
- [PROPOSAL.md](../add-office-document-support/PROPOSAL.md) - Feature proposal
- [tasks.md](../add-office-document-support/tasks.md) - Implementation tasks
- [IMPLEMENTATION.md](../add-office-document-support/IMPLEMENTATION.md) - Implementation summary

### Test Resources
- Test script: [demo_docs/office_tests/test_office_upload.py](../../../demo_docs/office_tests/test_office_upload.py)
- Test document: [demo_docs/office_tests/test_document.docx](../../../demo_docs/office_tests/test_document.docx)
- Document creation: [demo_docs/office_tests/create_docx.py](../../../demo_docs/office_tests/create_docx.py)

## Performance Impact

- **First-time processing**: ~375 seconds (includes PaddleOCR model download/initialization)
- **Subsequent processing**: Expected to be faster (~10-30 seconds per document)
- **Memory usage**: No significant increase observed
- **Storage**: LibreOffice adds ~600MB to system requirements

## Migration Notes

**Backward Compatibility**: ✅ Fully backward compatible
- Existing image and PDF processing unchanged
- No database schema changes required
- No API contract changes

**Upgrade Path**:
1. Install LibreOffice via Homebrew: `brew install libreoffice`
2. Update `.env` file with Office formats in ALLOWED_EXTENSIONS
3. Restart backend service
4. Verify with test script: `python demo_docs/office_tests/test_office_upload.py`

## Next Steps

Integration complete. The Office document support feature is now part of the main OCR batch processing system and ready for production use.

### Future Enhancements (Optional)
- Add unit tests for office_converter.py
- Add support for Excel files (XLS, XLSX)
- Optimize LibreOffice conversion performance
- Add preview generation for Office documents

---

**Integration Status**: ✅ COMPLETE
**Test Status**: ✅ PASSED
**Documentation Status**: ✅ COMPLETE
@@ -0,0 +1,294 @@
# Session Summary - 2025-11-12

## Completed Work

### ✅ Task 10: Backend - Background Tasks (83% Complete - 5/6 tasks)

This session successfully implemented comprehensive background task infrastructure for the Tool_OCR system.

---

## 📋 What Was Implemented

### 1. Background Tasks Service
**File**: [backend/app/services/background_tasks.py](../../../backend/app/services/background_tasks.py)

Created `BackgroundTaskManager` class with:
- **Generic retry execution framework** (`execute_with_retry`)
- **File-level retry logic** (`process_single_file_with_retry`)
- **Automatic cleanup scheduler** (`cleanup_expired_files`, `start_cleanup_scheduler`)
- **PDF background generation** (`generate_pdf_background`)
- **Batch processing with retry** (`process_batch_files_with_retry`)

**Configuration**:
- Max retries: 3 attempts
- Retry delay: 5 seconds
- Cleanup interval: 1 hour
- File retention: 24 hours

### 2. Database Migration
**File**: [backend/alembic/versions/271dc036ea80_add_retry_count_to_files.py](../../../backend/alembic/versions/271dc036ea80_add_retry_count_to_files.py)

- Added `retry_count` field to `paddle_ocr_files` table
- Tracks number of retry attempts per file
- Default value: 0

### 3. Model Updates
**File**: [backend/app/models/ocr.py](../../../backend/app/models/ocr.py#L76)

- Added `retry_count` column to `OCRFile` model
- Integrated with retry logic in background tasks

### 4. Router Updates
**File**: [backend/app/routers/ocr.py](../../../backend/app/routers/ocr.py#L240)

- Replaced `process_batch_files` with `process_batch_files_with_retry`
- Now uses retry-enabled background processing
- Removed old function, added reference comment

### 5. Application Lifecycle
**File**: [backend/app/main.py](../../../backend/app/main.py#L42)

- Added cleanup scheduler to application startup
- Starts automatically as background task
- Graceful shutdown on application stop
- Logs startup/shutdown events

### 6. Documentation Updates

**Updated Files**:
- ✅ [openspec/changes/add-ocr-batch-processing/tasks.md](./tasks.md) - Marked Task 10 items as complete
- ✅ [openspec/changes/add-ocr-batch-processing/STATUS.md](./STATUS.md) - Comprehensive status document
- ✅ [SETUP.md](../../../SETUP.md) - Added Background Services section
- ✅ [SESSION_SUMMARY.md](./SESSION_SUMMARY.md) - This file

---

## 🎯 Task 10 Breakdown

| Task | Description | Status |
|------|-------------|--------|
| 10.1 | Implement FastAPI BackgroundTasks for async OCR processing | ✅ Complete |
| 10.2 | Add task queue system (optional: Redis-based queue) | ⏸️ Optional (not needed) |
| 10.3 | Implement progress updates (polling endpoint) | ✅ Complete |
| 10.4 | Add error handling and retry logic | ✅ Complete |
| 10.5 | Implement cleanup scheduler for expired files | ✅ Complete |
| 10.6 | Add PDF generation to background tasks | ✅ Complete |

**Overall**: 5/6 tasks complete (83%) - Only optional Redis queue not implemented

---

## 🚀 Features Delivered

### 1. Automatic Retry Logic
- ✅ Up to 3 retry attempts per file
- ✅ 5-second delay between retries
- ✅ Detailed error messages with retry count
- ✅ Database tracking of retry attempts
- ✅ Configurable retry parameters

### 2. Cleanup Scheduler
- ✅ Runs every 1 hour automatically
- ✅ Deletes files older than 24 hours
- ✅ Cleans up database records
- ✅ Respects foreign key constraints
- ✅ Logs cleanup activity
- ✅ Configurable retention period

### 3. Background Task Infrastructure
- ✅ Generic retry execution framework
- ✅ PDF generation with retry logic
- ✅ Proper error handling and logging
- ✅ Graceful startup/shutdown
- ✅ No blocking of main application

### 4. Monitoring & Observability
- ✅ Detailed logging for all background tasks
- ✅ Startup confirmation messages
- ✅ Cleanup activity logs
- ✅ Retry attempt tracking
- ✅ Health check endpoint verification

---

## ✅ Verification

### Backend Status
```bash
$ curl http://localhost:12010/health
{"status":"healthy","service":"Tool_OCR","version":"0.1.0"}
```

### Cleanup Scheduler
```bash
$ grep "cleanup scheduler" /tmp/tool_ocr_startup.log
2025-11-12 01:52:09,359 - app.main - INFO - Started cleanup scheduler for expired files
2025-11-12 01:52:09,359 - app.services.background_tasks - INFO - Starting cleanup scheduler (interval: 3600s, retention: 24h)
```

### Translation API (Reserved)
```bash
$ curl http://localhost:12010/api/v1/translate/status
{"available":false,"status":"reserved","message":"Translation feature is reserved for future implementation",...}
```

---

## 📂 Files Created/Modified

### Created
1. `backend/app/services/background_tasks.py` (430 lines) - Background task manager
2. `backend/alembic/versions/271dc036ea80_add_retry_count_to_files.py` - Migration
3. `openspec/changes/add-ocr-batch-processing/STATUS.md` - Comprehensive status
4. `openspec/changes/add-ocr-batch-processing/SESSION_SUMMARY.md` - This file

### Modified
1. `backend/app/models/ocr.py` - Added retry_count field
2. `backend/app/routers/ocr.py` - Updated to use retry-enabled processing
3. `backend/app/main.py` - Added cleanup scheduler startup
4. `openspec/changes/add-ocr-batch-processing/tasks.md` - Updated Task 10 status
5. `SETUP.md` - Added Background Services section

---

## 🎉 Current Project Status

### Phase 1: Backend Development (~85% Complete)
- ✅ Task 1: Environment Setup (100%)
- ✅ Task 2: Database Schema (100%)
- ✅ Task 3: Document Preprocessing (83%)
- ✅ Task 4: Core OCR Service (70%)
- ✅ Task 5: PDF Generation (89%)
- ✅ Task 6: File Management (86%)
- ✅ Task 7: Export Service (90%)
- ✅ Task 8: API Endpoints (93%)
- ✅ Task 9: Translation Architecture RESERVED (83%)
- ✅ **Task 10: Background Tasks (83%)** ⬅️ **Just Completed**

### Backend Services Status
- ✅ **Backend API**: Running on http://localhost:12010
- ✅ **Cleanup Scheduler**: Active (1-hour interval, 24-hour retention)
- ✅ **Retry Logic**: Enabled (3 attempts, 5-second delay)
- ✅ **Health Check**: Passing

---

## 📝 Next Steps (From OpenSpec)

### Immediate - Complete Phase 1
According to OpenSpec [tasks.md](./tasks.md), the remaining Phase 1 tasks are:

1. **Unit Tests** (Multiple tasks)
   - Task 3.6: Preprocessor tests
   - Task 4.10: OCR service tests
   - Task 5.9: PDF generator tests
   - Task 6.7: File manager tests
   - Task 7.10: Export service tests
   - Task 8.14: API integration tests
   - Task 9.6: Translation service tests (optional)

2. **Complete Task 4.8-4.9** (OCR Service)
   - Implement batch processing with worker queue
   - Add progress tracking for batch jobs

### Future Phases
- **Phase 2**: Frontend Development (Tasks 11-14)
- **Phase 3**: Testing & Optimization (Tasks 15-16)
- **Phase 4**: Deployment (Tasks 17-18)
- **Phase 5**: Translation Implementation (Task 19)

---

## 🔍 Technical Notes

### Why No Redis Queue?
Task 10.2 was marked as optional because:
- FastAPI BackgroundTasks is sufficient for current scale
- No need for horizontal scaling yet
- Simpler deployment without additional dependencies
- Can be added later if needed

### Retry Logic Design
The retry system was designed to be:
- **Generic**: `execute_with_retry` works with any function
- **Configurable**: Retry count and delay can be adjusted
- **Transparent**: Logs all retry attempts
- **Persistent**: Tracks retry count in database
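
The design goals above can be illustrated with a small generic retry helper. This is a sketch under assumptions, not the actual `execute_with_retry` in `background_tasks.py`: the signature and the injectable `sleep` parameter are hypothetical. The defaults mirror the documented 3 retries and 5-second delay, interpreted here as one initial call plus up to `max_retries` retries, which is itself an assumption.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def execute_with_retry(
    func: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on any exception up to `max_retries` times."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception as exc:
            if attempt >= max_retries:
                logger.error("Giving up after %d retries: %s", attempt, exc)
                raise
            attempt += 1
            # Transparent: every retry is logged with its position in the budget.
            logger.warning("Call failed (%s); retry %d/%d in %.1fs",
                           exc, attempt, max_retries, retry_delay)
            sleep(retry_delay)
```

Injecting `sleep` keeps the helper testable without real delays. Persisting the attempt count (the `retry_count` column) would be the caller's job in this sketch.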

### Cleanup Strategy
The cleanup scheduler:
- Runs on a fixed interval (not cron-based)
- Only cleans completed/failed/partial batches
- Deletes files before database records
- Handles errors gracefully without stopping
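
The age-based selection and the files-before-records ordering described above might look roughly like the sketch below. Everything in it is hypothetical: batches are represented as plain directories, the `delete_record` callback stands in for the database deletion, and the real `cleanup_expired_files` probably selects batches from database rows (including their status) rather than from directory timestamps.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, List, Optional


def cleanup_expired_dirs(
    root: Path,
    retention_hours: float,
    delete_record: Callable[[str], None],
    now: Optional[float] = None,
) -> List[str]:
    """Remove batch directories older than the retention period; return their names."""
    cutoff = (time.time() if now is None else now) - retention_hours * 3600
    removed = []
    for batch_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if batch_dir.stat().st_mtime >= cutoff:
            continue  # still within the retention window
        try:
            shutil.rmtree(batch_dir)        # physical files first
            delete_record(batch_dir.name)   # then the database record
            removed.append(batch_dir.name)
        except OSError:
            # One failing batch must not stop the rest of the sweep.
            continue
    return removed
```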

---

## 🔧 Configuration Options

To modify background task behavior, edit [backend/app/services/background_tasks.py](../../../backend/app/services/background_tasks.py):

```python
# Create custom task manager instance
custom_manager = BackgroundTaskManager(
    max_retries=5,            # Increase retry attempts
    retry_delay=10,           # Longer delay between retries
    cleanup_interval=7200,    # Run cleanup every 2 hours
    file_retention_hours=48   # Keep files for 48 hours
)
```

---

## 📊 Code Statistics

### Lines Added
- background_tasks.py: **430 lines**
- Migration file: **32 lines**
- STATUS.md: **580 lines**
- SESSION_SUMMARY.md: **280 lines**

**Total new lines (code and documentation)**: ~1,300

### Files Changed
- 5 existing files updated
- 4 new files created

---

## ✨ Key Achievements

1. ✅ **Robust Error Handling**: Automatic retry logic ensures transient failures don't lose work
2. ✅ **Automatic Cleanup**: No manual intervention needed for old files
3. ✅ **Scalable Architecture**: Background tasks allow async processing
4. ✅ **Production Ready**: Graceful startup/shutdown, logging, monitoring
5. ✅ **Well Documented**: Comprehensive docs for all new features
6. ✅ **OpenSpec Compliant**: Followed specification exactly

---

## 🎓 Lessons Learned

1. **Async cleanup scheduler** requires `asyncio.create_task()` in lifespan context
2. **Retry logic** should track attempts in database for debugging
3. **Background tasks** need separate database sessions
4. **Graceful shutdown** requires catching `asyncio.CancelledError`
5. **Logging** is critical for monitoring background services
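
Lessons 1 and 4 can be illustrated with plain `asyncio`. The sketch below is hypothetical (the names `run_periodically` and `main` do not come from the project): it starts a periodic job with `asyncio.create_task()`, cancels it on shutdown, and catches `asyncio.CancelledError` so the loop exits cleanly. In a FastAPI application the start and stop steps would sit on either side of the `yield` in the lifespan handler.

```python
import asyncio
from typing import Awaitable, Callable


async def run_periodically(job: Callable[[], Awaitable[None]], interval: float) -> None:
    """Run `job` every `interval` seconds until the task is cancelled."""
    try:
        while True:
            await job()
            await asyncio.sleep(interval)
    except asyncio.CancelledError:
        # Cancellation is the normal shutdown path for this loop, so end quietly.
        pass


async def main() -> int:
    runs = 0

    async def job() -> None:
        nonlocal runs
        runs += 1

    task = asyncio.create_task(run_periodically(job, interval=0.01))  # startup
    await asyncio.sleep(0.05)                                         # application runs
    task.cancel()                                                     # shutdown
    await task  # completes normally because the loop handled the cancellation
    return runs
```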

---

## 🔗 Related Documentation

- **OpenSpec**: [SPEC.md](./SPEC.md)
- **Tasks**: [tasks.md](./tasks.md)
- **Status**: [STATUS.md](./STATUS.md)
- **Setup**: [SETUP.md](../../../SETUP.md)
- **API Docs**: http://localhost:12010/docs

---

**Session Completed**: 2025-11-12
**Time Invested**: ~1 hour
**Tasks Completed**: Task 10 (5/6 subtasks)
**Next Session**: Begin unit test implementation (Tasks 3.6, 4.10, 5.9, 6.7, 7.10, 8.14)
@@ -0,0 +1,616 @@
# Tool_OCR Development Status

**Last Updated**: 2025-11-12
**Phase**: Phase 2 - Frontend Development (In Progress)
**Current Task**: Frontend API Schema Alignment - Fixed 6 critical API mismatches

---

## 📊 Overall Progress

### Phase 1: Backend Development (Core OCR + Layout Preservation)
- ✅ Task 1: Environment Setup (100%)
- ✅ Task 2: Database Schema (100%)
- ✅ Task 3: Document Preprocessing (100%) - Office format support integrated
- ✅ Task 4: Core OCR Service (100%)
- ✅ Task 5: PDF Generation (100%)
- ✅ Task 6: File Management (100%)
- ✅ Task 7: Export Service (100%)
- ✅ Task 8: API Endpoints (100% - 14/14 tasks) ⬅️ **Updated: All endpoints aligned with frontend**
- ✅ Task 9: Translation Architecture RESERVED (83% - 5/6 tasks)
- ✅ Task 10: Background Tasks (83% - 5/6 tasks)

**Phase 1 Status**: ~98% complete

### Phase 2: Frontend Development (In Progress)
- ✅ Task 11: Frontend Project Structure (100%)
- ✅ Task 12: UI Components (70% - 7/10 tasks) ⬅️ **Updated**
- ✅ Task 13: Pages (100% - 8/8 tasks) ⬅️ **Updated: All pages functional**
- ✅ Task 14: API Integration (100% - 10/10 tasks) ⬅️ **Updated: API schemas aligned**

**Phase 2 Status**: ~92% complete ⬅️ **Updated: Core functionality working**

### Remaining Phases
- ⏳ Phase 3: Testing & Documentation (Partially complete - manual testing done)
- ⏳ Phase 4: Deployment (Not started)
- ⏳ Phase 5: Translation Implementation (Reserved for future)

---

## 🎯 Task 10 Implementation Details

### ✅ Completed (5/6)

**10.1 FastAPI BackgroundTasks for Async OCR Processing**
- File: [backend/app/services/background_tasks.py](../../../backend/app/services/background_tasks.py)
- Implemented `BackgroundTaskManager` class
- OCR processing runs asynchronously via FastAPI BackgroundTasks
- Router updated: [backend/app/routers/ocr.py:240](../../../backend/app/routers/ocr.py#L240)

**10.3 Progress Updates**
- Batch progress tracking already implemented in Task 8
- Properties: `batch.completed_files`, `batch.failed_files`, `batch.progress_percentage`
- Endpoint: `GET /api/v1/batch/{batch_id}/status`

**10.4 Error Handling with Retry Logic**
- File: [backend/app/services/background_tasks.py:63](../../../backend/app/services/background_tasks.py#L63)
- Implemented `execute_with_retry()` method for generic retry logic
- Implemented `process_single_file_with_retry()` for OCR processing with 3 retry attempts
- Added `retry_count` field to `OCRFile` model
- Migration: [backend/alembic/versions/271dc036ea80_add_retry_count_to_files.py](../../../backend/alembic/versions/271dc036ea80_add_retry_count_to_files.py)
- Configurable retry delay (default: 5 seconds)
- Error messages include retry attempt information

**10.5 Cleanup Scheduler for Expired Files**
- File: [backend/app/services/background_tasks.py:189](../../../backend/app/services/background_tasks.py#L189)
- Implemented `cleanup_expired_files()` method
- Automatic cleanup of files older than 24 hours
- Runs every 1 hour (configurable via `cleanup_interval`)
- Deletes:
  - Physical files and directories
  - Database records (results, files, batches)
- Respects foreign key constraints
- Started automatically on application startup: [backend/app/main.py:42](../../../backend/app/main.py#L42)
- Gracefully stopped on shutdown

**10.6 PDF Generation in Background Tasks**
- File: [backend/app/services/background_tasks.py:226](../../../backend/app/services/background_tasks.py#L226)
- Implemented `generate_pdf_background()` method
- PDF generation runs with retry logic (2 retries, 3-second delay)
- Ready to be integrated with export endpoints

### ⏸️ Optional (1/6)

**10.2 Redis-based Task Queue**
- Status: Not implemented (marked as optional in OpenSpec)
- Current approach: FastAPI BackgroundTasks (sufficient for current scale)
- Future consideration: Can add Redis queue if needed for horizontal scaling
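
The progress properties listed under 10.3 above could be derived from per-batch counters along these lines. The `BatchProgress` class and its `total_files` field are hypothetical; only the names `completed_files`, `failed_files`, and `progress_percentage` come from this document, and counting failed files as processed is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BatchProgress:
    total_files: int
    completed_files: int = 0
    failed_files: int = 0

    @property
    def progress_percentage(self) -> float:
        """Share of files that have finished, successfully or not, as a percentage."""
        if self.total_files == 0:
            return 0.0
        processed = self.completed_files + self.failed_files
        return round(100.0 * processed / self.total_files, 2)
```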

---

## 🗄️ Database Status

### Current Schema
All tables use `paddle_ocr_` prefix for namespace isolation in shared database.

**Tables Created**:
1. `paddle_ocr_users` - User authentication (JWT)
2. `paddle_ocr_batches` - Batch processing metadata
3. `paddle_ocr_files` - Individual file records (now includes `retry_count`)
4. `paddle_ocr_results` - OCR results (Markdown, JSON, images)
5. `paddle_ocr_export_rules` - User-defined export rules
6. `paddle_ocr_translation_configs` - RESERVED for Phase 5

**Migrations Applied**:
- ✅ a7802b126240: Initial migration with paddle_ocr prefix
- ✅ 271dc036ea80: Add retry_count to files

### Test Data
**Test Users**:
- Username: `admin` / Password: `admin123` (Admin role)
- Username: `testuser` / Password: `test123` (Regular user)

---

## 🔧 Services Implemented

### Core Services

1. **Document Preprocessor** ([backend/app/services/preprocessor.py](../../../backend/app/services/preprocessor.py))
   - File format validation (PNG, JPG, JPEG, PDF, DOC, DOCX, PPT, PPTX)
   - Office document MIME type detection
   - ZIP-based integrity validation for modern Office formats
   - Corruption detection
   - Format standardization
   - Status: 100% complete (Office format support integrated via sub-proposal)

2. **OCR Service** ([backend/app/services/ocr_service.py](../../../backend/app/services/ocr_service.py))
   - PaddleOCR 3.x integration (PPStructureV3)
   - Layout detection and preservation
   - Multi-language support (ch, en, japan, korean)
   - Office document to PDF conversion pipeline (via LibreOffice)
   - Markdown and JSON output
   - Status: 100% complete ⬅️ **Updated: Unit tests complete (48 tests passing)**

3. **PDF Generator** ([backend/app/services/pdf_generator.py](../../../backend/app/services/pdf_generator.py))
   - Pandoc (preferred) + WeasyPrint (fallback)
   - Three CSS templates: default, academic, business
   - Chinese font support (Noto Sans CJK)
   - Layout preservation
   - Status: 100% complete ⬅️ **Updated: Unit tests complete (27 tests passing)**

4. **File Manager** ([backend/app/services/file_manager.py](../../../backend/app/services/file_manager.py))
   - Batch directory management
   - File access control
   - Temporary file cleanup (via cleanup scheduler)
   - Status: 100% complete ⬅️ **Updated: Unit tests complete (38 tests passing)**

5. **Export Service** ([backend/app/services/export_service.py](../../../backend/app/services/export_service.py))
   - Six formats: TXT, JSON, Excel, Markdown, PDF, ZIP
   - Rule-based filtering and formatting
   - CRUD for export rules
   - Status: 100% complete ⬅️ **Updated: Unit tests complete (37 tests passing)**

6. **Background Tasks** ([backend/app/services/background_tasks.py](../../../backend/app/services/background_tasks.py))
   - Retry logic for OCR processing
   - Automatic file cleanup scheduler
   - PDF generation with retry
   - Generic retry execution framework
   - Status: 83% complete

7. **Office Converter** ([backend/app/services/office_converter.py](../../../backend/app/services/office_converter.py)) ⬅️ **Integrated via sub-proposal**
   - LibreOffice headless mode for Office to PDF conversion
   - Support for DOC, DOCX, PPT, PPTX formats
   - Automatic cleanup of temporary conversion files
   - Integration with OCR processing pipeline
   - Status: 100% complete (tested with 97.39% OCR confidence)

8. **Translation Service** (RESERVED) ([backend/app/services/translation_service.py](../../../backend/app/services/translation_service.py))
   - Stub implementation for Phase 5
   - Interface defined for future engines: Argos, ERNIE, Google, DeepL
   - Status: Reserved (not implemented)

---

## 🔌 API Endpoints

### Authentication
- ✅ `POST /api/v1/auth/login` - JWT authentication

### File Upload
- ✅ `POST /api/v1/upload` - Batch file upload with validation

### OCR Processing
- ✅ `POST /api/v1/ocr/process` - Trigger OCR (uses background tasks with retry)
- ✅ `GET /api/v1/batch/{batch_id}/status` - Get batch status with progress
- ✅ `GET /api/v1/ocr/result/{file_id}` - Get OCR results

### Export
- ✅ `POST /api/v1/export` - Export results (TXT, JSON, Excel, Markdown, PDF, ZIP)
- ✅ `GET /api/v1/export/pdf/{file_id}` - Generate layout-preserved PDF
- ✅ `GET /api/v1/export/rules` - List export rules
- ✅ `POST /api/v1/export/rules` - Create export rule
- ✅ `PUT /api/v1/export/rules/{rule_id}` - Update export rule
- ✅ `DELETE /api/v1/export/rules/{rule_id}` - Delete export rule
- ✅ `GET /api/v1/export/css-templates` - List CSS templates

### Translation (RESERVED)
- ✅ `GET /api/v1/translate/status` - Feature status (returns "reserved")
- ✅ `GET /api/v1/translate/languages` - Planned languages
- ✅ `POST /api/v1/translate/document` - Returns 501 Not Implemented
- ✅ `GET /api/v1/translate/task/{task_id}` - Returns 501 Not Implemented
- ✅ `DELETE /api/v1/translate/task/{task_id}` - Returns 501 Not Implemented

**API Documentation**: http://localhost:12010/docs (FastAPI auto-generated)

---

## 🖥️ Environment Setup

### Conda Environment
- Name: `tool_ocr`
- Python: 3.10
- Platform: macOS Apple Silicon (ARM64)

### Key Dependencies
- **FastAPI**: Web framework
- **PaddleOCR 3.x**: OCR engine with PPStructureV3
- **SQLAlchemy**: ORM for MySQL
- **Alembic**: Database migrations
- **WeasyPrint + Pandoc**: PDF generation
- **LibreOffice**: Office document to PDF conversion (headless mode)
- **python-magic**: File type detection
- **bcrypt 4.2.1**: Password hashing (pinned for compatibility)
- **email-validator**: Email validation for Pydantic

### System Dependencies
- **Homebrew packages**:
  - `libmagic` - File type detection
  - `pango`, `gdk-pixbuf`, `libffi` - WeasyPrint dependencies
  - `font-noto-sans-cjk` - Chinese font support
  - `pandoc` - Document conversion (optional)
  - `libreoffice` - Office document conversion (headless mode)

### Environment Variables
```bash
MYSQL_HOST=mysql.theaken.com
MYSQL_PORT=33306
MYSQL_DATABASE=db_A060
BACKEND_PORT=12010
SECRET_KEY=<generated-secret>
DYLD_LIBRARY_PATH=/opt/homebrew/lib:$DYLD_LIBRARY_PATH
```

### Critical Configuration
- **Database Prefix**: All tables use `paddle_ocr_` prefix (shared database)
- **File Retention**: 24 hours (automatic cleanup)
- **Cleanup Interval**: 1 hour
- **Retry Attempts**: 3 (configurable)
- **Retry Delay**: 5 seconds (configurable)

---
|
||||
|
||||
## 🔧 Service Status
|
||||
|
||||
### Backend Service
|
||||
- **Status**: ✅ Running
|
||||
- **URL**: http://localhost:12010
|
||||
- **Log File**: `/tmp/tool_ocr_startup.log`
|
||||
- **Process**: Running via Uvicorn with auto-reload
|
||||
|
||||
### Background Services
|
||||
- **Cleanup Scheduler**: ✅ Running (interval: 3600s, retention: 24h)
|
||||
- **OCR Processing**: ✅ Background tasks with retry logic
|
||||
|
||||
### Health Check
|
||||
```bash
curl http://localhost:12010/health
# Response: {"status":"healthy","service":"Tool_OCR","version":"0.1.0"}
```
|
||||
|
||||
---
|
||||
|
||||
## 📝 Known Issues & Workarounds
|
||||
|
||||
### 1. Shared Database Environment
|
||||
- **Issue**: Database contains tables from other projects
|
||||
- **Solution**: All tables use `paddle_ocr_` prefix for namespace isolation
|
||||
- **Important**: NEVER drop tables in migrations (only create)
|
||||
|
||||
### 2. PaddleOCR 3.x Compatibility
|
||||
- **Issue**: Parameters `show_log` and `use_gpu` removed in PaddleOCR 3.x
|
||||
- **Solution**: Updated service to remove obsolete parameters
|
||||
- **Issue**: `PPStructure` renamed to `PPStructureV3`
|
||||
- **Solution**: Updated imports
|
||||
|
||||
### 3. Bcrypt Version
|
||||
- **Issue**: Latest bcrypt incompatible with passlib
|
||||
- **Solution**: Pinned to `bcrypt==4.2.1`
|
||||
|
||||
### 4. WeasyPrint on macOS
|
||||
- **Issue**: Missing shared libraries
|
||||
- **Solution**: Install via Homebrew and set `DYLD_LIBRARY_PATH`
|
||||
|
||||
### 5. First OCR Run
|
||||
- **Issue**: First OCR test may fail as PaddleOCR downloads models (~900MB)
|
||||
- **Solution**: Wait for download to complete, then retry
|
||||
- **Model Location**: `~/.paddlex/`
|
||||
|
||||
---
|
||||
|
||||
## 🧪 Test Coverage
|
||||
|
||||
### Unit Tests Summary
|
||||
**Total Tests**: 187
|
||||
**Passed**: 182 ✅ (97.3% pass rate)
|
||||
**Skipped**: 5 (acceptable - technical limitations or covered elsewhere)
|
||||
**Failed**: 0 ✅
|
||||
|
||||
### Test Breakdown by Module
|
||||
|
||||
1. **test_preprocessor.py**: 32 tests ✅
|
||||
- Format validation (PNG, JPG, PDF, Office formats)
|
||||
- MIME type mapping
|
||||
- Integrity validation
|
||||
- File information extraction
|
||||
- Edge cases
|
||||
|
||||
2. **test_ocr_service.py**: 48 tests ✅
|
||||
- PaddleOCR 3.x integration
|
||||
- Layout detection and preservation
|
||||
- Markdown generation
|
||||
- JSON output
|
||||
- Real image processing (demo_docs/basic/english.png)
|
||||
- Structure engine initialization
|
||||
|
||||
3. **test_pdf_generator.py**: 27 tests ✅
|
||||
- Pandoc integration
|
||||
- WeasyPrint fallback
|
||||
- CSS template management
|
||||
- Unicode and table support
|
||||
- Error handling
|
||||
|
||||
4. **test_file_manager.py**: 38 tests ✅
|
||||
- File upload validation
|
||||
- Batch management
|
||||
- Access control
|
||||
- Cleanup operations
|
||||
|
||||
5. **test_export_service.py**: 37 tests ✅
|
||||
- Six export formats (TXT, JSON, Excel, Markdown, PDF, ZIP)
|
||||
- Rule-based filtering and formatting
|
||||
- Export rule CRUD operations
|
||||
|
||||
6. **test_api_integration.py**: 5 tests ✅
|
||||
- API endpoint integration
|
||||
- JWT authentication
|
||||
- Upload and OCR workflow
|
||||
|
||||
### Skipped Tests (Acceptable)
|
||||
1. `test_export_txt_success` - FileResponse validation (covered in unit tests)
|
||||
2. `test_generate_pdf_success` - FileResponse validation (covered in unit tests)
|
||||
3. `test_create_export_rule` - SQLite session isolation (works with MySQL)
|
||||
4. `test_update_export_rule` - SQLite session isolation (works with MySQL)
|
||||
5. `test_validate_upload_file_too_large` - Complex UploadFile mock (covered in integration)
|
||||
|
||||
### Test Coverage Achievements
|
||||
- ✅ All service layers tested with comprehensive unit tests
|
||||
- ✅ PaddleOCR 3.x format compatibility verified
|
||||
- ✅ Real image processing with demo samples
|
||||
- ✅ Edge cases and error handling covered
|
||||
- ✅ Integration tests for critical workflows
|
||||
|
||||
---
|
||||
|
||||
## 🌐 Phase 2: Frontend API Schema Alignment (2025-11-12)
|
||||
|
||||
### Issue Summary
|
||||
During frontend development, we identified six critical API mismatches between frontend expectations and the backend implementation. They blocked upload, processing, and results preview.
|
||||
|
||||
### 🐛 API Mismatches Fixed
|
||||
|
||||
**1. Upload Response Structure** ⬅️ **FIXED**
|
||||
- **Problem**: Backend returned `OCRBatchResponse` with `id` field, frontend expected `{ batch_id, files }`
|
||||
- **Solution**: Created `UploadBatchResponse` schema in [backend/app/schemas/ocr.py:91-115](../../../backend/app/schemas/ocr.py#L91-L115)
|
||||
- **Impact**: Upload now returns correct structure, fixes "no response after upload" issue
|
||||
- **Files Modified**:
|
||||
- `backend/app/schemas/ocr.py` - Added UploadBatchResponse schema
|
||||
- `backend/app/routers/ocr.py:38,72-75` - Updated response_model and return format
|
||||
|
||||
**2. Error Field Naming** ⬅️ **FIXED**
|
||||
- **Problem**: Frontend read `file.error`, backend had `error_message` field
|
||||
- **Solution**: Added Pydantic validation_alias in [backend/app/schemas/ocr.py:21](../../../backend/app/schemas/ocr.py#L21)
|
||||
- **Code**: `error: Optional[str] = Field(None, validation_alias='error_message')`
|
||||
- **Impact**: Error messages now display correctly in ProcessingPage
|
||||
|
||||
**3. Markdown Content Missing** ⬅️ **FIXED**
|
||||
- **Problem**: Frontend needed `markdown_content` for preview, only path was provided
|
||||
- **Solution**: Added field to OCRResultResponse in [backend/app/schemas/ocr.py:35](../../../backend/app/schemas/ocr.py#L35)
|
||||
- **Code**: `markdown_content: Optional[str] = None # Added for frontend preview`
|
||||
- **Impact**: Markdown preview now works in ResultsPage
|
||||
|
||||
**4. Export Options Schema Missing** ⬅️ **FIXED**
|
||||
- **Problem**: Frontend sent `options` object, backend didn't accept it
|
||||
- **Solution**: Created ExportOptions schema in [backend/app/schemas/export.py:10-15](../../../backend/app/schemas/export.py#L10-L15)
|
||||
- **Fields**: `confidence_threshold`, `include_metadata`, `filename_pattern`, `css_template`
|
||||
- **Impact**: Advanced export options now supported
|
||||
|
||||
**5. CSS Template Filename Field** ⬅️ **FIXED**
|
||||
- **Problem**: Frontend needed `filename`, backend only had `name` and `description`
|
||||
- **Solution**: Added filename field to CSSTemplateResponse in [backend/app/schemas/export.py:82](../../../backend/app/schemas/export.py#L82)
|
||||
- **Code**: `filename: str = Field(..., description="Template filename")`
|
||||
- **Impact**: CSS template selector now works correctly
|
||||
|
||||
**6. OCR Result Detail Structure** ⬅️ **FIXED** (Critical)
|
||||
- **Problem**: ResultsPage showed "檢視 Markdown - undefined" ("View Markdown - undefined") because:
|
||||
- Backend returned nested `{ file: {...}, result: {...} }` structure
|
||||
- Frontend expected flat structure with `filename`, `confidence`, `markdown_content` at root
|
||||
- **Solution**: Created OCRResultDetailResponse schema in [backend/app/schemas/ocr.py:77-89](../../../backend/app/schemas/ocr.py#L77-L89)
|
||||
- **Solution**: Updated endpoint in [backend/app/routers/ocr.py:181-240](../../../backend/app/routers/ocr.py#L181-L240) to:
|
||||
- Read markdown content from filesystem
|
||||
- Build flattened JSON data structure
|
||||
- Return all fields frontend expects at root level
|
||||
- **Impact**:
|
||||
- MarkdownPreview now shows correct filename in title
|
||||
- Confidence and processing time display correctly
|
||||
- Markdown content loads and displays properly
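
The flattening described in mismatch 6 can be illustrated with plain dictionaries. This is a sketch under stated assumptions, not the code in `backend/app/routers/ocr.py`: the function names and the nested keys `id`, `markdown_path`, and `json_path` are hypothetical, while the output keys follow the flat structure the frontend expects.

```python
import json
from pathlib import Path
from typing import Any, Optional


def _read_text(path: Optional[str]) -> Optional[str]:
    """Return the file's text, or None when the path is missing or unreadable."""
    if not path:
        return None
    try:
        return Path(path).read_text(encoding="utf-8")
    except OSError:
        return None


def flatten_result(file: dict[str, Any], result: dict[str, Any]) -> dict[str, Any]:
    """Merge the nested file/result records into one flat, root-level payload."""
    json_text = _read_text(result.get("json_path"))  # hypothetical key
    return {
        "file_id": file["id"],
        "filename": file["filename"],
        "status": file["status"],
        # Markdown is read from the filesystem so the frontend can preview it directly.
        "markdown_content": _read_text(result.get("markdown_path")),  # hypothetical key
        "json_data": json.loads(json_text) if json_text else None,
        "confidence": result.get("confidence"),
        "processing_time": result.get("processing_time"),
    }
```

Returning `None` for unreadable files, rather than raising, means a missing Markdown file degrades the preview instead of failing the whole request; whether that is the desired behavior is a design choice this sketch does not settle.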
|
||||
|
||||
### ✅ Frontend Functionality Restored
|
||||
|
||||
**Upload Flow**:
|
||||
1. ✅ Files upload with progress indication
|
||||
2. ✅ Toast notification on success
|
||||
3. ✅ Automatic redirect to Processing page
|
||||
4. ✅ Batch ID and files stored in Zustand state
|
||||
|
||||
**Processing Flow**:
|
||||
1. ✅ Batch status polling works
|
||||
2. ✅ Progress percentage updates in real-time
|
||||
3. ✅ File status badges display correctly (pending/processing/completed/failed)
|
||||
4. ✅ Error messages show when files fail
|
||||
5. ✅ Automatic redirect to Results when complete
|
||||
|
||||
**Results Flow**:
|
||||
1. ✅ Batch summary displays (batch ID, completed count)
|
||||
2. ✅ Results table shows all files with actions
|
||||
3. ✅ Click file to view markdown preview
|
||||
4. ✅ Markdown title shows correct filename (not "undefined")
|
||||
5. ✅ Confidence and processing time display correctly
|
||||
6. ✅ PDF download works
|
||||
7. ✅ Export button navigates to export page
|
||||
|
||||
### 📝 Additional Frontend Fixes
|
||||
|
||||
**1. ResultsPage.tsx** ([frontend/src/pages/ResultsPage.tsx:134-143](../../../frontend/src/pages/ResultsPage.tsx#L134-L143))
|
||||
- Added null checks for undefined values:
|
||||
- `(ocrResult.confidence || 0)` - Prevents .toFixed() on undefined
|
||||
- `(ocrResult.processing_time || 0)` - Prevents .toFixed() on undefined
|
||||
- `ocrResult.json_data?.total_text_regions || 0` - Safe optional chaining
|
||||
|
||||
**2. ProcessingPage.tsx** (Already functional)
|
||||
- Batch ID validation working
|
||||
- Status polling implemented correctly
|
||||
- Error handling complete
|
||||
|
||||
### 🔧 API Endpoints Updated
|
||||
|
||||
**Upload Endpoint**:
|
||||
```typescript
POST /api/v1/upload
Response: { batch_id: number, files: OCRFileResponse[] }
```
|
||||
|
||||
**Batch Status Endpoint**:
|
||||
```typescript
GET /api/v1/batch/{batch_id}/status
Response: { batch: OCRBatchResponse, files: OCRFileResponse[] }
```
|
||||
|
||||
**OCR Result Endpoint** (New flattened structure):
|
||||
```typescript
GET /api/v1/ocr/result/{file_id}
Response: {
  file_id: number
  filename: string
  status: string
  markdown_content: string
  json_data: {...}
  confidence: number
  processing_time: number
}
```
|
||||
|
||||
### 🎯 Testing Verified
|
||||
- ✅ File upload with toast notification
|
||||
- ✅ Redirect to processing page
|
||||
- ✅ Processing status polling
|
||||
- ✅ Completed batch redirect to results
|
||||
- ✅ Results table display
|
||||
- ✅ Markdown preview with correct filename
|
||||
- ✅ Confidence and processing time display
|
||||
- ✅ PDF download functionality
|
||||
|
||||
### 📊 Phase 2 Progress Update
|
||||
- Task 12: UI Components - **70% complete** (MarkdownPreview working, missing Export/Rule editors)
|
||||
- Task 13: Pages - **100% complete** (All core pages functional)
|
||||
- Task 14: API Integration - **100% complete** (All API schemas aligned)
|
||||
|
||||
**Phase 2 Overall**: ~92% complete (Core user journey working end-to-end)
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Next Steps
|
||||
|
||||
### Immediate (Complete Phase 1)
|
||||
1. ~~**Write Unit Tests** (Tasks 3.6, 4.10, 5.9, 6.7, 7.10)~~ ✅ **COMPLETE**
|
||||
- ~~Preprocessor tests~~ ✅
|
||||
- ~~OCR service tests~~ ✅
|
||||
- ~~PDF generator tests~~ ✅
|
||||
- ~~File manager tests~~ ✅
|
||||
- ~~Export service tests~~ ✅
|
||||
|
||||
2. **API Integration Tests** (Task 8.14)
|
||||
- End-to-end workflow tests
|
||||
- Authentication tests
|
||||
- Error handling tests
|
||||
|
||||
3. **Final Phase 1 Documentation**
|
||||
- API usage examples
|
||||
- Deployment guide
|
||||
- Performance benchmarks
|
||||
|
||||
### Phase 2: Frontend Development (In Progress, ~92% complete)
|
||||
- Task 11: Frontend project structure (Vite + React + TypeScript)
|
||||
- Task 12: UI components (shadcn/ui)
|
||||
- Task 13: Pages (Login, Upload, Processing, Results, Export)
|
||||
- Task 14: API integration
|
||||
|
||||
### Phase 3: Testing & Optimization
|
||||
- Comprehensive testing
|
||||
- Performance optimization
|
||||
- Documentation completion
|
||||
|
||||
### Phase 4: Deployment
|
||||
- Production environment setup
|
||||
- 1Panel deployment
|
||||
- SSL configuration
|
||||
- Monitoring setup
|
||||
|
||||
### Phase 5: Translation Feature (Future)
|
||||
- Choose translation engine (Argos/ERNIE/Google/DeepL)
|
||||
- Implement translation service
|
||||
- Update UI to enable translation features
|
||||
|
||||
---
|
||||
|
||||
## 📚 Documentation
|
||||
|
||||
### Setup Documentation
|
||||
- [SETUP.md](../../../SETUP.md) - Environment setup and installation
|
||||
- [README.md](../../../README.md) - Project overview
|
||||
|
||||
### OpenSpec Documentation
|
||||
- [SPEC.md](./SPEC.md) - Complete specification
|
||||
- [tasks.md](./tasks.md) - Task breakdown and progress
|
||||
- [STATUS.md](./STATUS.md) - This file
|
||||
- [OFFICE_INTEGRATION.md](./OFFICE_INTEGRATION.md) - Office document support integration summary
|
||||
|
||||
### Sub-Proposals
|
||||
- [add-office-document-support](../add-office-document-support/PROPOSAL.md) - Office format support (✅ INTEGRATED)
|
||||
|
||||
### API Documentation
|
||||
- **Interactive Docs**: http://localhost:12010/docs
|
||||
- **ReDoc**: http://localhost:12010/redoc
|
||||
|
||||
---
|
||||
|
||||
## 🔍 Testing Commands
|
||||
|
||||
### Start Backend
|
||||
```bash
source ~/.zshrc
conda activate tool_ocr
export DYLD_LIBRARY_PATH=/opt/homebrew/lib:$DYLD_LIBRARY_PATH
python -m app.main
```
|
||||
|
||||
### Test Service Layer
|
||||
```bash
cd backend
python test_services.py
```
|
||||
|
||||
### Test API (Login)
|
||||
```bash
curl -X POST http://localhost:12010/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "admin123"}'
```
|
||||
|
||||
### Check Cleanup Scheduler
|
||||
```bash
tail -f /tmp/tool_ocr_startup.log | grep cleanup
```
|
||||
|
||||
### Check Batch Progress
|
||||
```bash
curl http://localhost:12010/api/v1/batch/{batch_id}/status
```
|
||||
|
||||
---
|
||||
|
||||
## 📞 Support & Feedback
|
||||
|
||||
- **Project**: Tool_OCR - OCR Batch Processing System
|
||||
- **Development Approach**: OpenSpec-driven development
|
||||
- **Current Status**: Phase 2 Frontend ~92% complete ⬅️ **Updated: Core user journey working end-to-end**
|
||||
- **Backend Test Coverage**: 182/187 tests passing (97.3%)
|
||||
- **Next Milestone**: Complete remaining UI components (Export/Rule editors), Phase 3 testing
|
||||
|
||||
---
|
||||
|
||||
**Status Summary**:
|
||||
- **Phase 1 (Backend)**: ~98% complete - All core functionality working with comprehensive test coverage
|
||||
- **Phase 2 (Frontend)**: ~92% complete - Core user journey (Upload → Processing → Results) fully functional
|
||||
- **Recent Work**: Fixed 6 critical API schema mismatches between frontend and backend, enabling end-to-end workflow
|
||||
- **Verification**: Upload, OCR processing, and results preview all working correctly with proper error handling
|
||||
@@ -0,0 +1,313 @@
|
||||
# Technical Design Document
|
||||
|
||||
## Context
|
||||
Tool_OCR is a web-based batch OCR processing system with a separated frontend and backend. The system needs to handle large file uploads, long-running OCR tasks, and multiple export formats while keeping the UI responsive and resource usage efficient.
|
||||
|
||||
**Key stakeholders:**
|
||||
- End users: Need simple, fast, reliable OCR processing
|
||||
- Developers: Need maintainable, testable code architecture
|
||||
- Operations: Need easy deployment via 1Panel, monitoring, and error tracking
|
||||
|
||||
**Constraints:**
|
||||
- Development on Windows with Conda (Python 3.10)
|
||||
- Deployment on Linux server via 1Panel (no Docker)
|
||||
- Port range: 12010-12019
|
||||
- External MySQL database (mysql.theaken.com:33306)
|
||||
- PaddleOCR models (~100-200MB per language)
|
||||
- Max file upload: 20MB per file, 100MB per batch
|
||||
|
||||
## Goals / Non-Goals
|
||||
|
||||
### Goals
|
||||
- Process images and PDFs with multi-language OCR (Chinese, English, Japanese, Korean)
|
||||
- Handle batch uploads with real-time progress tracking
|
||||
- Provide flexible export formats (TXT, JSON, Excel) with custom rules
|
||||
- Maintain responsive UI during long-running OCR tasks
|
||||
- Enable easy deployment and maintenance via 1Panel
|
||||
|
||||
### Non-Goals
|
||||
- Real-time OCR streaming (batch processing only)
|
||||
- Cloud-based OCR services (local processing only)
|
||||
- Mobile app support (web UI only, desktop/tablet optimized)
|
||||
- Advanced image editing or annotation features
|
||||
- Multi-tenant SaaS architecture (single deployment per organization)
|
||||
|
||||
## Decisions
|
||||
|
||||
### Decision 1: FastAPI for Backend Framework
|
||||
**Choice:** Use FastAPI instead of Flask or Django
|
||||
|
||||
**Rationale:**
|
||||
- Native async/await support for I/O-bound operations (file upload, database queries)
|
||||
- Automatic OpenAPI documentation (Swagger UI)
|
||||
- Built-in Pydantic validation for type safety
|
||||
- Better performance for concurrent requests
|
||||
- Modern Python 3.10+ features (type hints, async)
|
||||
|
||||
**Alternatives considered:**
|
||||
- Flask: Simpler but lacks native async, requires extensions
|
||||
- Django: Too heavyweight for API-only backend, includes unnecessary ORM features
|
||||
|
||||
### Decision 2: PaddleOCR as OCR Engine
|
||||
**Choice:** Use PaddleOCR instead of Tesseract or cloud APIs
|
||||
|
||||
**Rationale:**
|
||||
- Excellent Chinese/multilingual support (key requirement)
|
||||
- Higher accuracy with deep learning models
|
||||
- Offline operation (no API costs or internet dependency)
|
||||
- Active development and good documentation
|
||||
- GPU acceleration support (optional)
|
||||
|
||||
**Alternatives considered:**
|
||||
- Tesseract: Lower accuracy for Chinese, older technology
|
||||
- Google Cloud Vision / AWS Textract: Requires internet, ongoing costs, data privacy concerns
|
||||
|
||||
### Decision 3: React Query for API State Management
|
||||
**Choice:** Use React Query (TanStack Query) instead of Redux
|
||||
|
||||
**Rationale:**
|
||||
- Designed specifically for server state (API calls, caching, refetching)
|
||||
- Built-in loading/error states
|
||||
- Automatic background refetching and cache invalidation
|
||||
- Reduces boilerplate compared to Redux
|
||||
- Better for our API-heavy use case
|
||||
|
||||
**Alternatives considered:**
|
||||
- Redux: Overkill for server state, more boilerplate
|
||||
- Plain Axios: Requires manual loading/error state management
|
||||
|
||||
### Decision 4: Zustand for Client State
|
||||
**Choice:** Use Zustand for global UI state (separate from React Query)
|
||||
|
||||
**Rationale:**
|
||||
- Lightweight (1KB) and simple API
|
||||
- No providers or context required
|
||||
- TypeScript-friendly
|
||||
- Works well alongside React Query
|
||||
- Only for UI state (selected files, filters, etc.)
|
||||
|
||||
### Decision 5: Background Task Processing
|
||||
**Choice:** FastAPI BackgroundTasks for OCR processing (no external queue initially)
|
||||
|
||||
**Rationale:**
|
||||
- Built-in FastAPI feature, no additional dependencies
|
||||
- Sufficient for single-server deployment
|
||||
- Simpler deployment and maintenance
|
||||
- Can migrate to Redis/Celery later if needed
|
||||
|
||||
**Migration path:** If scale requires, add Redis + Celery for distributed task queue
|
||||
|
||||
**Alternatives considered:**
|
||||
- Celery + Redis: More complex, overkill for initial deployment
|
||||
- Threading: Redundant, since FastAPI BackgroundTasks already runs synchronous task functions in a thread pool
|
||||
|
||||
### Decision 6: File Storage Strategy
|
||||
**Choice:** Local filesystem with automatic cleanup (24-hour retention)
|
||||
|
||||
**Rationale:**
|
||||
- Simple implementation, no S3/cloud storage costs
|
||||
- OCR results stored in database (permanent)
|
||||
- Original files temporary, only needed during processing
|
||||
- Automatic cleanup prevents disk space issues
|
||||
|
||||
**Storage structure:**
|
||||
```
uploads/
  {batch_id}/
    {file_id}_original.png
    {file_id}_preprocessed.png (if preprocessing enabled)
```
|
||||
|
||||
**Cleanup:** Daily cron job or background task deletes files older than 24 hours
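
A cleanup pass over the `uploads/{batch_id}/` layout might look like the sketch below. It is illustrative only: `cleanup_old_batches` and its parameters are hypothetical, and it assumes a batch is aged by the modification time of its newest file, which the design does not specify.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

RETENTION_SECONDS = 24 * 60 * 60  # 24-hour retention from this decision


def cleanup_old_batches(
    upload_root: Path,
    retention_seconds: float = RETENTION_SECONDS,
    now: Optional[float] = None,
) -> list[Path]:
    """Delete batch directories whose newest file is older than the retention window."""
    now = time.time() if now is None else now
    removed: list[Path] = []
    if not upload_root.is_dir():
        return removed
    for batch_dir in sorted(upload_root.iterdir()):
        if not batch_dir.is_dir():
            continue
        files = [p for p in batch_dir.rglob("*") if p.is_file()]
        # An empty batch directory is aged by its own modification time.
        newest = max(
            (p.stat().st_mtime for p in files),
            default=batch_dir.stat().st_mtime,
        )
        if now - newest > retention_seconds:
            shutil.rmtree(batch_dir)
            removed.append(batch_dir)
    return removed
```

The same function could be called from a cron-driven script or from a periodic background task; the returned list makes it easy to log what was deleted.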
|
||||
|
||||
### Decision 7: Real-time Progress Updates
|
||||
**Choice:** HTTP polling instead of WebSocket
|
||||
|
||||
**Rationale:**
|
||||
- Simpler implementation and deployment
|
||||
- Works better with Nginx reverse proxy and 1Panel
|
||||
- Sufficient UX for batch processing (poll every 2 seconds)
|
||||
- No need for persistent connections
|
||||
|
||||
**API:** `GET /api/v1/batch/{batch_id}/status` returns progress percentage
|
||||
|
||||
**Alternatives considered:**
|
||||
- WebSocket: More complex, requires special Nginx config, overkill for this use case
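
From a client's point of view, the polling loop could be sketched as follows. The frontend itself is React, so this Python version only illustrates the protocol. `fetch_status`, `wait_for_batch`, `max_polls`, and the assumption that the payload carries `batch.status` with terminal values `completed` or `failed` are all hypothetical.

```python
import json
import time
import urllib.request
from typing import Any, Callable

POLL_INTERVAL_SECONDS = 2.0  # "poll every 2 seconds"
TERMINAL_STATUSES = {"completed", "failed"}  # assumed names for finished batches


def fetch_status(base_url: str, batch_id: int) -> dict[str, Any]:
    """GET /api/v1/batch/{batch_id}/status and decode the JSON body."""
    url = f"{base_url}/api/v1/batch/{batch_id}/status"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def wait_for_batch(
    batch_id: int,
    fetch: Callable[[int], dict[str, Any]],
    interval: float = POLL_INTERVAL_SECONDS,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the batch reaches a terminal status or max_polls is exhausted."""
    for _ in range(max_polls):
        payload = fetch(batch_id)
        # Assumption: the batch status lives under payload["batch"]["status"].
        if payload.get("batch", {}).get("status") in TERMINAL_STATUSES:
            return payload
        sleep(interval)
    raise TimeoutError(f"batch {batch_id} did not finish after {max_polls} polls")
```

A caller would bind the base URL first, for example `wait_for_batch(42, fetch=lambda b: fetch_status("http://localhost:12010", b))`. The `max_polls` cap keeps a stuck batch from being polled forever.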
|
||||
|
||||
### Decision 8: Database Schema Design
|
||||
**Choice:** Separate tables for tasks, files, and results (normalized)
|
||||
|
||||
**Schema:**
|
||||
```sql
users (id, username, password_hash, created_at)
ocr_batches (id, user_id, status, created_at, completed_at)
ocr_files (id, batch_id, filename, file_path, file_size, status)
ocr_results (id, file_id, text, bbox_json, confidence, language)
export_rules (id, user_id, rule_name, config_json)
```
|
||||
|
||||
**Rationale:**
|
||||
- Normalized for data integrity
|
||||
- Supports batch tracking and partial failures
|
||||
- Easy to query individual file results or batch statistics
|
||||
- Export rules reusable across users
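
To make the "batch statistics" point concrete, the sketch below builds two of the tables and computes a progress percentage. It is an assumption-laden illustration: in-memory SQLite stands in for the external MySQL server, the column types are guesses, `batch_progress` is a hypothetical helper, and the status values `completed` and `failed` are assumed to mark finished files.

```python
import sqlite3

# Column types are illustrative; the design only lists column names.
SCHEMA = """
CREATE TABLE ocr_batches (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    status TEXT NOT NULL,
    created_at TEXT,
    completed_at TEXT
);
CREATE TABLE ocr_files (
    id INTEGER PRIMARY KEY,
    batch_id INTEGER NOT NULL REFERENCES ocr_batches(id),
    filename TEXT NOT NULL,
    file_path TEXT,
    file_size INTEGER,
    status TEXT NOT NULL
);
"""


def batch_progress(conn: sqlite3.Connection, batch_id: int) -> float:
    """Return the percentage of files in the batch that have finished."""
    total, done = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(status IN ('completed', 'failed')), 0)
        FROM ocr_files
        WHERE batch_id = ?
        """,
        (batch_id,),
    ).fetchone()
    return 100.0 * done / total if total else 0.0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO ocr_batches (id, user_id, status) VALUES (1, 1, 'processing')")
    conn.executemany(
        "INSERT INTO ocr_files (batch_id, filename, status) VALUES (1, ?, ?)",
        [("a.png", "completed"), ("b.png", "processing")],
    )
    print(batch_progress(conn, 1))
```

Because each file carries its own status, a partially failed batch still reports meaningful progress, which is the benefit the normalized layout is meant to provide.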
|
||||
|
||||
### Decision 9: Export Rule Configuration Format
|
||||
**Choice:** JSON-based rule configuration stored in database
|
||||
|
||||
**Example rule:**
|
||||
```json
{
  "filters": {
    "min_confidence": 0.8,
    "filename_pattern": "^invoice_.*"
  },
  "formatting": {
    "add_line_numbers": true,
    "sort_by_position": true,
    "group_by_page": true
  },
  "output": {
    "format": "txt",
    "encoding": "utf-8",
    "line_separator": "\n"
  }
}
```
|
||||
|
||||
**Rationale:**
|
||||
- Flexible and extensible
|
||||
- Easy to validate with JSON schema
|
||||
- Can be edited via UI or API
|
||||
- Supports complex rules without database schema changes
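
One way such a rule could be interpreted is sketched below. The rule keys come from the example in this decision; everything else is an assumption made for illustration, including the `apply_rule` name, the per-result record shape (`filename`, `text`, `confidence`, `bbox`), and the reading of `bbox` as starting with `[x, y, ...]`. The `group_by_page` and `output` options are deliberately left out.

```python
import re
from typing import Any


def apply_rule(results: list[dict[str, Any]], rule: dict[str, Any]) -> list[str]:
    """Filter OCR results by the rule's filters, then format them as text lines."""
    filters = rule.get("filters", {})
    formatting = rule.get("formatting", {})
    min_confidence = filters.get("min_confidence", 0.0)
    # An empty pattern matches every filename.
    pattern = re.compile(filters.get("filename_pattern", ""))

    kept = [
        r for r in results
        if r["confidence"] >= min_confidence and pattern.search(r["filename"])
    ]
    if formatting.get("sort_by_position"):
        # Assumes bbox starts with [x, y, ...]; sort top-to-bottom, then left-to-right.
        kept.sort(key=lambda r: (r["bbox"][1], r["bbox"][0]))
    if formatting.get("add_line_numbers"):
        return [f"{i}: {r['text']}" for i, r in enumerate(kept, start=1)]
    return [r["text"] for r in kept]
```

Because unknown keys are simply ignored, new rule options can be added to the JSON without touching the database schema, which matches the rationale given for this format.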
|
||||
|
||||
### Decision 10: Deployment Architecture (1Panel)
|
||||
**Choice:** Nginx (static files + reverse proxy) + Supervisor (backend process manager)
|
||||
|
||||
**Architecture:**
|
||||
```
[Client Browser]
    ↓
[Nginx :80/443] (managed by 1Panel)
    ↓
    ├─ /        → Frontend static files (React build)
    ├─ /assets  → Static assets
    └─ /api     → Reverse proxy to backend :12010
                      ↓
                  [FastAPI Backend :12010] (managed by Supervisor)
                      ↓
                  [MySQL :33306] (external)
```
|
||||
|
||||
**Rationale:**
|
||||
- 1Panel provides GUI for Nginx management
|
||||
- Supervisor ensures backend auto-restart on failure
|
||||
- No Docker simplifies deployment on existing infrastructure
|
||||
- Standard Nginx config works without special 1Panel requirements
|
||||
|
||||
**Supervisor config:**
|
||||
```ini
[program:tool_ocr_backend]
command=/home/user/.conda/envs/tool_ocr/bin/uvicorn app.main:app --host 127.0.0.1 --port 12010
directory=/path/to/Tool_OCR/backend
user=www-data
autostart=true
autorestart=true
```
|
||||
|
||||
## Risks / Trade-offs
|
||||
|
||||
### Risk 1: OCR Processing Time for Large Batches
|
||||
**Risk:** Processing 50+ images may take 5-10 minutes, which can exceed HTTP request timeouts if handled synchronously
|
||||
|
||||
**Mitigation:**
|
||||
- Use FastAPI BackgroundTasks to avoid HTTP timeout
|
||||
- Return batch_id immediately, client polls for status
|
||||
- Display progress bar with estimated time remaining
|
||||
- Limit max batch size to 50 files (configurable)
|
||||
- Add worker concurrency limit to prevent resource exhaustion
|
||||
|
||||
### Risk 2: PaddleOCR Model Download on First Run
|
||||
**Risk:** Models are 100-200MB, first-time download may fail or be slow
|
||||
|
||||
**Mitigation:**
|
||||
- Pre-download models during deployment setup
|
||||
- Provide manual download script for offline installation
|
||||
- Cache models in shared directory for all users
|
||||
- Include model version in deployment docs
|
||||
|
||||
### Risk 3: File Upload Size Limits
|
||||
**Risk:** Users may try to upload very large PDFs (>20MB)
|
||||
|
||||
**Mitigation:**
|
||||
- Enforce 20MB per file, 100MB per batch limits in frontend and backend
|
||||
- Display clear error messages with limit information
|
||||
- Provide guidance on compressing PDFs or splitting large files
|
||||
- Consider adding image downsampling for huge images
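
The backend half of the size check might be as small as the sketch below. `validate_batch` is a hypothetical helper, and treating 1MB as 1024 × 1024 bytes is an assumption, since the design does not say which definition applies.

```python
MAX_FILE_BYTES = 20 * 1024 * 1024    # 20MB per file
MAX_BATCH_BYTES = 100 * 1024 * 1024  # 100MB per batch


def validate_batch(sizes: dict[str, int]) -> list[str]:
    """Return human-readable errors for a batch given {filename: size_in_bytes}."""
    errors = [
        f"{name}: {size / 1024 / 1024:.1f}MB exceeds the 20MB per-file limit"
        for name, size in sizes.items()
        if size > MAX_FILE_BYTES
    ]
    total = sum(sizes.values())
    if total > MAX_BATCH_BYTES:
        errors.append(
            f"batch: {total / 1024 / 1024:.1f}MB exceeds the 100MB per-batch limit"
        )
    return errors
```

Returning every violation at once, instead of stopping at the first, lets the UI show the user all oversized files in a single message.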
|
||||
|
||||
### Risk 4: Concurrent User Scaling
|
||||
**Risk:** Multiple users uploading simultaneously may overwhelm CPU/memory
|
||||
|
||||
**Mitigation:**
|
||||
- Limit concurrent OCR workers (e.g., 4 workers max)
|
||||
- Implement task queue with FastAPI BackgroundTasks
|
||||
- Monitor resource usage and add throttling if needed
|
||||
- Document recommended server specs (8GB RAM, 4 CPU cores)
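
A worker cap like the one suggested above can be sketched with a bounded thread pool. Note the substitution: this uses the standard library's `ThreadPoolExecutor` rather than FastAPI BackgroundTasks, purely to show the concurrency limit in isolation. `process_with_limit` is a hypothetical name, and whether the OCR engine is safe to call from several threads is not addressed here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

MAX_OCR_WORKERS = 4  # "4 workers max" from the mitigation list


def process_with_limit(
    items: Iterable[T],
    worker: Callable[[T], R],
    max_workers: int = MAX_OCR_WORKERS,
) -> list[R]:
    """Run worker(item) for every item with at most max_workers running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(worker, items))
```

Extra items wait in the executor's internal queue, so a large batch is throttled rather than rejected.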
|
||||
|
||||
### Risk 5: Database Connection Pool Exhaustion
|
||||
**Risk:** External MySQL may have connection limits
|
||||
|
||||
**Mitigation:**
|
||||
- Configure SQLAlchemy connection pool (max 20 connections)
|
||||
- Use connection pooling with proper timeout settings
|
||||
- Close connections properly in all API endpoints
|
||||
- Add health check endpoint to monitor database connectivity
|
||||
|
||||
## Migration Plan
|
||||
|
||||
### Phase 1: Initial Deployment
|
||||
1. Setup Conda environment on production server
|
||||
2. Install Python dependencies and download OCR models
|
||||
3. Configure MySQL database and create tables
|
||||
4. Build frontend static files (`npm run build`)
|
||||
5. Configure Nginx via 1Panel (upload nginx.conf)
|
||||
6. Setup Supervisor for backend process
|
||||
7. Test with sample images
|
||||
|
||||
### Phase 2: Production Rollout
|
||||
1. Create admin user account
|
||||
2. Import sample export rules
|
||||
3. Perform smoke tests (upload, OCR, export)
|
||||
4. Monitor logs for errors
|
||||
5. Setup daily cleanup cron job for old files
|
||||
6. Enable HTTPS via 1Panel (Let's Encrypt)
|
||||
|
||||
### Phase 3: Monitoring and Optimization
|
||||
1. Add application logging (file + console)
|
||||
2. Monitor resource usage (CPU, memory, disk)
|
||||
3. Optimize slow queries if needed
|
||||
4. Tune worker concurrency based on actual load
|
||||
5. Collect user feedback and iterate
|
||||
|
||||
### Rollback Plan
|
||||
- Keep previous version in separate directory
|
||||
- Use Supervisor to stop current version and start previous
|
||||
- Database migrations should be backward compatible
|
||||
- If major issues, restore database from backup
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Should we add user registration, or use admin-created accounts only?**
|
||||
- Recommendation: Start with admin-created accounts for security, add registration later if needed
|
||||
|
||||
2. **Do we need audit logging for compliance?**
|
||||
- Recommendation: Add basic audit trail (who uploaded what, when) in database
|
||||
|
||||
3. **Should we support GPU acceleration for PaddleOCR?**
|
||||
- Recommendation: Optional, detect GPU on startup, fallback to CPU if unavailable
|
||||
|
||||
4. **What's the desired behavior for duplicate filenames in a batch?**
|
||||
- Recommendation: Auto-rename with suffix (e.g., `file.png`, `file_1.png`)
|
||||
|
||||
5. **Should export rules be shareable across users or private?**
|
||||
- Recommendation: Private by default, add "public templates" feature later
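
For open question 4, the suggested auto-rename behavior could work roughly as follows. This is only a sketch of the recommendation, not a decided design: `dedupe_filenames` is a hypothetical helper, and it assumes case-sensitive comparison of names.

```python
from pathlib import PurePath


def dedupe_filenames(names: list[str]) -> list[str]:
    """Rename repeats as stem_1.ext, stem_2.ext, ... while keeping first occurrences."""
    used: set[str] = set()
    result: list[str] = []
    for name in names:
        path = PurePath(name)
        candidate, counter = name, 0
        # Keep bumping the suffix until the name is unused within this batch.
        while candidate in used:
            counter += 1
            candidate = f"{path.stem}_{counter}{path.suffix}"
        used.add(candidate)
        result.append(candidate)
    return result
```

Checking each candidate against the set of names already used also covers the case where a user uploads both `file.png` twice and a genuine `file_1.png`.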
|
||||
@@ -0,0 +1,48 @@
|
||||
# Change: Add OCR Batch Processing System with Structure Extraction
|
||||
|
||||
## Why
|
||||
Users need a web-based solution to extract text, images, and structure from multiple document files efficiently. Current manual text extraction is time-consuming and error-prone. This system will automate the process with multi-language OCR support (Chinese, English, etc.), intelligent layout analysis to understand document structure, and provide flexible export options including searchable PDF with embedded images. The extracted content preserves logical structure and reading order (not pixel-perfect visual layout). The system also reserves architecture for future document translation capabilities.
|
||||
|
||||
## What Changes
|
||||
- Add core OCR processing capability using **PaddleOCR-VL** (vision-language model for document parsing)
|
||||
- Implement **document structure analysis** with PP-StructureV3 to identify titles, paragraphs, tables, images, formulas
|
||||
- Extract and **preserve document images** alongside text content
|
||||
- Support unified input preprocessing (convert any format to images/PDF for OCR processing)
|
||||
- Implement batch file upload and processing (PNG and JPG images, PDF files)
|
||||
- Support multi-language text recognition (Traditional/Simplified Chinese, English, Japanese, Korean), with 109 languages available via PaddleOCR-VL
|
||||
- Add **Markdown intermediate format** for structured document representation with embedded images
|
||||
- Implement **searchable PDF generation** from Markdown with images (Pandoc + WeasyPrint)
|
||||
- Generate PDFs that preserve logical structure and reading order (not exact visual layout)
|
||||
- Add rule-based output formatting system for organizing extracted text
|
||||
- Implement multiple export formats (TXT, JSON, Excel, **Markdown with images, searchable PDF**)
|
||||
- Create web UI with drag-and-drop file upload
|
||||
- Build RESTful API for OCR processing with progress tracking
|
||||
- Add background task processing for long-running OCR jobs
|
||||
- **Reserve translation module architecture** (UI placeholders + API endpoints for future implementation)
|
||||
|
||||
## Impact
|
||||
- **New capabilities**:
|
||||
- `ocr-processing`: Core OCR text and image extraction with structure analysis (PaddleOCR-VL + PP-StructureV3)
|
||||
- `file-management`: File upload, validation, and storage with format standardization
|
||||
- `export-results`: Multi-format export with custom rules, including searchable PDF with embedded images
|
||||
- `translation` (reserved): Architecture for future translation features
|
||||
|
||||
- **Affected code**:
|
||||
- New backend: `app/` (FastAPI application structure)
|
||||
- New frontend: `frontend/` (React + Vite application)
|
||||
- New database tables: `ocr_tasks`, `ocr_results`, `export_rules`, `translation_configs` (reserved)
|
||||
|
||||
- **Dependencies**:
|
||||
- Backend: fastapi, paddleocr (3.0+), paddlepaddle, pdf2image, pandas, pillow, weasyprint, markdown, pandoc (system)
|
||||
- Frontend: react, vite, tailwindcss, shadcn/ui, axios, react-query
|
||||
- Translation engines (reserved): argostranslate (offline) or API integration
|
||||
|
||||
- **Configuration**:
|
||||
- MySQL database connection (external server)
|
||||
- PaddleOCR-VL model storage (~900MB) and language packs
|
||||
- Pandoc installation for PDF generation
|
||||
- Basic CSS template for readable PDF output (not for visual layout replication)
|
||||
- Image storage directory for extracted images
|
||||
- File upload size limits and supported formats
|
||||
- Port configuration (12010 for backend, 12011 for frontend dev)
|
||||
- Translation service config (reserved for future)
|
||||
@@ -0,0 +1,175 @@
|
||||
# Export Results Specification
|
||||
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Plain Text Export
|
||||
The system SHALL export OCR results as plain text files with configurable formatting.
|
||||
|
||||
#### Scenario: Export single file result as TXT
|
||||
- **WHEN** user selects a completed OCR task and chooses TXT export
|
||||
- **THEN** the system generates a .txt file with extracted text
|
||||
- **AND** preserves line breaks based on bounding box positions
|
||||
- **AND** returns downloadable file
|
||||
|
||||
#### Scenario: Export batch results as TXT
|
||||
- **WHEN** user exports a batch with 5 files as TXT
|
||||
- **THEN** the system creates a ZIP file containing 5 .txt files
|
||||
- **AND** names each file as `{original_filename}_ocr.txt`
|
||||
- **AND** returns the ZIP for download
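
The batch naming rule above can be sketched as follows. This is a minimal illustration, assuming results arrive as a mapping from original filename to extracted text; the function name `build_txt_zip` is hypothetical, and the spec does not say whether `{original_filename}` keeps its extension, so this sketch drops it.

```python
import io
import zipfile
from pathlib import Path


def build_txt_zip(results: dict) -> bytes:
    """Pack one `{stem}_ocr.txt` entry per source file into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for original_name, text in results.items():
            # Assumption: the extension is stripped from the original filename.
            entry_name = f"{Path(original_name).stem}_ocr.txt"
            archive.writestr(entry_name, text)
    return buffer.getvalue()
```

Building the archive in memory avoids leaving temporary files behind when the ZIP is streamed straight to the client.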
|
||||
|
||||
### Requirement: JSON Export
|
||||
The system SHALL export OCR results as structured JSON with full metadata.
|
||||
|
||||
#### Scenario: Export with metadata
|
||||
- **WHEN** user selects JSON export format
|
||||
- **THEN** the system generates JSON containing:
|
||||
- File information (name, size, format)
|
||||
- OCR results array with text, bounding boxes, confidence
|
||||
- Processing metadata (timestamp, language, model version)
|
||||
- Task status and statistics
|
||||
|
||||
#### Scenario: JSON export example structure
|
||||
- **WHEN** export is generated
|
||||
- **THEN** JSON structure follows this format:
|
||||
```json
|
||||
{
|
||||
"file_name": "document.png",
|
||||
"file_size": 1024000,
|
||||
"upload_time": "2025-01-01T10:00:00Z",
|
||||
"processing_time": 2.5,
|
||||
"language": "zh-TW",
|
||||
"results": [
|
||||
{
|
||||
"text": "範例文字",
|
||||
"bbox": [100, 50, 200, 80],
|
||||
"confidence": 0.95
|
||||
}
|
||||
],
|
||||
"status": "completed"
|
||||
}
|
||||
```
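
One way to produce the structure above is sketched here with dataclasses. The class and function names (`TextRegion`, `OcrExport`, `to_json`) are hypothetical and not part of the spec; only the field names are taken from the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TextRegion:
    text: str
    bbox: List[int]
    confidence: float


@dataclass
class OcrExport:
    file_name: str
    file_size: int
    upload_time: str
    processing_time: float
    language: str
    results: List[TextRegion] = field(default_factory=list)
    status: str = "completed"


def to_json(export: OcrExport) -> str:
    # ensure_ascii=False keeps CJK text readable instead of \uXXXX escapes.
    return json.dumps(asdict(export), ensure_ascii=False, indent=2)
```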
|
||||
|
||||
### Requirement: Excel Export
|
||||
The system SHALL export OCR results as Excel spreadsheets with tabular format.
|
||||
|
||||
#### Scenario: Single file Excel export
|
||||
- **WHEN** user selects Excel export for one file
|
||||
- **THEN** the system generates .xlsx file with columns:
|
||||
- Row Number
|
||||
- Recognized Text
|
||||
- Confidence Score
|
||||
- Bounding Box (X, Y, Width, Height)
|
||||
- Language
|
||||
|
||||
#### Scenario: Batch Excel export with multiple sheets
|
||||
- **WHEN** user exports batch with 3 files as Excel
|
||||
- **THEN** the system creates one .xlsx file with 3 sheets
|
||||
- **AND** names each sheet as the original filename
|
||||
- **AND** includes summary sheet with statistics
|
||||
|
||||
### Requirement: Rule-Based Output Formatting
|
||||
The system SHALL apply user-defined rules to format exported text.
|
||||
|
||||
#### Scenario: Group by filename pattern
|
||||
- **WHEN** user defines rule "group files with prefix 'invoice_'"
|
||||
- **THEN** the system groups all matching files together
|
||||
- **AND** exports them in a single combined file or folder
|
||||
|
||||
#### Scenario: Filter by confidence threshold
|
||||
- **WHEN** user sets export rule "minimum confidence 0.8"
|
||||
- **THEN** the system excludes text with confidence < 0.8 from export
|
||||
- **AND** includes only high-confidence results
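
A minimal sketch of the threshold rule, assuming each OCR result is a dict with a `confidence` key as in the JSON example; the function name is hypothetical.

```python
def filter_by_confidence(regions, min_confidence=0.8):
    """Keep regions at or above the threshold; drop everything below it."""
    return [r for r in regions if r["confidence"] >= min_confidence]
```

The comparison is inclusive, so a region scoring exactly 0.8 is kept, which matches "excludes text with confidence < 0.8".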
|
||||
|
||||
#### Scenario: Custom text formatting
|
||||
- **WHEN** user defines rule "add line numbers"
|
||||
- **THEN** the system prepends line numbers to each text line
|
||||
- **AND** formats output as: `1. 第一行文字\n2. 第二行文字`
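
The line-number rule could look like this; the function name is hypothetical and the input is assumed to be a list of already-extracted text lines.

```python
def add_line_numbers(lines):
    """Prefix each line with its 1-based index followed by a period and a space."""
    return "\n".join(f"{i}. {line}" for i, line in enumerate(lines, start=1))
```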
|
||||
|
||||
#### Scenario: Sort by reading order
|
||||
- **WHEN** user enables "sort by position" rule
|
||||
- **THEN** the system orders text by vertical position (top to bottom)
|
||||
- **AND** then by horizontal position (left to right) within each row
|
||||
- **AND** exports text in natural reading order
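
A sketch of the position sort, under two assumptions not stated in the spec: the first two `bbox` values are the left and top coordinates, and regions whose top edges differ by at most `row_tolerance` pixels belong to the same row. Both the function name and the tolerance parameter are hypothetical.

```python
def sort_reading_order(regions, row_tolerance=10):
    """Order regions top-to-bottom, then left-to-right within each row."""
    ordered = sorted(regions, key=lambda r: r["bbox"][1])
    rows = []
    for region in ordered:
        # Join the current row when the top edge is close to the row's first region.
        if rows and abs(region["bbox"][1] - rows[-1][0]["bbox"][1]) <= row_tolerance:
            rows[-1].append(region)
        else:
            rows.append([region])
    return [r for row in rows for r in sorted(row, key=lambda r: r["bbox"][0])]
```

Grouping into rows first matters because a plain sort on `(y, x)` would place a slightly higher right-hand word before the left-hand word on the same line.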
|
||||
|
||||
### Requirement: Export Rule Configuration
|
||||
The system SHALL allow users to save and reuse export rules.
|
||||
|
||||
#### Scenario: Save custom export rule
|
||||
- **WHEN** user creates a rule with name "高品質發票輸出" (English: "High-quality invoice output")
|
||||
- **THEN** the system saves the rule to database
|
||||
- **AND** associates it with the user account
|
||||
- **AND** makes it available in rule selection dropdown
|
||||
|
||||
#### Scenario: Apply saved rule
|
||||
- **WHEN** user selects a saved rule for export
|
||||
- **THEN** the system applies all configured filters and formatting
|
||||
- **AND** generates output according to rule settings
|
||||
|
||||
#### Scenario: Edit existing rule
|
||||
- **WHEN** user modifies a saved rule
|
||||
- **THEN** the system updates the rule configuration
|
||||
- **AND** preserves the rule ID for continuity
|
||||
|
||||
### Requirement: Markdown Export with Structure and Images
|
||||
The system SHALL export OCR results as Markdown files preserving document logical structure with accompanying images.
|
||||
|
||||
#### Scenario: Export as Markdown with structure and images
|
||||
- **WHEN** user selects Markdown export format
|
||||
- **THEN** the system generates .md file with logical structure
|
||||
- **AND** includes headings, paragraphs, tables, lists in proper hierarchy
|
||||
- **AND** embeds image references pointing to extracted images
|
||||
- **AND** maintains reading order from OCR analysis
|
||||
- **AND** includes extracted images in an images/ folder
|
||||
|
||||
#### Scenario: Batch Markdown export with images
|
||||
- **WHEN** user exports batch with 5 files as Markdown
|
||||
- **THEN** the system creates 5 separate .md files
|
||||
- **AND** creates corresponding images/ folders for each document
|
||||
- **AND** optionally creates combined .md with page separators
|
||||
- **AND** returns ZIP file containing all Markdown files and images
|
||||
|
||||
### Requirement: Searchable PDF Export with Images
|
||||
The system SHALL generate searchable PDF files that include extracted text and images, preserving logical document structure (not exact visual layout).
|
||||
|
||||
#### Scenario: Single document PDF export with images
|
||||
- **WHEN** user requests PDF export from OCR result
|
||||
- **THEN** the system converts Markdown to HTML with basic CSS styling
|
||||
- **AND** embeds extracted images from images/ folder
|
||||
- **AND** generates PDF using Pandoc + WeasyPrint
|
||||
- **AND** preserves document hierarchy, tables, and reading order
|
||||
- **AND** images appear near their logical position in text flow
|
||||
- **AND** uses appropriate Chinese font (Noto Sans CJK)
|
||||
- **AND** produces searchable PDF with selectable text
|
||||
|
||||
#### Scenario: Basic PDF formatting options
|
||||
- **WHEN** user selects PDF export
|
||||
- **THEN** the system applies basic readable formatting
|
||||
- **AND** sets standard margins and page size (A4)
|
||||
- **AND** uses consistent fonts and spacing
|
||||
- **AND** ensures images fit within page width
|
||||
- **NOTE** CSS templates are for basic readability, not for replicating original visual design
|
||||
|
||||
#### Scenario: Batch PDF export with images
|
||||
- **WHEN** user exports batch as PDF
|
||||
- **THEN** the system generates individual PDF for each document with embedded images
|
||||
- **OR** creates single merged PDF with page breaks
|
||||
- **AND** maintains consistent formatting across all pages
|
||||
- **AND** returns ZIP of PDFs or single merged PDF
|
||||
|
||||
### Requirement: Export Format Selection
|
||||
The system SHALL provide UI for selecting export format and options.
|
||||
|
||||
#### Scenario: Format selection with preview
|
||||
- **WHEN** user opens export dialog
|
||||
- **THEN** the system displays format options (TXT, JSON, Excel, **Markdown with images, Searchable PDF**)
|
||||
- **AND** shows preview of output structure for selected format
|
||||
- **AND** allows applying custom rules for text filtering
|
||||
- **AND** provides basic formatting option for PDF (standard readable format)
|
||||
|
||||
#### Scenario: Batch export with format choice
|
||||
- **WHEN** user selects multiple completed tasks
|
||||
- **THEN** the system enables batch export button
|
||||
- **AND** prompts for format selection
|
||||
- **AND** generates combined export file
|
||||
- **AND** shows progress bar for PDF generation (slower due to image processing)
|
||||
- **AND** includes all extracted images when exporting Markdown or PDF
|
||||
@@ -0,0 +1,96 @@
|
||||
# File Management Specification
|
||||
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: File Upload Validation
|
||||
The system SHALL validate uploaded files for type, size, and content before processing.
|
||||
|
||||
#### Scenario: Valid image upload
|
||||
- **WHEN** user uploads a PNG file of 5MB
|
||||
- **THEN** the system accepts the file
|
||||
- **AND** stores it in temporary upload directory
|
||||
- **AND** returns upload success with file ID
|
||||
|
||||
#### Scenario: Oversized file rejection
|
||||
- **WHEN** user uploads a file larger than 20MB
|
||||
- **THEN** the system rejects the file
|
||||
- **AND** returns error message "文件大小超過限制 (最大 20MB)" (English: "File size exceeds the limit (maximum 20MB)")
|
||||
- **AND** does not store the file
|
||||
|
||||
#### Scenario: Invalid file type rejection
|
||||
- **WHEN** user uploads a .exe or .zip file
|
||||
- **THEN** the system rejects the file
|
||||
- **AND** returns error message "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF" (English: "Unsupported file type; only PNG, JPG, JPEG, and PDF are supported")
|
||||
|
||||
#### Scenario: Corrupted image detection
|
||||
- **WHEN** user uploads a corrupted image file
|
||||
- **THEN** the system attempts to open the file
|
||||
- **AND** detects corruption during validation
|
||||
- **AND** returns error message "文件損壞,無法處理" (English: "File is corrupted and cannot be processed")
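
The type and size checks from the scenarios above can be sketched as follows. The function and constant names are hypothetical, and treating 20MB as 20 × 1024 × 1024 bytes is an assumption. Corruption detection is left out because it depends on the image library in use.

```python
from pathlib import Path
from typing import Optional

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: 20MB means 20 MiB
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}


def validate_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return the spec's error message, or None when both checks pass."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        return "不支援的文件類型,僅支援 PNG, JPG, JPEG, PDF"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "文件大小超過限制 (最大 20MB)"
    return None
```

Checking the extension alone is easy to spoof, so a real validator would likely also inspect the file content before accepting it.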
|
||||
|
||||
### Requirement: Supported File Formats
|
||||
The system SHALL support PNG, JPG, JPEG, and PDF file formats for OCR processing.
|
||||
|
||||
#### Scenario: PNG image processing
|
||||
- **WHEN** user uploads a .png file
|
||||
- **THEN** the system processes it directly with PaddleOCR
|
||||
|
||||
#### Scenario: JPG/JPEG image processing
|
||||
- **WHEN** user uploads a .jpg or .jpeg file
|
||||
- **THEN** the system processes it directly with PaddleOCR
|
||||
|
||||
#### Scenario: PDF file processing
|
||||
- **WHEN** user uploads a .pdf file
|
||||
- **THEN** the system converts PDF pages to images using pdf2image
|
||||
- **AND** processes each page image with PaddleOCR
|
||||
|
||||
### Requirement: Batch Upload Management
|
||||
The system SHALL manage multiple file uploads with batch organization.
|
||||
|
||||
#### Scenario: Create batch from multiple files
|
||||
- **WHEN** user uploads 5 files in a single request
|
||||
- **THEN** the system creates a batch with unique batch_id
|
||||
- **AND** associates all files with the batch_id
|
||||
- **AND** returns batch_id and file list
|
||||
|
||||
#### Scenario: Query batch status
|
||||
- **WHEN** user requests batch status by batch_id
|
||||
- **THEN** the system returns:
|
||||
- Total files in batch
|
||||
- Completed count
|
||||
- Failed count
|
||||
- Processing count
|
||||
- Overall batch status (pending/processing/completed/failed)
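
The status payload above could be assembled as in this sketch. The spec does not define how the overall status is derived from per-file states, so the rules here are assumptions: all pending means `pending`, any unfinished file means `processing`, all failed means `failed`, and anything else is `completed`. The function name is hypothetical.

```python
from collections import Counter


def summarize_batch(file_statuses):
    """Count per-file states and derive one overall batch status."""
    counts = Counter(file_statuses)
    total = len(file_statuses)
    finished = counts["completed"] + counts["failed"]
    if total == 0 or counts["pending"] == total:
        overall = "pending"
    elif finished < total:
        overall = "processing"
    elif counts["failed"] == total:
        overall = "failed"
    else:
        overall = "completed"
    return {
        "total": total,
        "completed": counts["completed"],
        "failed": counts["failed"],
        "processing": counts["processing"],
        "status": overall,
    }
```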
|
||||
|
||||
### Requirement: File Storage Management
|
||||
The system SHALL store uploaded files temporarily and clean up after processing.
|
||||
|
||||
#### Scenario: Temporary file storage
|
||||
- **WHEN** user uploads files
|
||||
- **THEN** the system stores files in `uploads/{batch_id}/` directory
|
||||
- **AND** generates unique filenames to prevent conflicts
|
||||
|
||||
#### Scenario: Automatic cleanup after processing
|
||||
- **WHEN** OCR processing completes for a batch
|
||||
- **THEN** the system keeps files for 24 hours
|
||||
- **AND** automatically deletes files after retention period
|
||||
- **AND** preserves OCR results in database
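
A sketch of the retention sweep, assuming batches live in `uploads/{batch_id}/` directories as described above. Using the directory's modification time as a stand-in for the completion time is an assumption made to keep the example self-contained; a real job would more likely read a completion timestamp from the database. The function name is hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

RETENTION_SECONDS = 24 * 60 * 60  # 24-hour retention from the scenario above


def cleanup_expired_batches(upload_root: Path, now: Optional[float] = None) -> List[str]:
    """Delete batch directories older than the retention period; return their names."""
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletion does not disturb iteration.
    for batch_dir in list(upload_root.iterdir()):
        if batch_dir.is_dir() and now - batch_dir.stat().st_mtime > RETENTION_SECONDS:
            shutil.rmtree(batch_dir)
            removed.append(batch_dir.name)
    return sorted(removed)
```

Only files on disk are touched here, which is consistent with OCR results being preserved in the database.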
|
||||
|
||||
#### Scenario: Manual file deletion
|
||||
- **WHEN** user requests to delete a batch
|
||||
- **THEN** the system removes all associated files from storage
|
||||
- **AND** marks the batch as deleted in database
|
||||
- **AND** returns deletion confirmation
|
||||
|
||||
### Requirement: File Access Control
|
||||
The system SHALL ensure users can only access their own uploaded files.
|
||||
|
||||
#### Scenario: User accesses own files
|
||||
- **WHEN** authenticated user requests file by file_id
|
||||
- **THEN** the system verifies ownership
|
||||
- **AND** returns file if user is the owner
|
||||
|
||||
#### Scenario: User attempts to access others' files
|
||||
- **WHEN** user requests file_id belonging to another user
|
||||
- **THEN** the system denies access
|
||||
- **AND** returns 403 Forbidden error
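
The ownership check can be sketched independently of any web framework. The `files` mapping, the `owner_id` key, and the names `AccessDenied` and `get_owned_file` are all hypothetical; an API layer would translate the exception into the 403 response.

```python
from http import HTTPStatus


class AccessDenied(Exception):
    """Raised when a user requests a file they do not own."""

    status_code = HTTPStatus.FORBIDDEN  # 403


def get_owned_file(files: dict, file_id: str, user_id: int) -> dict:
    """Return the file record only if it belongs to the requesting user."""
    record = files[file_id]
    if record["owner_id"] != user_id:
        raise AccessDenied(f"user {user_id} does not own file {file_id}")
    return record
```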
|
||||
@@ -0,0 +1,125 @@
|
||||
# OCR Processing Specification
|
||||
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Multi-Language Text Recognition with Structure Analysis
|
||||
The system SHALL extract text and images from document files using PaddleOCR-VL with support for 109 languages including Chinese (traditional and simplified), English, Japanese, and Korean, while preserving document logical structure and reading order (not pixel-perfect visual layout).
|
||||
|
||||
#### Scenario: Single image OCR with Chinese text
|
||||
- **WHEN** user uploads a PNG image containing Chinese text
|
||||
- **THEN** the system extracts text with bounding boxes and confidence scores
|
||||
- **AND** returns structured JSON with recognized text, coordinates, and language detected
|
||||
- **AND** generates Markdown output preserving text layout and hierarchy
|
||||
|
||||
#### Scenario: PDF document OCR with layout preservation
|
||||
- **WHEN** user uploads a multi-page PDF file
|
||||
- **THEN** the system processes each page with PaddleOCR-VL
|
||||
- **AND** performs layout analysis to identify document elements (titles, paragraphs, tables, images, formulas)
|
||||
- **AND** returns Markdown organized by page with preserved reading order
|
||||
- **AND** provides JSON with detailed layout structure and bounding boxes
|
||||
|
||||
#### Scenario: Mixed language content
|
||||
- **WHEN** user uploads an image with both Chinese and English text
|
||||
- **THEN** the system detects and extracts text in both languages
|
||||
- **AND** preserves the spatial relationship between text regions
|
||||
- **AND** maintains proper reading order in output Markdown
|
||||
|
||||
#### Scenario: Complex document with tables and images
|
||||
- **WHEN** user uploads a scanned document containing tables, images, and text
|
||||
- **THEN** the system identifies layout elements (text blocks, tables, images, formulas)
|
||||
- **AND** extracts table structure as Markdown tables
|
||||
- **AND** extracts and saves document images as separate files
|
||||
- **AND** embeds image references in Markdown
|
||||
- **AND** preserves document hierarchy and reading order in Markdown output
|
||||
|
||||
### Requirement: Batch Processing
|
||||
The system SHALL process multiple files concurrently with progress tracking and error handling.
|
||||
|
||||
#### Scenario: Batch upload success
|
||||
- **WHEN** user uploads 10 image files simultaneously
|
||||
- **THEN** the system creates a batch task with unique batch ID
|
||||
- **AND** processes files in parallel (up to configured worker limit)
|
||||
- **AND** returns real-time progress updates via WebSocket or polling
|
||||
|
||||
#### Scenario: Batch processing with partial failure
|
||||
- **WHEN** a batch contains 5 valid images and 2 corrupted files
|
||||
- **THEN** the system processes all valid files successfully
|
||||
- **AND** logs errors for corrupted files with specific error messages
|
||||
- **AND** marks the batch as "partially completed"
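
The two batch scenarios above can be sketched with a thread pool that isolates failures per file. `ocr_fn` stands in for the real OCR call, and `run_batch` and `max_workers` are hypothetical names; the spec only says that the worker limit is configurable.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(paths, ocr_fn, max_workers=4):
    """Run ocr_fn over all paths in parallel, recording errors per file."""

    def safe(path):
        try:
            return path, ocr_fn(path), None
        except Exception as exc:  # one bad file must not abort the batch
            return path, None, str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(safe, paths))

    results = {p: r for p, r, err in outcomes if err is None}
    errors = {p: err for p, r, err in outcomes if err is not None}
    if not errors:
        status = "completed"
    elif results:
        status = "partially completed"
    else:
        status = "failed"
    return {"status": status, "results": results, "errors": errors}
```

Catching the exception inside the worker keeps the error message attached to the file that caused it, which is what the logging requirement needs.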
|
||||
|
||||
### Requirement: Image Preprocessing
|
||||
The system SHALL provide optional image preprocessing to improve OCR accuracy.
|
||||
|
||||
#### Scenario: Low contrast image enhancement
|
||||
- **WHEN** user enables preprocessing for a low-contrast image
|
||||
- **THEN** the system applies contrast adjustment and denoising
|
||||
- **AND** performs OCR on the enhanced image
|
||||
- **AND** returns better accuracy compared to original
|
||||
|
||||
#### Scenario: Skipped preprocessing
|
||||
- **WHEN** user disables preprocessing option
|
||||
- **THEN** the system performs OCR directly on original image
|
||||
- **AND** completes processing faster
|
||||
|
||||
### Requirement: Confidence Threshold Filtering
|
||||
The system SHALL filter OCR results based on configurable confidence threshold.
|
||||
|
||||
#### Scenario: High confidence filter
|
||||
- **WHEN** user sets confidence threshold to 0.8
|
||||
- **THEN** the system returns only text segments with confidence >= 0.8
|
||||
- **AND** discards low-confidence results
|
||||
|
||||
#### Scenario: Include all results
|
||||
- **WHEN** user sets confidence threshold to 0.0
|
||||
- **THEN** the system returns all recognized text regardless of confidence
|
||||
- **AND** includes confidence scores in output
|
||||
|
||||
### Requirement: OCR Result Structure
|
||||
The system SHALL return OCR results in multiple formats (JSON, Markdown) with extracted text, images, and structure metadata.
|
||||
|
||||
#### Scenario: Successful OCR result with multiple formats
|
||||
- **WHEN** OCR processing completes successfully
|
||||
- **THEN** the system returns JSON containing:
|
||||
- File metadata (name, size, format, upload timestamp)
|
||||
- Detected text regions with bounding boxes (x, y, width, height)
|
||||
- Recognized text content for each region
|
||||
- Confidence scores (0.0 to 1.0)
|
||||
- Language detected
|
||||
- Layout element types (title, paragraph, table, image, formula)
|
||||
- Reading order sequence
|
||||
- List of extracted image files with paths
|
||||
- Processing time
|
||||
- Task status (completed/failed/partial)
|
||||
- **AND** generates Markdown file with logical structure
|
||||
- **AND** saves extracted images to storage directory
|
||||
- **AND** provides methods to export as searchable PDF with images
|
||||
|
||||
#### Scenario: Searchable PDF generation with images
|
||||
- **WHEN** user requests PDF export from OCR results
|
||||
- **THEN** the system converts Markdown to HTML with basic CSS styling
|
||||
- **AND** embeds extracted images in their logical positions (not exact original positions)
|
||||
- **AND** generates PDF using Pandoc + WeasyPrint
|
||||
- **AND** preserves document hierarchy, tables, and reading order
|
||||
- **AND** applies appropriate fonts for Chinese characters
|
||||
- **AND** produces searchable PDF (text is selectable and searchable)
|
||||
|
||||
### Requirement: Document Translation (Reserved Architecture)
|
||||
The system SHALL provide architecture and UI placeholders for future document translation features.
|
||||
|
||||
#### Scenario: Translation option visibility (UI placeholder)
|
||||
- **WHEN** user views OCR result page
|
||||
- **THEN** the system displays a "Translate Document" button (disabled or labeled "Coming Soon")
|
||||
- **AND** shows target language selection dropdown (disabled)
|
||||
- **AND** provides tooltip: "Translation feature will be available in future release"
|
||||
|
||||
#### Scenario: Translation API endpoint (reserved)
|
||||
- **WHEN** backend API is queried for translation endpoints
|
||||
- **THEN** the system provides `/api/v1/translate/document` endpoint specification
|
||||
- **AND** returns "Not Implemented" (501) status when called
|
||||
- **AND** documents expected request/response format for future implementation
|
||||
|
||||
#### Scenario: Translation configuration storage (database schema)
|
||||
- **WHEN** database schema is created
|
||||
- **THEN** the system includes `translation_configs` table
|
||||
- **AND** defines columns: id, user_id, source_lang, target_lang, engine_type, engine_config, created_at
|
||||
- **AND** table remains empty until translation feature is implemented
|
||||
@@ -0,0 +1,230 @@
|
||||
# Implementation Tasks
|
||||
|
||||
## Phase 1: Core OCR with Layout Preservation
|
||||
|
||||
### 1. Environment Setup
|
||||
- [x] 1.1 Create Conda environment with Python 3.10
|
||||
- [x] 1.2 Install backend dependencies (FastAPI, PaddleOCR 3.0+, paddlepaddle, pandas, etc.)
|
||||
- [x] 1.3 Install PDF generation tools (weasyprint, markdown, pandoc system package)
|
||||
- [x] 1.4 Download PaddleOCR-VL model (~900MB) and language packs
|
||||
- [ ] 1.5 Setup frontend project with Vite + React + TypeScript
|
||||
- [ ] 1.6 Install frontend dependencies (Tailwind, shadcn/ui, axios, react-query)
|
||||
- [x] 1.7 Configure MySQL database connection
|
||||
- [x] 1.8 Install Chinese fonts (Noto Sans CJK) for PDF generation
|
||||
|
||||
### 2. Database Schema
|
||||
- [x] 2.1 Create `paddle_ocr_users` table for JWT authentication (id, username, password_hash, etc.)
|
||||
- [x] 2.2 Create `paddle_ocr_batches` table (id, user_id, status, created_at, completed_at)
|
||||
- [x] 2.3 Create `paddle_ocr_files` table (id, batch_id, filename, file_path, file_size, status, format)
|
||||
- [x] 2.4 Create `paddle_ocr_results` table (id, file_id, markdown_path, json_path, layout_data, confidence)
|
||||
- [x] 2.5 Create `paddle_ocr_export_rules` table (id, user_id, rule_name, config_json, css_template)
|
||||
- [x] 2.6 Create `paddle_ocr_translation_configs` table (RESERVED: id, user_id, source_lang, target_lang, engine_type, engine_config)
|
||||
- [x] 2.7 Write database migration scripts (Alembic)
|
||||
- [x] 2.8 Add indexes for performance optimization (batch_id, user_id, status)
|
||||
- Note: All tables use `paddle_ocr_` prefix for namespace isolation
|
||||
|
||||
### 3. Backend - Document Preprocessing
|
||||
- [x] 3.1 Implement document preprocessor class for format standardization
|
||||
- [x] 3.2 Add image format validator (PNG, JPG, JPEG)
|
||||
- [x] 3.3 Add PDF validator and direct passthrough (PaddleOCR-VL native support)
|
||||
- [x] 3.4 Implement Office document to PDF conversion (DOC, DOCX, PPT, PPTX via LibreOffice) ⬅️ **Completed via sub-proposal**
|
||||
- [x] 3.5 Add file corruption detection
|
||||
- [x] 3.6 Write unit tests for preprocessor
|
||||
|
||||
### 4. Backend - Core OCR Service with PaddleOCR-VL
|
||||
- [x] 4.1 Implement OCR service class with PaddleOCR-VL initialization
|
||||
- [x] 4.2 Configure layout detection (use_layout_detection=True)
|
||||
- [x] 4.3 Implement single image/PDF OCR processing
|
||||
- [x] 4.4 Parse OCR output to extract Markdown and JSON
|
||||
- [x] 4.5 Store Markdown files with preserved layout structure
|
||||
- [x] 4.6 Store JSON with detailed bounding boxes and layout metadata
|
||||
- [x] 4.7 Add confidence threshold filtering
|
||||
- [x] 4.8 Implement batch processing with worker queue (completed via Task 10: BackgroundTasks)
|
||||
- [x] 4.9 Add progress tracking for batch jobs (completed via Task 8.4, 8.6: API endpoints)
|
||||
- [x] 4.10 Write unit tests for OCR service
|
||||
|
||||
### 5. Backend - Layout-Preserved PDF Generation
|
||||
- [x] 5.1 Create PDF generator service using Pandoc + WeasyPrint
|
||||
- [x] 5.2 Implement Markdown to HTML conversion with extensions (tables, code, etc.)
|
||||
- [x] 5.3 Create default CSS template for layout preservation
|
||||
- [x] 5.4 Create additional CSS templates (academic, business, report)
|
||||
- [x] 5.5 Add Chinese font configuration (Noto Sans CJK)
|
||||
- [x] 5.6 Implement PDF generation via Pandoc command
|
||||
- [x] 5.7 Add fallback: Python WeasyPrint direct generation
|
||||
- [x] 5.8 Handle multi-page PDF merging
|
||||
- [x] 5.9 Write unit tests for PDF generator
|
||||
|
||||
### 6. Backend - File Management
|
||||
- [x] 6.1 Implement file upload validation (type, size, corruption check)
|
||||
- [x] 6.2 Create file storage service with temporary directory management
|
||||
- [x] 6.3 Add batch upload handler with unique batch_id generation
|
||||
- [x] 6.4 Implement file access control and ownership verification
|
||||
- [x] 6.5 Add automatic cleanup job for expired files (24-hour retention)
|
||||
- [x] 6.6 Store Markdown and JSON outputs in organized directory structure
|
||||
- [x] 6.7 Write unit tests for file management
|
||||
|
||||
### 7. Backend - Export Service
|
||||
- [x] 7.1 Implement plain text export from Markdown
|
||||
- [x] 7.2 Implement JSON export with full metadata
|
||||
- [x] 7.3 Implement Excel export using pandas
|
||||
- [x] 7.4 Implement Markdown export (direct from OCR output)
|
||||
- [x] 7.5 Implement layout-preserved PDF export (using PDF generator service)
|
||||
- [x] 7.6 Add ZIP file creation for batch exports
|
||||
- [x] 7.7 Implement rule-based filtering (confidence threshold, filename pattern)
|
||||
- [x] 7.8 Implement rule-based formatting (line numbers, sort by position)
|
||||
- [x] 7.9 Create export rule CRUD operations (save, load, update, delete)
|
||||
- [x] 7.10 Write unit tests for export service
|
||||
|
||||
### 8. Backend - API Endpoints
|
||||
- [x] 8.1 POST `/api/v1/auth/login` - JWT authentication
|
||||
- [x] 8.2 POST `/api/v1/upload` - File upload with validation
|
||||
- [x] 8.3 POST `/api/v1/ocr/process` - Trigger OCR processing (PaddleOCR-VL)
|
||||
- [x] 8.4 GET `/api/v1/ocr/status/{task_id}` - Get task status with progress
|
||||
- [x] 8.5 GET `/api/v1/ocr/result/{task_id}` - Get OCR results (JSON + Markdown)
|
||||
- [x] 8.6 GET `/api/v1/batch/{batch_id}/status` - Get batch status
|
||||
- [x] 8.7 POST `/api/v1/export` - Export results with format and rules
|
||||
- [x] 8.8 GET `/api/v1/export/pdf/{file_id}` - Generate and download layout-preserved PDF
|
||||
- [x] 8.9 GET `/api/v1/export/rules` - List saved export rules
|
||||
- [x] 8.10 POST `/api/v1/export/rules` - Create new export rule
|
||||
- [x] 8.11 PUT `/api/v1/export/rules/{rule_id}` - Update export rule
|
||||
- [x] 8.12 DELETE `/api/v1/export/rules/{rule_id}` - Delete export rule
|
||||
- [x] 8.13 GET `/api/v1/export/css-templates` - List available CSS templates
|
||||
- [x] 8.14 Write API integration tests
|
||||
|
||||
### 9. Backend - Translation Architecture (RESERVED)
|
||||
- [x] 9.1 Create translation service interface (abstract class)
|
||||
- [x] 9.2 Implement stub endpoint POST `/api/v1/translate/document` (returns 501 Not Implemented)
|
||||
- [x] 9.3 Document expected request/response format in OpenAPI spec
|
||||
- [x] 9.4 Add translation_configs table migrations (completed in Task 2.6)
|
||||
- [x] 9.5 Create placeholder for translation engine factory (Argos/ERNIE/Google)
|
||||
- [ ] 9.6 Write unit tests for translation service interface (optional for stub)
|
||||
|
||||
### 10. Backend - Background Tasks
|
||||
- [x] 10.1 Implement FastAPI BackgroundTasks for async OCR processing
|
||||
- [ ] 10.2 Add task queue system (optional: Redis-based queue)
|
||||
- [x] 10.3 Implement progress updates (polling endpoint)
|
||||
- [x] 10.4 Add error handling and retry logic
|
||||
- [x] 10.5 Implement cleanup scheduler for expired files
|
||||
- [x] 10.6 Add PDF generation to background tasks (slower process)
|
||||
|
||||
## Phase 2: Frontend Development
|
||||
|
||||
### 11. Frontend - Project Structure
|
||||
- [x] 11.1 Setup Vite project with TypeScript support
|
||||
- [x] 11.2 Configure Tailwind CSS and shadcn/ui
|
||||
- [x] 11.3 Setup React Router for navigation
|
||||
- [x] 11.4 Configure Axios with base URL and interceptors
|
||||
- [x] 11.5 Setup React Query for API state management
|
||||
- [x] 11.6 Create Zustand store for global state
|
||||
- [x] 11.7 Setup i18n for Traditional Chinese interface
|
||||
|
||||
### 12. Frontend - UI Components (shadcn/ui)
|
||||
- [x] 12.1 Install and configure shadcn/ui components
|
||||
- [x] 12.2 Create FileUpload component with drag-and-drop (react-dropzone)
|
||||
- [x] 12.3 Create ProgressBar component for batch processing
|
||||
- [x] 12.4 Create ResultsTable component for displaying OCR results
|
||||
- [x] 12.5 Create MarkdownPreview component for viewing extracted content ⬅️ **Fixed: API schema alignment for filename display**
|
||||
- [ ] 12.6 Create ExportDialog component for format and rule selection
|
||||
- [ ] 12.7 Create CSSTemplateSelector component for PDF styling
|
||||
- [ ] 12.8 Create RuleEditor component for creating custom rules
|
||||
- [x] 12.9 Create Toast notifications for feedback
|
||||
- [ ] 12.10 Create TranslationPanel component (DISABLED with "Coming Soon" label)
|
||||
|
||||
### 13. Frontend - Pages
|
||||
- [x] 13.1 Create Login page with JWT authentication
|
||||
- [x] 13.2 Create Upload page with file selection and batch management ⬅️ **Fixed: Upload response schema alignment**
|
||||
- [x] 13.3 Create Processing page with real-time progress ⬅️ **Fixed: Error field mapping**
|
||||
- [x] 13.4 Create Results page with Markdown/JSON preview ⬅️ **Fixed: OCR result detail flattening, null safety**
|
||||
- [x] 13.5 Create Export page with format options (TXT, JSON, Excel, Markdown, PDF)
|
||||
- [ ] 13.6 Create PDF Preview page (optional: embedded PDF viewer)
|
||||
- [x] 13.7 Create Settings page for export rule management
|
||||
- [x] 13.8 Add translation option placeholder in Results page (disabled state)
|
||||
|
||||
### 14. Frontend - API Integration
|
||||
- [x] 14.1 Create API client service with typed interfaces ⬅️ **Updated: All endpoints verified working**
|
||||
- [x] 14.2 Implement file upload with progress tracking ⬅️ **Fixed: UploadBatchResponse schema**
|
||||
- [x] 14.3 Implement OCR task status polling ⬅️ **Fixed: BatchStatusResponse with files array**
|
||||
- [x] 14.4 Implement results fetching (Markdown + JSON display) ⬅️ **Fixed: OCRResultDetailResponse with flattened structure**
|
||||
- [x] 14.5 Implement export with file download ⬅️ **Fixed: ExportOptions schema added**
|
||||
- [x] 14.6 Implement PDF generation request with loading indicator
|
||||
- [x] 14.7 Implement rule CRUD operations
|
||||
- [x] 14.8 Implement CSS template selection ⬅️ **Fixed: CSSTemplateResponse with filename field**
|
||||
- [x] 14.9 Add error handling and user feedback ⬅️ **Fixed: Error field mapping with validation_alias**
|
||||
- [x] 14.10 Create translation API client (stub, for future use)
|
||||
|
||||
## Phase 3: Testing & Optimization
|
||||
|
||||
### 15. Testing
|
||||
- [ ] 15.1 Write backend unit tests (pytest) for all services
|
||||
- [ ] 15.2 Write backend API integration tests
|
||||
- [ ] 15.3 Test PaddleOCR-VL with various document types (scanned images, PDFs, mixed content)
|
||||
- [ ] 15.4 Test layout preservation quality (Markdown structure correctness)
|
||||
- [ ] 15.5 Test PDF generation with different CSS templates
|
||||
- [ ] 15.6 Test Chinese font rendering in generated PDFs
|
||||
- [ ] 15.7 Write frontend component tests (Vitest)
|
||||
- [ ] 15.8 Perform manual end-to-end testing
|
||||
- [ ] 15.9 Test with various image formats and languages
|
||||
- [ ] 15.10 Test batch processing with large file sets (50+ files)
|
||||
- [ ] 15.11 Test export with different formats and rules
|
||||
- [x] 15.12 Verify translation UI placeholders are properly disabled
|
||||
|
||||
### 16. Documentation
|
||||
- [ ] 16.1 Write API documentation (FastAPI auto-docs + additional notes)
|
||||
- [ ] 16.2 Document PaddleOCR-VL model requirements and installation
|
||||
- [ ] 16.3 Document Pandoc and WeasyPrint setup
|
||||
- [ ] 16.4 Create CSS template customization guide
|
||||
- [ ] 16.5 Write user guide for web interface
|
||||
- [ ] 16.6 Write deployment guide for 1Panel
|
||||
- [ ] 16.7 Create README.md with setup instructions
|
||||
- [ ] 16.8 Document export rule syntax and examples
|
||||
- [ ] 16.9 Document translation feature roadmap and architecture
|
||||
|
||||
## Phase 4: Deployment
|
||||
|
||||
### 17. Deployment Preparation
|
||||
- [ ] 17.1 Create backend startup script (start.sh)
|
||||
- [ ] 17.2 Create frontend build script (build.sh)
|
||||
- [ ] 17.3 Create Nginx configuration file (static files + reverse proxy)
|
||||
- [ ] 17.4 Create Supervisor configuration for backend process
|
||||
- [ ] 17.5 Create environment variable templates (.env.example)
|
||||
- [ ] 17.6 Create deployment automation script (deploy.sh)
|
||||
- [ ] 17.7 Prepare CSS templates for production
|
||||
- [ ] 17.8 Test deployment on staging environment
|
||||
|
||||
### 18. Production Deployment (1Panel)
|
||||
- [ ] 18.1 Setup Conda environment on production server
|
||||
- [ ] 18.2 Install system dependencies (pandoc, fonts-noto-cjk)
|
||||
- [ ] 18.3 Install Python dependencies and download PaddleOCR-VL models
|
||||
- [ ] 18.4 Configure MySQL database connection
|
||||
- [ ] 18.5 Build frontend static files
|
||||
- [ ] 18.6 Configure Nginx via 1Panel (static files + reverse proxy)
|
||||
- [ ] 18.7 Setup Supervisor to manage backend process
|
||||
- [ ] 18.8 Configure SSL certificate (Let's Encrypt via 1Panel)
|
||||
- [ ] 18.9 Perform production smoke tests (upload, OCR, export PDF)
|
||||
- [ ] 18.10 Setup monitoring and logging
|
||||
- [ ] 18.11 Verify PDF generation works in production environment
|
||||
|
||||
## Phase 5: Translation Feature (FUTURE)
|
||||
|
||||
### 19. Translation Implementation (Post-Launch)
|
||||
- [ ] 19.1 Decide on translation engine (Argos offline vs ERNIE API vs Google API)
|
||||
- [ ] 19.2 Implement chosen translation engine integration
|
||||
- [ ] 19.3 Implement Markdown translation with structure preservation
|
||||
- [ ] 19.4 Update POST `/api/v1/translate/document` endpoint (remove 501 status)
|
||||
- [ ] 19.5 Add translation configuration UI (enable TranslationPanel component)
|
||||
- [ ] 19.6 Add source/target language selection
|
||||
- [ ] 19.7 Implement translation progress tracking
|
||||
- [ ] 19.8 Test translation with various document types
|
||||
- [ ] 19.9 Optimize translation quality for technical documents
|
||||
- [ ] 19.10 Update documentation with translation feature guide
|
||||
|
||||
## Summary
|
||||
|
||||
**Phase 1 (Core OCR + Layout Preservation)**: Tasks 1-10 (core OCR + layout-preserved PDF)
|
||||
**Phase 2 (Frontend)**: Tasks 11-14 (user interface)
|
||||
**Phase 3 (Testing)**: Tasks 15-16 (testing and documentation)
|
||||
**Phase 4 (Deployment)**: Tasks 17-18 (deployment)
|
||||
**Phase 5 (Translation)**: Task 19 (translation feature, future implementation)
|
||||
|
||||
**Total Tasks**: 150+ tasks
|
||||
**Priority**: Complete Phase 1-4 first, Phase 5 after production deployment and user feedback
|
||||