# Implementation Tasks

## Phase 1: Core OCR with Layout Preservation

### 1. Environment Setup

- [x] 1.1 Create Conda environment with Python 3.10
- [x] 1.2 Install backend dependencies (FastAPI, PaddleOCR 3.0+, paddlepaddle, pandas, etc.)
- [x] 1.3 Install PDF generation tools (weasyprint, markdown, pandoc system package)
- [x] 1.4 Download PaddleOCR-VL model (~900MB) and language packs
- [ ] 1.5 Setup frontend project with Vite + React + TypeScript
- [ ] 1.6 Install frontend dependencies (Tailwind, shadcn/ui, axios, react-query)
- [x] 1.7 Configure MySQL database connection
- [x] 1.8 Install Chinese fonts (Noto Sans CJK) for PDF generation

### 2. Database Schema

- [x] 2.1 Create `paddle_ocr_users` table for JWT authentication (id, username, password_hash, etc.)
- [x] 2.2 Create `paddle_ocr_batches` table (id, user_id, status, created_at, completed_at)
- [x] 2.3 Create `paddle_ocr_files` table (id, batch_id, filename, file_path, file_size, status, format)
- [x] 2.4 Create `paddle_ocr_results` table (id, file_id, markdown_path, json_path, layout_data, confidence)
- [x] 2.5 Create `paddle_ocr_export_rules` table (id, user_id, rule_name, config_json, css_template)
- [x] 2.6 Create `paddle_ocr_translation_configs` table (RESERVED: id, user_id, source_lang, target_lang, engine_type, engine_config)
- [x] 2.7 Write database migration scripts (Alembic)
- [x] 2.8 Add indexes for performance optimization (batch_id, user_id, status)
- Note: All tables use `paddle_ocr_` prefix for namespace isolation

### 3. Backend - Document Preprocessing

- [x] 3.1 Implement document preprocessor class for format standardization
- [x] 3.2 Add image format validator (PNG, JPG, JPEG)
- [x] 3.3 Add PDF validator and direct passthrough (PaddleOCR-VL native support)
- [x] 3.4 Implement Office document to PDF conversion (DOC, DOCX, PPT, PPTX via LibreOffice) ⬅️ **Completed via sub-proposal**
- [x] 3.5 Add file corruption detection
- [x] 3.6 Write unit tests for preprocessor

### 4. Backend - Core OCR Service with PaddleOCR-VL

- [x] 4.1 Implement OCR service class with PaddleOCR-VL initialization
- [x] 4.2 Configure layout detection (use_layout_detection=True)
- [x] 4.3 Implement single image/PDF OCR processing
- [x] 4.4 Parse OCR output to extract Markdown and JSON
- [x] 4.5 Store Markdown files with preserved layout structure
- [x] 4.6 Store JSON with detailed bounding boxes and layout metadata
- [x] 4.7 Add confidence threshold filtering
- [x] 4.8 Implement batch processing with worker queue (completed via Task 10: BackgroundTasks)
- [x] 4.9 Add progress tracking for batch jobs (completed via Task 8.4, 8.6: API endpoints)
- [x] 4.10 Write unit tests for OCR service

### 5. Backend - Layout-Preserved PDF Generation

- [x] 5.1 Create PDF generator service using Pandoc + WeasyPrint
- [x] 5.2 Implement Markdown to HTML conversion with extensions (tables, code, etc.)
- [x] 5.3 Create default CSS template for layout preservation
- [x] 5.4 Create additional CSS templates (academic, business, report)
- [x] 5.5 Add Chinese font configuration (Noto Sans CJK)
- [x] 5.6 Implement PDF generation via Pandoc command
- [x] 5.7 Add fallback: Python WeasyPrint direct generation
- [x] 5.8 Handle multi-page PDF merging
- [x] 5.9 Write unit tests for PDF generator

### 6. Backend - File Management

- [x] 6.1 Implement file upload validation (type, size, corruption check)
- [x] 6.2 Create file storage service with temporary directory management
- [x] 6.3 Add batch upload handler with unique batch_id generation
- [x] 6.4 Implement file access control and ownership verification
- [x] 6.5 Add automatic cleanup job for expired files (24-hour retention)
- [x] 6.6 Store Markdown and JSON outputs in organized directory structure
- [x] 6.7 Write unit tests for file management

### 7. Backend - Export Service

- [x] 7.1 Implement plain text export from Markdown
- [x] 7.2 Implement JSON export with full metadata
- [x] 7.3 Implement Excel export using pandas
- [x] 7.4 Implement Markdown export (direct from OCR output)
- [x] 7.5 Implement layout-preserved PDF export (using PDF generator service)
- [x] 7.6 Add ZIP file creation for batch exports
- [x] 7.7 Implement rule-based filtering (confidence threshold, filename pattern)
- [x] 7.8 Implement rule-based formatting (line numbers, sort by position)
- [x] 7.9 Create export rule CRUD operations (save, load, update, delete)
- [x] 7.10 Write unit tests for export service

### 8. Backend - API Endpoints

- [x] 8.1 POST `/api/v1/auth/login` - JWT authentication
- [x] 8.2 POST `/api/v1/upload` - File upload with validation
- [x] 8.3 POST `/api/v1/ocr/process` - Trigger OCR processing (PaddleOCR-VL)
- [x] 8.4 GET `/api/v1/ocr/status/{task_id}` - Get task status with progress
- [x] 8.5 GET `/api/v1/ocr/result/{task_id}` - Get OCR results (JSON + Markdown)
- [x] 8.6 GET `/api/v1/batch/{batch_id}/status` - Get batch status
- [x] 8.7 POST `/api/v1/export` - Export results with format and rules
- [x] 8.8 GET `/api/v1/export/pdf/{file_id}` - Generate and download layout-preserved PDF
- [x] 8.9 GET `/api/v1/export/rules` - List saved export rules
- [x] 8.10 POST `/api/v1/export/rules` - Create new export rule
- [x] 8.11 PUT `/api/v1/export/rules/{rule_id}` - Update export rule
- [x] 8.12 DELETE `/api/v1/export/rules/{rule_id}` - Delete export rule
- [x] 8.13 GET `/api/v1/export/css-templates` - List available CSS templates
- [x] 8.14 Write API integration tests

### 9. Backend - Translation Architecture (RESERVED)

- [x] 9.1 Create translation service interface (abstract class)
- [x] 9.2 Implement stub endpoint POST `/api/v1/translate/document` (returns 501 Not Implemented)
- [x] 9.3 Document expected request/response format in OpenAPI spec
- [x] 9.4 Add translation_configs table migrations (completed in Task 2.6)
- [x] 9.5 Create placeholder for translation engine factory (Argos/ERNIE/Google)
- [ ] 9.6 Write unit tests for translation service interface (optional for stub)

### 10. Backend - Background Tasks

- [x] 10.1 Implement FastAPI BackgroundTasks for async OCR processing
- [ ] 10.2 Add task queue system (optional: Redis-based queue)
- [x] 10.3 Implement progress updates (polling endpoint)
- [x] 10.4 Add error handling and retry logic
- [x] 10.5 Implement cleanup scheduler for expired files
- [x] 10.6 Add PDF generation to background tasks (slower process)

## Phase 2: Frontend Development

### 11. Frontend - Project Structure

- [x] 11.1 Setup Vite project with TypeScript support
- [x] 11.2 Configure Tailwind CSS and shadcn/ui
- [x] 11.3 Setup React Router for navigation
- [x] 11.4 Configure Axios with base URL and interceptors
- [x] 11.5 Setup React Query for API state management
- [x] 11.6 Create Zustand store for global state
- [x] 11.7 Setup i18n for Traditional Chinese interface

### 12. Frontend - UI Components (shadcn/ui)

- [x] 12.1 Install and configure shadcn/ui components
- [x] 12.2 Create FileUpload component with drag-and-drop (react-dropzone)
- [x] 12.3 Create ProgressBar component for batch processing
- [x] 12.4 Create ResultsTable component for displaying OCR results
- [x] 12.5 Create MarkdownPreview component for viewing extracted content ⬅️ **Fixed: API schema alignment for filename display**
- [ ] 12.6 Create ExportDialog component for format and rule selection
- [ ] 12.7 Create CSSTemplateSelector component for PDF styling
- [ ] 12.8 Create RuleEditor component for creating custom rules
- [x] 12.9 Create Toast notifications for feedback
- [ ] 12.10 Create TranslationPanel component (DISABLED with "Coming Soon" label)

### 13. Frontend - Pages

- [x] 13.1 Create Login page with JWT authentication
- [x] 13.2 Create Upload page with file selection and batch management ⬅️ **Fixed: Upload response schema alignment**
- [x] 13.3 Create Processing page with real-time progress ⬅️ **Fixed: Error field mapping**
- [x] 13.4 Create Results page with Markdown/JSON preview ⬅️ **Fixed: OCR result detail flattening, null safety**
- [x] 13.5 Create Export page with format options (TXT, JSON, Excel, Markdown, PDF)
- [ ] 13.6 Create PDF Preview page (optional: embedded PDF viewer)
- [x] 13.7 Create Settings page for export rule management
- [x] 13.8 Add translation option placeholder in Results page (disabled state)

### 14. Frontend - API Integration

- [x] 14.1 Create API client service with typed interfaces ⬅️ **Updated: All endpoints verified working**
- [x] 14.2 Implement file upload with progress tracking ⬅️ **Fixed: UploadBatchResponse schema**
- [x] 14.3 Implement OCR task status polling ⬅️ **Fixed: BatchStatusResponse with files array**
- [x] 14.4 Implement results fetching (Markdown + JSON display) ⬅️ **Fixed: OCRResultDetailResponse with flattened structure**
- [x] 14.5 Implement export with file download ⬅️ **Fixed: ExportOptions schema added**
- [x] 14.6 Implement PDF generation request with loading indicator
- [x] 14.7 Implement rule CRUD operations
- [x] 14.8 Implement CSS template selection ⬅️ **Fixed: CSSTemplateResponse with filename field**
- [x] 14.9 Add error handling and user feedback ⬅️ **Fixed: Error field mapping with validation_alias**
- [x] 14.10 Create translation API client (stub, for future use)

## Phase 3: Testing & Optimization

### 15. Testing

- [ ] 15.1 Write backend unit tests (pytest) for all services
- [ ] 15.2 Write backend API integration tests
- [ ] 15.3 Test PaddleOCR-VL with various document types (scanned images, PDFs, mixed content)
- [ ] 15.4 Test layout preservation quality (Markdown structure correctness)
- [ ] 15.5 Test PDF generation with different CSS templates
- [ ] 15.6 Test Chinese font rendering in generated PDFs
- [ ] 15.7 Write frontend component tests (Vitest)
- [ ] 15.8 Perform manual end-to-end testing
- [ ] 15.9 Test with various image formats and languages
- [ ] 15.10 Test batch processing with large file sets (50+ files)
- [ ] 15.11 Test export with different formats and rules
- [x] 15.12 Verify translation UI placeholders are properly disabled

### 16. Documentation

- [ ] 16.1 Write API documentation (FastAPI auto-docs + additional notes)
- [ ] 16.2 Document PaddleOCR-VL model requirements and installation
- [ ] 16.3 Document Pandoc and WeasyPrint setup
- [ ] 16.4 Create CSS template customization guide
- [ ] 16.5 Write user guide for web interface
- [ ] 16.6 Write deployment guide for 1Panel
- [ ] 16.7 Create README.md with setup instructions
- [ ] 16.8 Document export rule syntax and examples
- [ ] 16.9 Document translation feature roadmap and architecture

## Phase 4: Deployment

### 17. Deployment Preparation

- [ ] 17.1 Create backend startup script (start.sh)
- [ ] 17.2 Create frontend build script (build.sh)
- [ ] 17.3 Create Nginx configuration file (static files + reverse proxy)
- [ ] 17.4 Create Supervisor configuration for backend process
- [ ] 17.5 Create environment variable templates (.env.example)
- [ ] 17.6 Create deployment automation script (deploy.sh)
- [ ] 17.7 Prepare CSS templates for production
- [ ] 17.8 Test deployment on staging environment

### 18. Production Deployment (1Panel)

- [ ] 18.1 Setup Conda environment on production server
- [ ] 18.2 Install system dependencies (pandoc, fonts-noto-cjk)
- [ ] 18.3 Install Python dependencies and download PaddleOCR-VL models
- [ ] 18.4 Configure MySQL database connection
- [ ] 18.5 Build frontend static files
- [ ] 18.6 Configure Nginx via 1Panel (static files + reverse proxy)
- [ ] 18.7 Setup Supervisor to manage backend process
- [ ] 18.8 Configure SSL certificate (Let's Encrypt via 1Panel)
- [ ] 18.9 Perform production smoke tests (upload, OCR, export PDF)
- [ ] 18.10 Setup monitoring and logging
- [ ] 18.11 Verify PDF generation works in production environment

## Phase 5: Translation Feature (FUTURE)

### 19. Translation Implementation (Post-Launch)

- [ ] 19.1 Decide on translation engine (Argos offline vs ERNIE API vs Google API)
- [ ] 19.2 Implement chosen translation engine integration
- [ ] 19.3 Implement Markdown translation with structure preservation
- [ ] 19.4 Update POST `/api/v1/translate/document` endpoint (remove 501 status)
- [ ] 19.5 Add translation configuration UI (enable TranslationPanel component)
- [ ] 19.6 Add source/target language selection
- [ ] 19.7 Implement translation progress tracking
- [ ] 19.8 Test translation with various document types
- [ ] 19.9 Optimize translation quality for technical documents
- [ ] 19.10 Update documentation with translation feature guide

## Summary

- **Phase 1 (Core OCR + Layout Preservation)**: Tasks 1-10 (basic OCR + layout-preserved PDF)
- **Phase 2 (Frontend)**: Tasks 11-14 (user interface)
- **Phase 3 (Testing)**: Tasks 15-16 (testing and documentation)
- **Phase 4 (Deployment)**: Tasks 17-18 (deployment)
- **Phase 5 (Translation)**: Task 19 (translation feature, future implementation)

**Total Tasks**: 150+ tasks

**Priority**: Complete Phases 1-4 first; start Phase 5 after production deployment and user feedback.