# Implementation Tasks

## Phase 1: Core OCR with Layout Preservation

### 1. Environment Setup
- [x] 1.1 Create Conda environment with Python 3.10
- [x] 1.2 Install backend dependencies (FastAPI, PaddleOCR 3.0+, paddlepaddle, pandas, etc.)
- [x] 1.3 Install PDF generation tools (weasyprint, markdown, pandoc system package)
- [x] 1.4 Download PaddleOCR-VL model (~900MB) and language packs
- [ ] 1.5 Set up frontend project with Vite + React + TypeScript
- [ ] 1.6 Install frontend dependencies (Tailwind, shadcn/ui, axios, react-query)
- [x] 1.7 Configure MySQL database connection
- [x] 1.8 Install Chinese fonts (Noto Sans CJK) for PDF generation

### 2. Database Schema
- [x] 2.1 Create `paddle_ocr_users` table for JWT authentication (id, username, password_hash, etc.)
- [x] 2.2 Create `paddle_ocr_batches` table (id, user_id, status, created_at, completed_at)
- [x] 2.3 Create `paddle_ocr_files` table (id, batch_id, filename, file_path, file_size, status, format)
- [x] 2.4 Create `paddle_ocr_results` table (id, file_id, markdown_path, json_path, layout_data, confidence)
- [x] 2.5 Create `paddle_ocr_export_rules` table (id, user_id, rule_name, config_json, css_template)
- [x] 2.6 Create `paddle_ocr_translation_configs` table (RESERVED: id, user_id, source_lang, target_lang, engine_type, engine_config)
- [x] 2.7 Write database migration scripts (Alembic)
- [x] 2.8 Add indexes for performance optimization (batch_id, user_id, status)
- Note: All tables use the `paddle_ocr_` prefix for namespace isolation
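
A minimal sketch of the core tables from tasks 2.1-2.4 and the indexes from 2.8. It uses the standard-library `sqlite3` module as a stand-in so it runs anywhere; the project itself targets MySQL with Alembic migrations. Column names come from the task list, while the column types, constraints, index names, and the `create_schema` helper are assumptions made for illustration.

```python
import sqlite3

# Column names follow tasks 2.1-2.4; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE paddle_ocr_users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE paddle_ocr_batches (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES paddle_ocr_users(id),
    status TEXT NOT NULL,
    created_at TEXT NOT NULL,
    completed_at TEXT
);
CREATE TABLE paddle_ocr_files (
    id INTEGER PRIMARY KEY,
    batch_id INTEGER NOT NULL REFERENCES paddle_ocr_batches(id),
    filename TEXT NOT NULL,
    file_path TEXT NOT NULL,
    file_size INTEGER NOT NULL,
    status TEXT NOT NULL,
    format TEXT NOT NULL
);
CREATE TABLE paddle_ocr_results (
    id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES paddle_ocr_files(id),
    markdown_path TEXT,
    json_path TEXT,
    layout_data TEXT,
    confidence REAL
);
-- Task 2.8: indexes on the columns used for lookups (index names are hypothetical).
CREATE INDEX idx_batches_user_id ON paddle_ocr_batches(user_id);
CREATE INDEX idx_batches_status ON paddle_ocr_batches(status);
CREATE INDEX idx_files_batch_id ON paddle_ocr_files(batch_id);
CREATE INDEX idx_files_status ON paddle_ocr_files(status);
"""


def create_schema(conn: sqlite3.Connection) -> list[str]:
    """Create the tables and return their names in alphabetical order."""
    conn.executescript(SCHEMA)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `create_schema(sqlite3.connect(":memory:"))` builds the schema in a throwaway database, which is handy for checking that every table carries the `paddle_ocr_` prefix.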

### 3. Backend - Document Preprocessing
- [x] 3.1 Implement document preprocessor class for format standardization
- [x] 3.2 Add image format validator (PNG, JPG, JPEG)
- [x] 3.3 Add PDF validator and direct passthrough (PaddleOCR-VL native support)
- [x] 3.4 Implement Office document to PDF conversion (DOC, DOCX, PPT, PPTX via LibreOffice) ⬅️ **Completed via sub-proposal**
- [x] 3.5 Add file corruption detection
- [x] 3.6 Write unit tests for preprocessor
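
One way the format validators in 3.2 and 3.3 could work is to compare the file extension against the file's leading bytes. The sketch below is an illustration under that assumption, not the project's preprocessor; `detect_format` and both lookup tables are hypothetical names.

```python
from pathlib import Path

# Leading "magic" bytes for the formats named in tasks 3.2 and 3.3.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpg": b"\xff\xd8\xff",
    "pdf": b"%PDF-",
}
EXTENSION_TO_FORMAT = {".png": "png", ".jpg": "jpg", ".jpeg": "jpg", ".pdf": "pdf"}


def detect_format(path: Path) -> str:
    """Return the format name if extension and file header agree, else raise ValueError."""
    expected = EXTENSION_TO_FORMAT.get(path.suffix.lower())
    if expected is None:
        raise ValueError(f"unsupported extension: {path.suffix!r}")
    with path.open("rb") as handle:
        header = handle.read(8)  # the longest signature above is 8 bytes
    if not header.startswith(SIGNATURES[expected]):
        raise ValueError(f"{path.name}: content does not look like {expected}")
    return expected
```

A header check like this only catches mislabeled or truncated files; deeper corruption detection (3.5) would still need to open the file with a real decoder.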

### 4. Backend - Core OCR Service with PaddleOCR-VL
- [x] 4.1 Implement OCR service class with PaddleOCR-VL initialization
- [x] 4.2 Configure layout detection (use_layout_detection=True)
- [x] 4.3 Implement single image/PDF OCR processing
- [x] 4.4 Parse OCR output to extract Markdown and JSON
- [x] 4.5 Store Markdown files with preserved layout structure
- [x] 4.6 Store JSON with detailed bounding boxes and layout metadata
- [x] 4.7 Add confidence threshold filtering
- [x] 4.8 Implement batch processing with worker queue (completed via Task 10: BackgroundTasks)
- [x] 4.9 Add progress tracking for batch jobs (completed via Task 8.4, 8.6: API endpoints)
- [x] 4.10 Write unit tests for OCR service
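
Confidence threshold filtering (4.7) can be sketched independently of the OCR engine. The `TextBlock` shape below is an assumption made for illustration and is not PaddleOCR-VL's actual output format; the default threshold of 0.5 is likewise arbitrary.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical, simplified OCR block: text, a 0-1 confidence, and a bounding box."""
    text: str
    confidence: float
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1


def filter_by_confidence(blocks: list[TextBlock], threshold: float = 0.5) -> list[TextBlock]:
    """Keep blocks whose confidence is at or above the threshold, preserving order."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [block for block in blocks if block.confidence >= threshold]
```

Filtering before the Markdown and JSON are stored (4.5, 4.6) keeps low-confidence noise out of every later export, at the cost of discarding text that might have been correct.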

### 5. Backend - Layout-Preserved PDF Generation
- [x] 5.1 Create PDF generator service using Pandoc + WeasyPrint
- [x] 5.2 Implement Markdown to HTML conversion with extensions (tables, code, etc.)
- [x] 5.3 Create default CSS template for layout preservation
- [x] 5.4 Create additional CSS templates (academic, business, report)
- [x] 5.5 Add Chinese font configuration (Noto Sans CJK)
- [x] 5.6 Implement PDF generation via Pandoc command
- [x] 5.7 Add fallback: Python WeasyPrint direct generation
- [x] 5.8 Handle multi-page PDF merging
- [x] 5.9 Write unit tests for PDF generator
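
Tasks 5.6 and 5.7 describe a primary path through the Pandoc command line and a fallback to direct WeasyPrint generation. The sketch below shows one possible shape for that control flow. The function names are hypothetical, the fallback is passed in as a callable so no particular WeasyPrint call is assumed, and the exact Pandoc options the project uses are not stated in this list.

```python
import subprocess
from pathlib import Path
from typing import Callable


def build_pandoc_command(markdown_path: Path, pdf_path: Path, css_path: Path) -> list[str]:
    """Build a Pandoc invocation that renders Markdown to PDF through WeasyPrint."""
    return [
        "pandoc",
        str(markdown_path),
        "--from=markdown",
        "--pdf-engine=weasyprint",
        f"--css={css_path}",
        "-o",
        str(pdf_path),
    ]


def generate_pdf(
    markdown_path: Path,
    pdf_path: Path,
    css_path: Path,
    fallback: Callable[[Path, Path, Path], None],
    run: Callable[..., object] = subprocess.run,
) -> str:
    """Try Pandoc first; if it is missing or exits non-zero, call the fallback.

    Returns "pandoc" or "fallback" so callers can log which path produced the file.
    """
    command = build_pandoc_command(markdown_path, pdf_path, css_path)
    try:
        run(command, check=True, capture_output=True)
        return "pandoc"
    except (FileNotFoundError, subprocess.CalledProcessError):
        fallback(markdown_path, pdf_path, css_path)
        return "fallback"
```

The `run` parameter exists only so the command runner can be swapped out in unit tests (5.9) without Pandoc installed.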

### 6. Backend - File Management
- [x] 6.1 Implement file upload validation (type, size, corruption check)
- [x] 6.2 Create file storage service with temporary directory management
- [x] 6.3 Add batch upload handler with unique batch_id generation
- [x] 6.4 Implement file access control and ownership verification
- [x] 6.5 Add automatic cleanup job for expired files (24-hour retention)
- [x] 6.6 Store Markdown and JSON outputs in organized directory structure
- [x] 6.7 Write unit tests for file management
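
The 24-hour retention rule in 6.5 could be implemented by comparing each file's modification time against the current time. This is a sketch under that assumption; `remove_expired_files` is a hypothetical helper, and the real job may track expiry in the database instead.

```python
import time
from pathlib import Path
from typing import Optional

RETENTION_SECONDS = 24 * 60 * 60  # 24-hour retention from task 6.5


def remove_expired_files(
    root: Path,
    now: Optional[float] = None,
    retention: float = RETENTION_SECONDS,
) -> list[Path]:
    """Delete files under root whose modification time is older than the retention window."""
    current = time.time() if now is None else now
    removed = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and current - path.stat().st_mtime > retention:
            path.unlink()
            removed.append(path)
    return removed
```

Accepting `now` as a parameter makes the function easy to test with fixed timestamps; the scheduler in 10.5 would call it without that argument.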

### 7. Backend - Export Service
- [x] 7.1 Implement plain text export from Markdown
- [x] 7.2 Implement JSON export with full metadata
- [x] 7.3 Implement Excel export using pandas
- [x] 7.4 Implement Markdown export (direct from OCR output)
- [x] 7.5 Implement layout-preserved PDF export (using PDF generator service)
- [x] 7.6 Add ZIP file creation for batch exports
- [x] 7.7 Implement rule-based filtering (confidence threshold, filename pattern)
- [x] 7.8 Implement rule-based formatting (line numbers, sort by position)
- [x] 7.9 Create export rule CRUD operations (save, load, update, delete)
- [x] 7.10 Write unit tests for export service
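
The rule-based filtering and formatting in 7.7 and 7.8 can be pictured as a small pipeline: filter, then sort, then number. The sketch below assumes a simplified line record and a rule object; `OcrLine`, `ExportRule`, and `apply_rule` are hypothetical names, and the stored `config_json` may be structured differently.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class OcrLine:
    """Hypothetical flattened OCR line with its source file and top-left position."""
    filename: str
    text: str
    confidence: float
    x: float
    y: float


@dataclass
class ExportRule:
    """Hypothetical export rule covering the options named in tasks 7.7 and 7.8."""
    min_confidence: float = 0.0
    filename_pattern: str = "*"
    sort_by_position: bool = False
    line_numbers: bool = False


def apply_rule(lines: list[OcrLine], rule: ExportRule) -> list[str]:
    """Filter by confidence and filename, then optionally sort and number the text."""
    kept = [
        line
        for line in lines
        if line.confidence >= rule.min_confidence
        and fnmatch(line.filename, rule.filename_pattern)
    ]
    if rule.sort_by_position:
        # Reading order: top to bottom, then left to right.
        kept.sort(key=lambda line: (line.y, line.x))
    texts = [line.text for line in kept]
    if rule.line_numbers:
        texts = [f"{number}: {text}" for number, text in enumerate(texts, start=1)]
    return texts
```

Numbering happens last so that line numbers reflect the final, filtered reading order rather than the raw OCR order.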

### 8. Backend - API Endpoints
- [x] 8.1 POST `/api/v1/auth/login` - JWT authentication
- [x] 8.2 POST `/api/v1/upload` - File upload with validation
- [x] 8.3 POST `/api/v1/ocr/process` - Trigger OCR processing (PaddleOCR-VL)
- [x] 8.4 GET `/api/v1/ocr/status/{task_id}` - Get task status with progress
- [x] 8.5 GET `/api/v1/ocr/result/{task_id}` - Get OCR results (JSON + Markdown)
- [x] 8.6 GET `/api/v1/batch/{batch_id}/status` - Get batch status
- [x] 8.7 POST `/api/v1/export` - Export results with format and rules
- [x] 8.8 GET `/api/v1/export/pdf/{file_id}` - Generate and download layout-preserved PDF
- [x] 8.9 GET `/api/v1/export/rules` - List saved export rules
- [x] 8.10 POST `/api/v1/export/rules` - Create new export rule
- [x] 8.11 PUT `/api/v1/export/rules/{rule_id}` - Update export rule
- [x] 8.12 DELETE `/api/v1/export/rules/{rule_id}` - Delete export rule
- [x] 8.13 GET `/api/v1/export/css-templates` - List available CSS templates
- [x] 8.14 Write API integration tests

### 9. Backend - Translation Architecture (RESERVED)
- [x] 9.1 Create translation service interface (abstract class)
- [x] 9.2 Implement stub endpoint POST `/api/v1/translate/document` (returns 501 Not Implemented)
- [x] 9.3 Document expected request/response format in OpenAPI spec
- [x] 9.4 Add translation_configs table migrations (completed in Task 2.6)
- [x] 9.5 Create placeholder for translation engine factory (Argos/ERNIE/Google)
- [ ] 9.6 Write unit tests for translation service interface (optional for stub)
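
The reserved translation architecture (9.1, 9.2, 9.5) combines an abstract interface, a factory placeholder, and a stub that answers 501. The sketch below shows how those pieces could fit together in plain Python, without FastAPI. Every class and function name here is hypothetical, and the response body is an assumption rather than the documented OpenAPI format.

```python
from abc import ABC, abstractmethod


class TranslationEngine(ABC):
    """Interface that a future Argos, ERNIE, or Google engine would implement."""

    @abstractmethod
    def translate(self, markdown: str, source_lang: str, target_lang: str) -> str:
        """Return the translated Markdown with its structure preserved."""


class ReservedEngine(TranslationEngine):
    """Placeholder used until a real engine is chosen (task 19.1)."""

    def translate(self, markdown: str, source_lang: str, target_lang: str) -> str:
        raise NotImplementedError("translation is reserved for a future phase")


ENGINE_TYPES = ("argos", "ernie", "google")


def create_engine(engine_type: str) -> TranslationEngine:
    """Factory placeholder: validates the engine type, then returns the stub."""
    if engine_type not in ENGINE_TYPES:
        raise ValueError(f"unknown engine type: {engine_type!r}")
    return ReservedEngine()


def handle_translate_request(
    engine_type: str, markdown: str, source_lang: str, target_lang: str
) -> tuple[int, dict]:
    """Map the stub's NotImplementedError to an HTTP-style 501 response."""
    engine = create_engine(engine_type)
    try:
        translated = engine.translate(markdown, source_lang, target_lang)
    except NotImplementedError as error:
        return 501, {"detail": str(error)}
    return 200, {"markdown": translated}
```

With this shape, enabling translation later means returning a real engine from `create_engine`; the request handler itself would not need to change.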

### 10. Backend - Background Tasks
- [x] 10.1 Implement FastAPI BackgroundTasks for async OCR processing
- [ ] 10.2 Add task queue system (optional: Redis-based queue)
- [x] 10.3 Implement progress updates (polling endpoint)
- [x] 10.4 Add error handling and retry logic
- [x] 10.5 Implement cleanup scheduler for expired files
- [x] 10.6 Add PDF generation to background tasks (slower process)
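
The retry logic in 10.4 is not specified further in this list. One common approach is a bounded retry with exponential backoff, sketched below; the attempt count, delays, and the `run_with_retry` name are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying on any exception with exponentially growing delays.

    The final failure is re-raised so the caller can mark the job as failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A background OCR job could wrap its processing call in `run_with_retry` and record the error for the polling endpoint (10.3) only after the last attempt fails.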

## Phase 2: Frontend Development

### 11. Frontend - Project Structure
- [x] 11.1 Set up Vite project with TypeScript support
- [x] 11.2 Configure Tailwind CSS and shadcn/ui
- [x] 11.3 Set up React Router for navigation
- [x] 11.4 Configure Axios with base URL and interceptors
- [x] 11.5 Set up React Query for API state management
- [x] 11.6 Create Zustand store for global state
- [x] 11.7 Set up i18n for Traditional Chinese interface

### 12. Frontend - UI Components (shadcn/ui)
- [x] 12.1 Install and configure shadcn/ui components
- [x] 12.2 Create FileUpload component with drag-and-drop (react-dropzone)
- [x] 12.3 Create ProgressBar component for batch processing
- [x] 12.4 Create ResultsTable component for displaying OCR results
- [x] 12.5 Create MarkdownPreview component for viewing extracted content ⬅️ **Fixed: API schema alignment for filename display**
- [ ] 12.6 Create ExportDialog component for format and rule selection
- [ ] 12.7 Create CSSTemplateSelector component for PDF styling
- [ ] 12.8 Create RuleEditor component for creating custom rules
- [x] 12.9 Create Toast notifications for feedback
- [ ] 12.10 Create TranslationPanel component (DISABLED with "Coming Soon" label)

### 13. Frontend - Pages
- [x] 13.1 Create Login page with JWT authentication
- [x] 13.2 Create Upload page with file selection and batch management ⬅️ **Fixed: Upload response schema alignment**
- [x] 13.3 Create Processing page with real-time progress ⬅️ **Fixed: Error field mapping**
- [x] 13.4 Create Results page with Markdown/JSON preview ⬅️ **Fixed: OCR result detail flattening, null safety**
- [x] 13.5 Create Export page with format options (TXT, JSON, Excel, Markdown, PDF)
- [ ] 13.6 Create PDF Preview page (optional: embedded PDF viewer)
- [x] 13.7 Create Settings page for export rule management
- [x] 13.8 Add translation option placeholder in Results page (disabled state)

### 14. Frontend - API Integration
- [x] 14.1 Create API client service with typed interfaces ⬅️ **Updated: All endpoints verified working**
- [x] 14.2 Implement file upload with progress tracking ⬅️ **Fixed: UploadBatchResponse schema**
- [x] 14.3 Implement OCR task status polling ⬅️ **Fixed: BatchStatusResponse with files array**
- [x] 14.4 Implement results fetching (Markdown + JSON display) ⬅️ **Fixed: OCRResultDetailResponse with flattened structure**
- [x] 14.5 Implement export with file download ⬅️ **Fixed: ExportOptions schema added**
- [x] 14.6 Implement PDF generation request with loading indicator
- [x] 14.7 Implement rule CRUD operations
- [x] 14.8 Implement CSS template selection ⬅️ **Fixed: CSSTemplateResponse with filename field**
- [x] 14.9 Add error handling and user feedback ⬅️ **Fixed: Error field mapping with validation_alias**
- [x] 14.10 Create translation API client (stub, for future use)

## Phase 3: Testing & Optimization

### 15. Testing
- [ ] 15.1 Write backend unit tests (pytest) for all services
- [ ] 15.2 Write backend API integration tests
- [ ] 15.3 Test PaddleOCR-VL with various document types (scanned images, PDFs, mixed content)
- [ ] 15.4 Test layout preservation quality (Markdown structure correctness)
- [ ] 15.5 Test PDF generation with different CSS templates
- [ ] 15.6 Test Chinese font rendering in generated PDFs
- [ ] 15.7 Write frontend component tests (Vitest)
- [ ] 15.8 Perform manual end-to-end testing
- [ ] 15.9 Test with various image formats and languages
- [ ] 15.10 Test batch processing with large file sets (50+ files)
- [ ] 15.11 Test export with different formats and rules
- [x] 15.12 Verify translation UI placeholders are properly disabled

### 16. Documentation
- [ ] 16.1 Write API documentation (FastAPI auto-docs + additional notes)
- [ ] 16.2 Document PaddleOCR-VL model requirements and installation
- [ ] 16.3 Document Pandoc and WeasyPrint setup
- [ ] 16.4 Create CSS template customization guide
- [ ] 16.5 Write user guide for web interface
- [ ] 16.6 Write deployment guide for 1Panel
- [ ] 16.7 Create README.md with setup instructions
- [ ] 16.8 Document export rule syntax and examples
- [ ] 16.9 Document translation feature roadmap and architecture

## Phase 4: Deployment

### 17. Deployment Preparation
- [ ] 17.1 Create backend startup script (start.sh)
- [ ] 17.2 Create frontend build script (build.sh)
- [ ] 17.3 Create Nginx configuration file (static files + reverse proxy)
- [ ] 17.4 Create Supervisor configuration for backend process
- [ ] 17.5 Create environment variable templates (.env.example)
- [ ] 17.6 Create deployment automation script (deploy.sh)
- [ ] 17.7 Prepare CSS templates for production
- [ ] 17.8 Test deployment on staging environment

### 18. Production Deployment (1Panel)
- [ ] 18.1 Set up Conda environment on production server
- [ ] 18.2 Install system dependencies (pandoc, fonts-noto-cjk)
- [ ] 18.3 Install Python dependencies and download PaddleOCR-VL models
- [ ] 18.4 Configure MySQL database connection
- [ ] 18.5 Build frontend static files
- [ ] 18.6 Configure Nginx via 1Panel (static files + reverse proxy)
- [ ] 18.7 Set up Supervisor to manage backend process
- [ ] 18.8 Configure SSL certificate (Let's Encrypt via 1Panel)
- [ ] 18.9 Perform production smoke tests (upload, OCR, export PDF)
- [ ] 18.10 Set up monitoring and logging
- [ ] 18.11 Verify PDF generation works in production environment

## Phase 5: Translation Feature (FUTURE)

### 19. Translation Implementation (Post-Launch)
- [ ] 19.1 Decide on translation engine (Argos offline vs ERNIE API vs Google API)
- [ ] 19.2 Implement chosen translation engine integration
- [ ] 19.3 Implement Markdown translation with structure preservation
- [ ] 19.4 Update POST `/api/v1/translate/document` endpoint (remove 501 status)
- [ ] 19.5 Add translation configuration UI (enable TranslationPanel component)
- [ ] 19.6 Add source/target language selection
- [ ] 19.7 Implement translation progress tracking
- [ ] 19.8 Test translation with various document types
- [ ] 19.9 Optimize translation quality for technical documents
- [ ] 19.10 Update documentation with translation feature guide

## Summary

- **Phase 1 (Core OCR + Layout Preservation)**: Tasks 1-10 (basic OCR + layout-preserved PDF)
- **Phase 2 (Frontend)**: Tasks 11-14 (user interface)
- **Phase 3 (Testing)**: Tasks 15-16 (testing and documentation)
- **Phase 4 (Deployment)**: Tasks 17-18 (deployment)
- **Phase 5 (Translation)**: Task 19 (translation feature, future implementation)

**Total Tasks**: 150+ tasks

**Priority**: Complete Phases 1-4 first, then Phase 5 after production deployment and user feedback