OCR/openspec/changes/archive/2025-11-18-fix-result-preview-and-pdf-download/tasks.md
egg cd3cbea49d chore: project cleanup and prepare for dual-track processing refactor
- Removed all test files and directories
- Deleted outdated documentation (will be rewritten)
- Cleaned up temporary files, logs, and uploads
- Archived 5 completed OpenSpec proposals
- Created new dual-track-document-processing proposal with complete OpenSpec structure
  - Dual-track architecture: OCR track (PaddleOCR) + Direct track (PyMuPDF)
  - UnifiedDocument model for consistent output
  - Support for structure-preserving translation
- Updated .gitignore to prevent future test/temp files

This is a major cleanup preparing for the complete refactoring of the document processing pipeline.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 20:02:31 +08:00

Implementation Tasks

1. Backend - Fix Image Extraction and Saving (PREREQUISITE)

  • 1.1 Locate analyze_layout() function in backend/app/services/ocr_service.py
  • 1.2 Find image saving code at lines 554-561 where markdown_images.items() is iterated
  • 1.3 Add code to create imgs/ subdirectory in result folder before saving images
  • 1.4 Extract img_obj from (img_path, img_obj) tuple in loop
  • 1.5 Construct full image file path: image_path.parent / img_path
  • 1.6 Save each img_obj to disk using PIL Image.save() method
  • 1.7 Add error handling for image save failures (log warning but continue)
  • 1.8 Test with document containing images - verify imgs/ folder created
  • 1.9 Verify saved image files match paths in JSON images_metadata
  • 1.10 Test multi-page PDF with images on different pages
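
Steps 1.3 to 1.7 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `save_markdown_images` and `result_dir` are hypothetical names, and the only thing assumed about each `img_obj` is that it exposes a `save(path)` method, as a PIL `Image` does.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def save_markdown_images(markdown_images, result_dir):
    """Save every image under result_dir and return the paths written.

    markdown_images maps a relative path such as "imgs/figure_0.png" to an
    object with a save(path) method (for example a PIL Image).
    result_dir is a hypothetical name for the task's result folder.
    """
    result_dir = Path(result_dir)
    saved = []
    for img_path, img_obj in markdown_images.items():
        target = result_dir / img_path
        try:
            # Create imgs/ (or any nested folder) before writing the file.
            target.parent.mkdir(parents=True, exist_ok=True)
            img_obj.save(str(target))
            saved.append(target)
        except Exception as exc:
            # Log a warning and continue: one bad image should not abort OCR.
            logger.warning("Failed to save image %s: %s", img_path, exc)
    return saved
```

Returning the list of written paths makes it straightforward to compare the files on disk against the paths recorded in images_metadata (step 1.9).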

2. Backend - Environment Setup

  • 2.1 Install ReportLab library: pip install reportlab
  • 2.2 Verify Pillow is already installed (used for image handling)
  • 2.3 Download and install Noto Sans CJK font (TrueType format)
  • 2.4 Configure font path in backend settings
  • 2.5 Test Chinese character rendering
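
One possible shape for steps 2.3 to 2.5 is a small registration helper. The function name, the `NotoSansCJK` alias, and the Helvetica fallback are assumptions for illustration. ReportLab's `TTFont` reads TrueType outlines, which is presumably why the task asks for the TrueType build of the font.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def register_cjk_font(font_path, font_name="NotoSansCJK", fallback="Helvetica"):
    """Register a TrueType CJK font with ReportLab and return the usable name.

    Returns the fallback name when the font file is missing. Note that the
    fallback (a built-in Latin font) cannot render Chinese characters, so a
    warning is logged to make the misconfiguration visible.
    """
    path = Path(font_path)
    if not path.is_file():
        logger.warning("CJK font not found at %s; falling back to %s", path, fallback)
        return fallback

    # Imported lazily so that a missing font is reported before ReportLab is needed.
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont

    pdfmetrics.registerFont(TTFont(font_name, str(path)))
    return font_name
```

The returned name can then be passed to `canvas.setFont(name, size)` when drawing text regions.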

3. Backend - PDF Generation Service

  • 3.1 Create pdf_generator_service.py in app/services/
  • 3.2 Implement load_ocr_json(json_path) to parse JSON results
  • 3.3 Implement calculate_page_dimensions(text_regions) to infer page size from bbox
  • 3.4 Implement get_original_page_size(file_path) to extract from source file
  • 3.5 Implement draw_text_region(canvas, region, font, page_height) to render text at bbox
  • 3.6 Implement generate_layout_pdf(json_path, output_path) main function
  • 3.7 Handle coordinate transformation (OCR coords to PDF coords)
  • 3.8 Add font size calculation based on bbox height
  • 3.9 Handle multi-page documents
  • 3.10 Add caching logic (check if PDF already exists)
  • 3.11 Implement draw_table_region(canvas, region) using ReportLab Table
  • 3.12 Implement draw_image_region(canvas, region) from images_metadata (reads from saved imgs/)
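
The geometry behind steps 3.3, 3.7, and 3.8 might look like the sketch below. It assumes each `bbox` is a list of four `[x, y]` corner points with the origin at the top-left of the page, which is the usual image convention but is not confirmed by this task list. ReportLab's canvas places the origin at the bottom-left, so the y axis has to be flipped. `bbox_bounds`, `to_pdf_rect`, `font_size_for_bbox`, and the `fill_ratio` value are hypothetical.

```python
def bbox_bounds(bbox):
    """Return (x0, y0, x1, y1) for a polygon given as [[x, y], ...]."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return min(xs), min(ys), max(xs), max(ys)


def calculate_page_dimensions(text_regions, margin=0.0):
    """Infer page width and height from the furthest bbox corner."""
    max_x = max_y = 0.0
    for region in text_regions:
        _, _, x1, y1 = bbox_bounds(region["bbox"])
        max_x = max(max_x, x1)
        max_y = max(max_y, y1)
    return max_x + margin, max_y + margin


def to_pdf_rect(bbox, page_height, scale=1.0):
    """Convert a top-left-origin bbox to (left, bottom, width, height).

    page_height is expressed in PDF units; scale converts OCR units to PDF
    units (1.0 when the PDF page is sized directly from OCR coordinates).
    """
    x0, y0, x1, y1 = bbox_bounds(bbox)
    left = x0 * scale
    bottom = page_height - y1 * scale  # flip the y axis
    return left, bottom, (x1 - x0) * scale, (y1 - y0) * scale


def font_size_for_bbox(bbox, scale=1.0, fill_ratio=0.75, min_size=4.0):
    """Estimate a font size from bbox height; fill_ratio is a guess to tune."""
    _, y0, _, y1 = bbox_bounds(bbox)
    return max(min_size, (y1 - y0) * scale * fill_ratio)
```

For example, a region spanning y = 20 to y = 40 on a page 100 units tall ends up with its bottom edge at 60 in PDF coordinates.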

4. Backend - PDF Download Endpoint Fix

  • 4.1 Update /tasks/{id}/download/pdf endpoint in tasks.py router
  • 4.2 Check if PDF already exists; if not, trigger on-demand generation
  • 4.3 Serve pre-generated PDF file from task result directory
  • 4.4 Add error handling for missing PDF or generation failures
  • 4.5 Test PDF download endpoint returns 200 with valid PDF
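
The check-then-generate logic in steps 4.2 to 4.4 can be kept independent of the web framework. The following is a sketch under assumptions: `ensure_pdf` and `PDFUnavailableError` are hypothetical names, and `generate` stands in for whatever callable ends up wrapping `generate_layout_pdf(json_path, output_path)`.

```python
from pathlib import Path


class PDFUnavailableError(Exception):
    """Raised when no PDF can be served for a task."""


def ensure_pdf(json_path, pdf_path, generate):
    """Return pdf_path, generating the file on demand if it is not cached."""
    json_path, pdf_path = Path(json_path), Path(pdf_path)

    # Serve the cached file when it exists and is not empty.
    if pdf_path.is_file() and pdf_path.stat().st_size > 0:
        return pdf_path

    if not json_path.is_file():
        raise PDFUnavailableError(f"OCR result not found: {json_path}")

    try:
        generate(json_path, pdf_path)
    except Exception as exc:
        raise PDFUnavailableError(f"PDF generation failed: {exc}") from exc

    if not pdf_path.is_file():
        raise PDFUnavailableError("PDF generator did not produce a file")
    return pdf_path
```

The route handler would then catch `PDFUnavailableError` and translate it into an appropriate HTTP error response, and otherwise return the file at the resulting path.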

5. Backend - Integrate PDF Generation into OCR Flow (REQUIRED)

  • 5.1 Modify OCR service to generate PDF automatically after JSON creation
  • 5.2 Update save_results() to return (json_path, markdown_path, pdf_path)
  • 5.3 Integrate PDF generation into the OCR completion flow
  • 5.4 Generate the PDF synchronously during OCR processing (avoids timeout issues on first download)
  • 5.5 Test PDF generation triggers automatically after OCR completes
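
A rough sketch of the three-path return value described in 5.2, with the PDF produced synchronously as in 5.4. The parameter list and the output file names are assumptions, not the real `save_results()` signature, and `generate_pdf` is an injected callable standing in for the PDF generator service. Returning `None` for the PDF path on failure is one possible policy.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def save_results(result, markdown_text, output_dir, file_id, generate_pdf):
    """Write JSON and Markdown, then generate the PDF in the same call.

    Returns (json_path, markdown_path, pdf_path). pdf_path is None when PDF
    generation fails, so the OCR task itself can still complete.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical file naming; the real service may use different names.
    json_path = output_dir / f"{file_id}_result.json"
    markdown_path = output_dir / f"{file_id}_output.md"
    pdf_path = output_dir / f"{file_id}_layout.pdf"

    json_path.write_text(
        json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    markdown_path.write_text(markdown_text, encoding="utf-8")

    try:
        generate_pdf(json_path, pdf_path)
    except Exception as exc:
        logger.warning("PDF generation failed for %s: %s", file_id, exc)
        pdf_path = None

    return json_path, markdown_path, pdf_path
```

If generation fails here, the download endpoint's on-demand path still gets a second chance to produce the file later.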

6. Frontend - Install Dependencies

  • 6.1 Install react-pdf: npm install react-pdf
  • 6.2 Install pdfjs-dist (peer dependency): npm install pdfjs-dist
  • 6.3 Configure vite for PDF.js worker and optimization

7. Frontend - Create PDF Viewer Component

  • 7.1 Create PDFViewer.tsx component in components/
  • 7.2 Implement Document and Page rendering from react-pdf
  • 7.3 Add zoom controls (zoom in/out, 50%-300%)
  • 7.4 Add page navigation (previous, next, page counter)
  • 7.5 Add loading spinner while PDF loads
  • 7.6 Add error boundary for PDF loading failures
  • 7.7 Style PDF container with proper sizing and authentication support

8. Frontend - Results Page Integration

  • 8.1 Import PDFViewer component in ResultsPage.tsx
  • 8.2 Construct PDF URL from task data
  • 8.3 Replace placeholder text with PDFViewer
  • 8.4 Add authentication headers (Bearer token)
  • 8.5 Test PDF preview rendering

9. Frontend - Task Detail Page Integration

  • 9.1 Import PDFViewer component in TaskDetailPage.tsx
  • 9.2 Construct PDF URL from task data
  • 9.3 Replace placeholder text with PDFViewer
  • 9.4 Add authentication headers (Bearer token)
  • 9.5 Test PDF preview rendering

10. Testing ⚠️ (pending testing with real OCR tasks)

Basic verification (completed)

  • 10.1 Backend service imports successfully
  • 10.2 Frontend TypeScript compilation passes
  • 10.3 PDF Generator Service loads correctly
  • 10.4 OCR Service loads with image saving updates

Functional testing (requires real OCR tasks)

  • 10.5 Fixed page filtering issue for tables and images (tables and images were being assigned to the wrong page)
  • 10.6 Adjusted rendering order (images → tables → text) to prevent overlapping
  • 10.7 Fixed text filtering logic (use the correct data source, images_metadata, so that text no longer overlaps tables and images)
  • 10.8 Test image extraction and saving (verify imgs/ folder created with correct files)
  • 10.9 Test image saving with multi-page PDFs
  • 10.10 Test PDF generation with single-page document
  • 10.11 Test PDF generation with multi-page document
  • 10.12 Test Chinese character rendering in PDF
  • 10.13 Test coordinate accuracy (verify text positioned correctly)
  • 10.14 Test table rendering in PDF (if JSON contains tables)
  • 10.15 Test image embedding in PDF (verify images from imgs/ folder appear correctly)
  • 10.16 Test PDF caching (second request uses cached version)
  • 10.17 Test automatic PDF generation after OCR completion
  • 10.18 Test PDF download from Results page
  • 10.19 Test PDF download from Task Detail page
  • 10.20 Test PDF preview on Results page
  • 10.21 Test PDF preview on Task Detail page
  • 10.22 Test error handling when JSON is missing
  • 10.23 Test error handling when PDF generation fails
  • 10.24 Test error handling when image files are missing or corrupt
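
The three fixes listed at the top of the functional tests (per-page filtering, rendering order, and dropping text that overlaps tables or images) could be combined along the following lines. This is only a sketch: the `page` key, the four-point `bbox` format, the center-point overlap rule, and all function names are assumptions rather than the project's actual schema or logic.

```python
def bbox_bounds(bbox):
    """Return (x0, y0, x1, y1) for a polygon given as [[x, y], ...]."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return min(xs), min(ys), max(xs), max(ys)


def regions_on_page(regions, page_index):
    """Keep only the regions whose (hypothetical) 'page' key matches."""
    return [r for r in regions if r.get("page", 0) == page_index]


def center_inside(inner_bbox, outer_bbox):
    """True when the center of inner_bbox lies within outer_bbox."""
    x0, y0, x1, y1 = bbox_bounds(inner_bbox)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    ox0, oy0, ox1, oy1 = bbox_bounds(outer_bbox)
    return ox0 <= cx <= ox1 and oy0 <= cy <= oy1


def plan_page(text_regions, images_metadata, tables, page_index):
    """Return (kind, region) pairs in draw order: images, tables, then text."""
    images = regions_on_page(images_metadata, page_index)
    page_tables = regions_on_page(tables, page_index)
    blockers = [r["bbox"] for r in images + page_tables]

    # Drop text whose center falls inside an image or table on this page.
    texts = [
        t
        for t in regions_on_page(text_regions, page_index)
        if not any(center_inside(t["bbox"], b) for b in blockers)
    ]
    return (
        [("image", r) for r in images]
        + [("table", r) for r in page_tables]
        + [("text", r) for r in texts]
    )
```

Drawing text last keeps it visible on top of any image or table it partially touches, while the center test removes text that OCR also detected inside those regions.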