OCR/openspec/changes/add-layout-preprocessing/tasks.md
egg 19cb80460f docs: update add-layout-preprocessing tasks with completion status
Mark implemented tasks as complete and add implementation summary:
- Configuration: config.py and schema additions ✓
- Preprocessing service: layout_preprocessing_service.py ✓
- OCR integration: ocr_service.py and pp_structure_enhanced.py ✓
- Preview API endpoints ✓
- Frontend UI: PreprocessingSettings component ✓
- i18n translations (zh-TW) ✓
- Basic unit testing ✓

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 15:22:52 +08:00

Tasks: Add Image Preprocessing for Layout Detection

1. Configuration

  • 1.1 Add preprocessing configuration to backend/app/core/config.py

    • layout_preprocessing_mode: str = "auto" - Options: auto, manual, disabled
    • layout_preprocessing_contrast: str = "clahe" - Options: none, histogram, clahe
    • layout_preprocessing_sharpen: bool = True - Enable sharpening for faint lines
    • layout_preprocessing_binarize: bool = False - Optional binarization (aggressive)
  • 1.2 Add preprocessing schema to backend/app/schemas/task.py

    • PreprocessingMode enum: auto, manual, disabled
    • PreprocessingConfig schema for API request/response
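The settings and schema in this section could be sketched as follows. This is a minimal, standard-library-only illustration, not the project's actual code: the real config.py and task.py likely use a settings or schema library, and `ContrastMethod`, `LayoutPreprocessingSettings`, and `default_config()` are hypothetical names introduced here. Only the four `layout_preprocessing_*` fields, their defaults, and the `PreprocessingMode` values come from the task list.

```python
from dataclasses import dataclass
from enum import Enum


class PreprocessingMode(str, Enum):
    """Modes listed in task 1.2."""
    AUTO = "auto"
    MANUAL = "manual"
    DISABLED = "disabled"


class ContrastMethod(str, Enum):
    """Hypothetical enum for the contrast options in task 1.1."""
    NONE = "none"
    HISTOGRAM = "histogram"
    CLAHE = "clahe"


@dataclass(frozen=True)
class PreprocessingConfig:
    """Per-request options; field names are assumptions."""
    contrast: ContrastMethod = ContrastMethod.CLAHE
    sharpen: bool = True
    binarize: bool = False


@dataclass(frozen=True)
class LayoutPreprocessingSettings:
    """Hypothetical container for the defaults from task 1.1."""
    layout_preprocessing_mode: PreprocessingMode = PreprocessingMode.AUTO
    layout_preprocessing_contrast: ContrastMethod = ContrastMethod.CLAHE
    layout_preprocessing_sharpen: bool = True
    layout_preprocessing_binarize: bool = False

    def default_config(self) -> PreprocessingConfig:
        # Map the server-wide defaults onto a per-request config.
        return PreprocessingConfig(
            contrast=self.layout_preprocessing_contrast,
            sharpen=self.layout_preprocessing_sharpen,
            binarize=self.layout_preprocessing_binarize,
        )
```

Because both enums subclass `str`, a raw request value such as `"manual"` can be validated with `PreprocessingMode("manual")`, which raises `ValueError` for unknown modes.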

2. Preprocessing Service

  • 2.1 Create backend/app/services/layout_preprocessing_service.py

    • Image loading utility (supports PIL, OpenCV)
    • Contrast enhancement methods (histogram equalization, CLAHE)
    • Sharpening filter for line enhancement
    • Optional adaptive binarization
    • Return preprocessed image as numpy array or PIL Image
  • 2.2 Implement preprocess() and preprocess_to_pil() functions

    • Input: Original image path or PIL Image + config
    • Output: Preprocessed image (same format as input) + PreprocessingResult
    • Steps: contrast → sharpen → (optional) binarize
  • 2.3 Implement analyze_image_quality() function (Auto mode)

    • Calculate contrast level (standard deviation of grayscale)
    • Detect edge clarity (Sobel gradient mean)
    • Return ImageQualityMetrics based on analysis
    • get_auto_config() returns PreprocessingConfig based on thresholds:
      • Low contrast < 40: Apply CLAHE
      • Faint edges < 15: Apply sharpen
      • Very low contrast < 20: Consider binarize
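The auto-mode analysis in 2.3 can be illustrated with a small, dependency-free sketch. It assumes a grayscale image given as a list of pixel rows; the actual service presumably works on numpy arrays with OpenCV, and its border handling or normalization may differ. The field names on `ImageQualityMetrics` and `PreprocessingConfig` are assumptions, and treating "consider binarize" as "enable binarize" is a simplification made here. The three thresholds are the ones listed above.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List

# Thresholds from task 2.3.
LOW_CONTRAST = 40.0
FAINT_EDGES = 15.0
VERY_LOW_CONTRAST = 20.0


@dataclass(frozen=True)
class ImageQualityMetrics:
    contrast: float       # standard deviation of grayscale values
    edge_strength: float  # mean Sobel gradient magnitude


@dataclass(frozen=True)
class PreprocessingConfig:
    contrast: str = "none"
    sharpen: bool = False
    binarize: bool = False


def analyze_image_quality(gray: List[List[float]]) -> ImageQualityMetrics:
    """Compute contrast and edge clarity for a grayscale image."""
    pixels = [p for row in gray for p in row]
    contrast = statistics.pstdev(pixels)

    # 3x3 Sobel over interior pixels only (no border padding).
    magnitudes = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            magnitudes.append(math.hypot(gx, gy))
    edge_strength = statistics.fmean(magnitudes) if magnitudes else 0.0
    return ImageQualityMetrics(contrast, edge_strength)


def get_auto_config(metrics: ImageQualityMetrics) -> PreprocessingConfig:
    """Map quality metrics to preprocessing steps via the thresholds."""
    return PreprocessingConfig(
        contrast="clahe" if metrics.contrast < LOW_CONTRAST else "none",
        sharpen=metrics.edge_strength < FAINT_EDGES,
        # The task list says "consider"; this sketch simply enables it.
        binarize=metrics.contrast < VERY_LOW_CONTRAST,
    )
```

A flat, uniform page yields zero contrast and zero edge strength, so all three steps are switched on; a page with a crisp black-to-white boundary exceeds both thresholds and is left untouched.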

3. Integration with OCR Service

  • 3.1 Update backend/app/services/ocr_service.py

    • Import preprocessing service
    • Check preprocessing mode (auto/manual/disabled)
    • If auto: call analyze_image_quality() first
    • Before PP-Structure prediction, preprocess image based on config
    • Pass preprocessed PIL Image to PP-Structure for layout detection
    • Keep original image reference for image extraction
  • 3.2 Update backend/app/services/pp_structure_enhanced.py

    • Add preprocessed_image parameter to analyze_with_full_structure()
    • When preprocessed_image is provided, convert it to a BGR numpy array and pass it to PP-Structure
    • Bbox coordinates detected on the preprocessed image are used to crop regions from the original image
  • 3.3 Update task start API to accept preprocessing options

    • Add preprocessing_mode parameter to ProcessingOptions
    • Add preprocessing_config for manual mode overrides
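The mode check described in 3.1 amounts to choosing which configuration, if any, applies before layout detection. A hedged sketch of that decision follows; `resolve_preprocessing_config` and its parameters are hypothetical names, and the real ocr_service.py may structure this differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PreprocessingConfig:
    # Field names are assumptions, mirroring the options in section 1.
    contrast: str = "clahe"
    sharpen: bool = True
    binarize: bool = False


def resolve_preprocessing_config(
    mode: str,
    manual_config: Optional[PreprocessingConfig],
    auto_config_factory: Callable[[], PreprocessingConfig],
) -> Optional[PreprocessingConfig]:
    """Return the config to apply, or None when preprocessing is skipped.

    auto_config_factory stands in for "analyze_image_quality() followed by
    get_auto_config()"; it is only called in auto mode, so the image is
    not analyzed when the result would be ignored.
    """
    if mode == "disabled":
        return None
    if mode == "manual":
        # Fall back to defaults if the request carried no overrides.
        return manual_config or PreprocessingConfig()
    if mode == "auto":
        return auto_config_factory()
    raise ValueError(f"unknown preprocessing mode: {mode!r}")
```

When the function returns `None`, the caller would pass the original image to PP-Structure unchanged. In every mode the original image remains the source for cropping extracted regions.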

4. Preview API

  • 4.1 Create preview endpoints in backend/app/routers/tasks.py

    • POST /api/v2/tasks/{task_id}/preview/preprocessing
    • Input: page number, preprocessing mode/config
    • Output: PreprocessingPreviewResponse with:
      • Original image URL
      • Preprocessed image URL
      • Auto-detected config
      • Image quality metrics (contrast, edge_strength)
    • GET /api/v2/tasks/{task_id}/preview/image - Serve preview images
  • 4.2 Add preview router functionality

    • Integrated into tasks router
    • Uses task authentication/authorization
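To make the preview response concrete, here is one possible shape for the payload, built with standard-library dataclasses. Every field name below (`original_url`, `preprocessed_url`, `auto_config`, `quality_metrics`) and the `type` and `page` query parameters are assumptions for illustration; only the endpoint paths and the four kinds of returned data come from the task list. The real schema lives in backend/app/schemas/task.py.

```python
import json
from dataclasses import asdict, dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class ImageQualityMetrics:
    contrast: float
    edge_strength: float


@dataclass(frozen=True)
class PreprocessingConfig:
    contrast: str
    sharpen: bool
    binarize: bool


@dataclass(frozen=True)
class PreprocessingPreviewResponse:
    original_url: str
    preprocessed_url: str
    auto_config: PreprocessingConfig
    quality_metrics: ImageQualityMetrics


def preview_image_url(task_id: str, image_type: str, page: int) -> str:
    # Path from task 4.1; the query parameters are hypothetical.
    query = urlencode({"type": image_type, "page": page})
    return f"/api/v2/tasks/{task_id}/preview/image?{query}"


def build_preview_response(
    task_id: str,
    page: int,
    auto_config: PreprocessingConfig,
    metrics: ImageQualityMetrics,
) -> PreprocessingPreviewResponse:
    """Assemble the response returned by the preview endpoint."""
    return PreprocessingPreviewResponse(
        original_url=preview_image_url(task_id, "original", page),
        preprocessed_url=preview_image_url(task_id, "preprocessed", page),
        auto_config=auto_config,
        quality_metrics=metrics,
    )


if __name__ == "__main__":
    response = build_preview_response(
        "abc123",
        1,
        PreprocessingConfig(contrast="clahe", sharpen=True, binarize=False),
        ImageQualityMetrics(contrast=32.5, edge_strength=11.0),
    )
    print(json.dumps(asdict(response), indent=2))
```

Returning URLs rather than inline image bytes keeps the JSON response small and lets the browser fetch each image through the authenticated GET endpoint.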

5. Frontend UI

  • 5.1 Create frontend/src/components/PreprocessingSettings.tsx

    • Radio buttons with icons: Auto / Manual / Disabled
    • Manual mode shows:
      • Contrast dropdown: None / Histogram / CLAHE
      • Sharpen checkbox
      • Binarize checkbox (with warning)
    • Preview button integration (onPreview prop)
  • 5.2 Create frontend/src/components/PreprocessingPreview.tsx (optional)

    • Side-by-side image comparison (original vs preprocessed)
    • Display detected quality metrics
    • Note: Preview functionality available via API, UI modal is optional enhancement
  • 5.3 Integrate with task start flow

    • Added PreprocessingSettings to ProcessingPage.tsx
    • Pass selected config to task start API
    • Note: localStorage preference storage is optional enhancement
  • 5.4 Add i18n translations

    • frontend/src/i18n/locales/zh-TW.json - Traditional Chinese

6. Testing

  • 6.1 Unit tests for layout_preprocessing_service

    • Validated imports and service creation
    • Tested analyze_image_quality() with test images
    • Tested get_auto_config() returns sensible config
    • Tested preprocess() produces correct output shape
  • 6.2 Integration tests for preview API (optional)

    • Manual testing with actual documents is recommended
  • 6.3 End-to-end testing

    • Test OCR track with preprocessing modes (auto/manual/disabled)
    • Test with known problematic documents (faint table borders)

7. Documentation

  • 7.1 Update API documentation

    • Schemas documented in task.py with Field descriptions
    • Preview endpoint accessible via /docs
  • 7.2 Add user guide section (optional)

    • When to use auto vs manual
    • How to interpret quality metrics

Implementation Summary

Backend commits:

  1. feat: implement layout preprocessing backend - Core service, OCR integration, preview API

Frontend commits:

  1. feat: add preprocessing UI components and integration - PreprocessingSettings, i18n, ProcessingPage integration

Key files created/modified:

  • backend/app/services/layout_preprocessing_service.py (new)
  • backend/app/core/config.py (updated)
  • backend/app/schemas/task.py (updated)
  • backend/app/services/ocr_service.py (updated)
  • backend/app/services/pp_structure_enhanced.py (updated)
  • backend/app/routers/tasks.py (updated)
  • frontend/src/components/PreprocessingSettings.tsx (new)
  • frontend/src/types/apiV2.ts (updated)
  • frontend/src/pages/ProcessingPage.tsx (updated)
  • frontend/src/i18n/locales/zh-TW.json (updated)