egg/OCR


egg dda9621e17 feat: enhance layout preprocessing and unify image scaling proposal

Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-28 09:23:19 +08:00

3.1 KiB


Change: Add Image Preprocessing for Layout Detection

Why

PP-StructureV3's layout detection (PP-DocLayout_plus-L) sometimes fails to detect tables with faint lines, low-contrast borders, or poor scan quality. Those tables are then missing from the output, even when the table structure recognition models (SLANeXt) are correctly configured.

The root cause is ordering: layout detection runs before table structure recognition, so if a region isn't classified as a "table" during layout detection, the table recognition models are never invoked for it.

What Changes

Add image preprocessing module for layout detection input
- Contrast enhancement (histogram equalization, CLAHE)
- Optional binarization (adaptive thresholding)
- Sharpening for faint lines
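
To make two of these operations concrete, here is a minimal, dependency-free sketch of global histogram equalization and a 3x3 sharpening filter on a grayscale image stored as nested lists. It is an illustration only, not the proposed module: the function names are hypothetical, and a real implementation would more likely rely on OpenCV (for example `cv2.equalizeHist`, `cv2.createCLAHE`, and `cv2.adaptiveThreshold`) operating on NumPy arrays.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grayscale pixel values (0-255)


def equalize_histogram(img: Gray) -> Gray:
    """Global histogram equalization: remap intensities through the normalized CDF."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1

    # Cumulative distribution of pixel intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Flat image: there is no contrast to stretch.
        return [row[:] for row in img]

    lut = [max(0, round((c - cdf_min) * 255 / (total - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def sharpen(img: Gray) -> Gray:
    """Apply the kernel [[0,-1,0],[-1,5,-1],[0,-1,0]]; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = (5 * img[y][x]
                 - img[y - 1][x] - img[y + 1][x]
                 - img[y][x - 1] - img[y][x + 1])
            out[y][x] = min(255, max(0, v))  # clamp to the 8-bit range
    return out
```

On a narrow-range input such as `[[100, 110], [120, 130]]`, `equalize_histogram` stretches the values across the full 0-255 range, which is the effect that could help faint table borders stand out for the layout detector.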
Preserve original images for extraction
- Preprocessing ONLY affects layout detection input
- Image element extraction continues to use the original image (preserves quality)
- Raw OCR continues to use original image
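
The separation described above can be sketched as follows. This is a hedged illustration, not the service's actual code: `Region`, `crop`, and `process_page` are hypothetical names, the layout detector is passed in as a plain callable, and the sketch assumes preprocessing keeps the image dimensions unchanged so that detected boxes map directly onto the original.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale rows, for illustration only
Box = Tuple[int, int, int, int]  # x0, y0, x1, y1 (x1 and y1 exclusive)


@dataclass
class Region:
    label: str
    box: Box


def crop(img: Image, box: Box) -> Image:
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]


def process_page(
    original: Image,
    preprocess: Callable[[Image], Image],
    detect_layout: Callable[[Image], List[Region]],
) -> List[Tuple[Region, Image]]:
    # Only the layout detector sees the enhanced image.
    enhanced = preprocess(original)
    regions = detect_layout(enhanced)
    del enhanced  # temporary copy, dropped once layout detection is done

    # Crops for extraction are always taken from the untouched original.
    return [(region, crop(original, region.box)) for region in regions]
```

Because the crops come from `original`, any artifacts introduced by enhancement never reach image extraction or raw OCR.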
Hybrid control mode (Auto + Manual)
- Auto mode (default): Analyze image quality and auto-select parameters
  - Calculate contrast level (standard deviation of pixel intensities)
  - Detect edge clarity for faint lines
  - Apply appropriate preprocessing based on analysis
- Manual mode: User can override with specific settings
  - Contrast: none / histogram / clahe
  - Sharpen: on/off
  - Binarize: on/off
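
One way the Auto and Manual modes could fit together is sketched below. Everything here is an assumption made for illustration: the `PreprocessingSettings` fields mirror the manual options listed above, the edge-clarity measure (mean absolute difference between neighboring pixels) is a crude stand-in for whatever the service ends up using, and the threshold values are placeholders that would need tuning on real scans.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import List, Literal, Optional

Gray = List[List[int]]  # rows of 8-bit grayscale pixel values
ContrastMode = Literal["none", "histogram", "clahe"]

# Hypothetical thresholds, not values from the proposal.
VERY_LOW_CONTRAST_STD = 20.0
LOW_CONTRAST_STD = 40.0
FAINT_EDGE_MEAN_DIFF = 10.0


@dataclass(frozen=True)
class PreprocessingSettings:
    contrast: ContrastMode = "none"
    sharpen: bool = False
    binarize: bool = False


def contrast_level(img: Gray) -> float:
    """Standard deviation of all pixel intensities."""
    return pstdev([p for row in img for p in row])


def edge_clarity(img: Gray) -> float:
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    diffs = []
    for y, row in enumerate(img):
        for x, p in enumerate(row):
            if x + 1 < len(row):
                diffs.append(abs(row[x + 1] - p))
            if y + 1 < len(img):
                diffs.append(abs(img[y + 1][x] - p))
    return sum(diffs) / len(diffs) if diffs else 0.0


def auto_select(img: Gray) -> PreprocessingSettings:
    std = contrast_level(img)
    if std < VERY_LOW_CONTRAST_STD:
        contrast: ContrastMode = "clahe"
    elif std < LOW_CONTRAST_STD:
        contrast = "histogram"
    else:
        contrast = "none"
    needs_sharpen = edge_clarity(img) < FAINT_EDGE_MEAN_DIFF
    return PreprocessingSettings(contrast=contrast, sharpen=needs_sharpen)


def resolve_settings(
    img: Gray, manual: Optional[PreprocessingSettings] = None
) -> PreprocessingSettings:
    """Manual settings, when supplied, override the automatic analysis."""
    return manual if manual is not None else auto_select(img)
```

Calling `resolve_settings(img)` runs the automatic analysis, while `resolve_settings(img, PreprocessingSettings(contrast="histogram", binarize=True))` returns the user's choice unchanged.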
Frontend preview API
- Preview endpoint to show original vs preprocessed comparison
- Users can verify settings before processing

Impact

Affected Specs

ocr-processing - New preprocessing configuration requirements

Affected Code

backend/app/services/ocr_service.py - Add preprocessing before PP-Structure
backend/app/core/config.py - New preprocessing configuration options
backend/app/services/preprocessing_service.py - New service (to be created)
backend/app/api/v2/endpoints/preview.py - New preview API endpoint
frontend/src/components/PreprocessingSettings.tsx - New UI component

Track Impact Analysis

| Track | Impact | Reason |
|-------|--------|--------|
| OCR | Improved layout detection | Preprocessing enhances PP-Structure input |
| Hybrid | Potentially improved | Uses PP-Structure for layout |
| Direct | No impact | Does not use PP-Structure |
| Raw OCR | No impact | Continues using original image |

Quality Impact

| Component | Impact | Reason |
|-----------|--------|--------|
| Table detection | Improved | Enhanced contrast reveals faint borders |
| Image extraction | No change | Uses original image for quality |
| Text recognition | No change | Raw OCR uses original image |
| Reading order | Improved | Better element detection → better ordering |

Risks

Performance overhead: Preprocessing adds compute time per page
- Mitigation: Make preprocessing optional, cache preprocessed images
Over-processing: Strong enhancement may introduce artifacts
- Mitigation: Configurable intensity levels, default to moderate enhancement
Memory usage: Keeping both original and preprocessed images
- Mitigation: Preprocessed image is temporary, discarded after layout detection
