Change: Add Image Preprocessing for Layout Detection
Why
PP-StructureV3's layout detection (PP-DocLayout_plus-L) sometimes fails to detect tables with faint lines, low-contrast borders, or poor scan quality. As a result, table elements are missing from the output even when the table structure recognition models (SLANeXt) are correctly configured.
The root cause is that layout detection runs before table structure recognition: if a region is not identified as a "table" during layout detection, the table recognition models are never invoked for it.
What Changes
- Add image preprocessing module for layout detection input
  - Contrast enhancement (histogram equalization, CLAHE)
  - Optional binarization (adaptive thresholding)
  - Sharpening for faint lines
- Preserve original images for extraction
  - Preprocessing ONLY affects the layout detection input
  - Image element extraction continues to use the original image (preserves quality)
  - Raw OCR continues to use the original image
- Hybrid control mode (Auto + Manual)
  - Auto mode (default): Analyze image quality and auto-select parameters
    - Calculate contrast level (standard deviation)
    - Detect edge clarity for faint lines
    - Apply appropriate preprocessing based on analysis
  - Manual mode: User can override with specific settings
    - Contrast: none / histogram / clahe
    - Sharpen: on/off
    - Binarize: on/off
- Frontend preview API
  - Preview endpoint to show original vs. preprocessed comparison
  - Users can verify settings before processing
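The NumPy sketch below gives a rough picture of the preprocessing operations listed above. It assumes 8-bit grayscale input. The function names are hypothetical and are not taken from `preprocessing_service.py`. Two substitutions should be flagged. First, global histogram equalization stands in for the contrast step. CLAHE applies the same idea per tile with a clip limit and would more likely come from a library such as OpenCV's `cv2.createCLAHE`. Second, the adaptive threshold here uses a plain local mean rather than any particular library's method.

```python
import numpy as np


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization of a uint8 grayscale image (not CLAHE)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # pixel count of the darkest occupied level
    if cdf_min == gray.size:           # single-level image: nothing to stretch
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[gray]


def _box_mean(img: np.ndarray, block: int) -> np.ndarray:
    """Mean over a block x block window (block odd), computed via an integral image."""
    r = block // 2
    padded = np.pad(img, r, mode="edge")
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = img.shape
    window_sum = (integral[block:block + h, block:block + w]
                  - integral[:h, block:block + w]
                  - integral[block:block + h, :w]
                  + integral[:h, :w])
    return window_sum / (block * block)


def sharpen(gray: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Unsharp mask: add back the difference from a 3x3 box blur."""
    img = gray.astype(np.float64)
    out = img + amount * (img - _box_mean(img, 3))
    return np.clip(np.round(out), 0, 255).astype(np.uint8)


def adaptive_binarize(gray: np.ndarray, block: int = 15, c: float = 5.0) -> np.ndarray:
    """Pixels not brighter than (local mean - c) become 0 (ink); others become 255."""
    if block % 2 == 0:
        raise ValueError("block must be odd")
    img = gray.astype(np.float64)
    return np.where(img > _box_mean(img, block) - c, 255, 0).astype(np.uint8)
```

Each function returns a new array with the same shape as its input and leaves the input unmodified. This matches the requirement that only the layout detection input is preprocessed.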
Impact
Affected Specs
- `ocr-processing`: New preprocessing configuration requirements
Affected Code
- `backend/app/services/ocr_service.py`: Add preprocessing before PP-Structure
- `backend/app/core/config.py`: New preprocessing configuration options
- `backend/app/services/preprocessing_service.py`: New service (to be created)
- `backend/app/api/v2/endpoints/preview.py`: New preview API endpoint
- `frontend/src/components/PreprocessingSettings.tsx`: New UI component
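The sketch below shows roughly how the new options in `backend/app/core/config.py` might combine with the Auto/Manual hybrid mode. It rests on stated assumptions. `PreprocessingOptions`, `resolve_options`, both threshold values, and the choice of CLAHE for low-contrast pages are hypothetical placeholders, not values from this proposal. Contrast is measured as the pixel standard deviation, as described for Auto mode. Edge clarity is approximated by the mean absolute difference between neighbouring pixels.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Tuple

import numpy as np

Contrast = Literal["none", "histogram", "clahe"]


@dataclass(frozen=True)
class PreprocessingOptions:
    # Hypothetical field names mirroring the Manual-mode controls.
    contrast: Contrast = "none"
    sharpen: bool = False
    binarize: bool = False


# Hypothetical Auto-mode thresholds on the 0-255 grayscale scale; tune on real scans.
LOW_CONTRAST_STD = 40.0
FAINT_EDGE_STRENGTH = 8.0


def analyze_quality(gray: np.ndarray) -> Tuple[float, float]:
    """Return (contrast, edge_strength) for a 2-D grayscale image."""
    img = gray.astype(np.float64)
    contrast = float(img.std())  # contrast level as pixel standard deviation
    # Edge clarity: mean absolute difference between horizontal and vertical neighbours.
    edge = (np.abs(np.diff(img, axis=1)).mean() + np.abs(np.diff(img, axis=0)).mean()) / 2
    return contrast, float(edge)


def resolve_options(
    gray: np.ndarray, manual: Optional[PreprocessingOptions] = None
) -> PreprocessingOptions:
    """Manual settings win; otherwise (Auto mode) derive options from the analysis."""
    if manual is not None:
        return manual
    contrast, edge = analyze_quality(gray)
    return PreprocessingOptions(
        contrast="clahe" if contrast < LOW_CONTRAST_STD else "none",
        sharpen=edge < FAINT_EDGE_STRENGTH,
        binarize=False,  # left to Manual mode in this sketch
    )
```

In this sketch, Auto mode leaves binarization off. Binarization is the most aggressive option, so it is the most likely to cause the over-processing artifacts listed under Risks. Users can still enable it in Manual mode.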
Track Impact Analysis
| Track | Impact | Reason |
|---|---|---|
| OCR | Improved layout detection | Preprocessing enhances PP-Structure input |
| Hybrid | Potentially improved | Uses PP-Structure for layout |
| Direct | No impact | Does not use PP-Structure |
| Raw OCR | No impact | Continues using original image |
Quality Impact
| Component | Impact | Reason |
|---|---|---|
| Table detection | Improved | Enhanced contrast reveals faint borders |
| Image extraction | No change | Uses original image for quality |
| Text recognition | No change | Raw OCR uses original image |
| Reading order | Improved | Better element detection → better ordering |
Risks
- Performance overhead: Preprocessing adds compute time per page
  - Mitigation: Make preprocessing optional, cache preprocessed images
- Over-processing: Strong enhancement may introduce artifacts
  - Mitigation: Configurable intensity levels, default to moderate enhancement
- Memory usage: Keeping both original and preprocessed images
  - Mitigation: Preprocessed image is temporary, discarded after layout detection
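A single function can sketch both the separation between layout detection and extraction and the memory mitigation above. `detect_then_extract`, `preprocess`, and `detect_layout` are hypothetical stand-ins. In particular, `detect_layout` represents the PP-Structure layout stage and is not its real API. The enhanced copy is used only to locate regions, and every crop is taken from the original image.

```python
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixel coordinates


def detect_then_extract(
    original: np.ndarray,
    preprocess: Callable[[np.ndarray], np.ndarray],
    detect_layout: Callable[[np.ndarray], List[Box]],
) -> List[np.ndarray]:
    """Run layout detection on a preprocessed copy, then crop from the original."""
    enhanced = preprocess(original.copy())  # a copy, so in-place steps cannot touch the original
    # Contrast, sharpening and binarization keep the image size, so boxes found on
    # the enhanced copy map 1:1 onto the original; fail loudly if that ever breaks.
    if enhanced.shape[:2] != original.shape[:2]:
        raise ValueError("preprocessing must not change image dimensions")
    boxes = detect_layout(enhanced)
    del enhanced  # temporary: drop this reference right after layout detection
    return [original[y0:y1, x0:x1] for x0, y0, x1, y1 in boxes]
```

Once this reference is dropped, the enhanced copy can be garbage-collected as soon as nothing else holds it. The size check guards against a future preprocessing step, such as a resize, silently breaking the coordinate mapping.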