feat: enhance layout preprocessing and unify image scaling proposal

Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
egg
2025-11-28 09:23:19 +08:00
parent 86bbea6fbf
commit dda9621e17
17 changed files with 826 additions and 104 deletions

View File

@@ -0,0 +1,192 @@
# Design: Layout Detection Image Preprocessing
## Context
PP-StructureV3's layout detection model (PP-DocLayout_plus-L) sometimes fails to detect tables with faint lines or low contrast. This is a preprocessing problem - the model can detect tables when lines are clearly visible, but struggles with poor quality scans or documents with light-colored borders.
### Current Flow
```
Original Image → PP-Structure (layout detection) → Element Recognition
Returns element bboxes
Image extraction crops from original
```
### Proposed Flow
```
Original Image → Preprocess → PP-Structure (layout detection) → Element Recognition
Returns element bboxes
Original Image ← ← ← ← Image extraction crops from original (NOT preprocessed)
```
## Goals / Non-Goals
### Goals
- Improve table detection for documents with faint lines
- Preserve original image quality for element extraction
- **Hybrid control**: Auto mode by default, manual override available
- **Preview capability**: Users can verify preprocessing before processing
- Minimal performance impact
### Non-Goals
- Preprocessing for text recognition (Raw OCR handles this separately)
- Modifying how PP-Structure internally processes images
- General image quality improvement (out of scope)
- Real-time preview during processing (preview is pre-processing only)
## Decisions
### Decision 1: Preprocess only for layout detection input
**Rationale**:
- Layout detection needs enhanced edges/contrast to identify regions
- Image element extraction needs original quality for output
- Raw OCR text recognition works independently and doesn't need preprocessing
### Decision 2: Use CLAHE (Contrast Limited Adaptive Histogram Equalization) as default
**Rationale**:
- CLAHE prevents over-amplification in already bright areas
- Adaptive nature handles varying background regions
- Well-supported by OpenCV
**Alternatives considered**:
- Global histogram equalization: Too aggressive, causes artifacts
- Manual brightness/contrast: Not adaptive to document variations
### Decision 3: Preprocessing is applied in-memory, not saved to disk
**Rationale**:
- Preprocessed image is only needed during PP-Structure call
- Saving would increase storage and I/O overhead
- Original image is already saved and used for extraction
### Decision 4: Sharpening via Unsharp Mask
**Rationale**:
- Enhances edges without introducing noise
- Helps make faint table borders more detectable
- Configurable strength
### Decision 5: Hybrid Control Mode (Auto + Manual)
**Rationale**:
- Auto mode provides seamless experience for most users
- Manual mode gives power users fine control
- Preview allows verification before committing to processing
**Auto-detection algorithm**:
```python
def analyze_image_quality(image: np.ndarray) -> dict:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Contrast: standard deviation of pixel values
contrast = np.std(gray)
# Edge strength: mean of Sobel gradient magnitude
sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
edge_strength = np.mean(np.sqrt(sobel_x**2 + sobel_y**2))
return {
"contrast": contrast,
"edge_strength": edge_strength,
"recommended": {
"contrast": "clahe" if contrast < 40 else "none",
"sharpen": edge_strength < 15,
"binarize": contrast < 20
}
}
```
### Decision 6: Preview API Design
**Rationale**:
- Users should see preprocessing effect before full processing
- Reduces trial-and-error cycles
- Builds user confidence in the system
**API Design**:
```
POST /api/v2/tasks/{task_id}/preview/preprocessing
Request:
{
"page": 1,
"mode": "auto", // or "manual"
"config": { // only for manual mode
"contrast": "clahe",
"sharpen": true,
"binarize": false
}
}
Response:
{
"original_url": "/api/v2/tasks/{id}/pages/1/image",
"preprocessed_url": "/api/v2/tasks/{id}/pages/1/image?preprocessed=true",
"quality_metrics": {
"contrast": 35.2,
"edge_strength": 12.8
},
"auto_config": {
"contrast": "clahe",
"sharpen": true,
"binarize": false
}
}
```
## Implementation Details
### Preprocessing Pipeline
```python
def enhance_for_layout_detection(image: Image.Image, config: Settings) -> Image.Image:
"""Enhance image for better layout detection."""
# Step 1: Contrast enhancement
if config.layout_preprocessing_contrast == "clahe":
image = apply_clahe(image)
elif config.layout_preprocessing_contrast == "histogram":
image = apply_histogram_equalization(image)
# Step 2: Sharpening (optional)
if config.layout_preprocessing_sharpen:
image = apply_unsharp_mask(image)
# Step 3: Binarization (optional, aggressive)
if config.layout_preprocessing_binarize:
image = apply_adaptive_threshold(image)
return image
```
### Integration Point
```python
# In ocr_service.py, before calling PP-Structure
if settings.layout_preprocessing_enabled:
preprocessed_image = enhance_for_layout_detection(page_image, settings)
pp_input = preprocessed_image
else:
pp_input = page_image
# PP-Structure gets preprocessed (or original if disabled)
layout_results = self.structure_engine(pp_input)
# Image extraction still uses original
for element in layout_results:
if element.type == "image":
crop_image_from_original(page_image, element.bbox) # Use original!
```
## Risks / Trade-offs
| Risk | Mitigation |
|------|------------|
| Performance overhead | Preprocessing is fast (~50ms/page), enable/disable option |
| Over-enhancement artifacts | CLAHE clip limit prevents over-saturation, configurable |
| Memory spike for large images | Process one page at a time, discard preprocessed after use |
## Open Questions
1. Should binarization be applied before or after CLAHE?
- Current: After (enhances contrast first, then binarize if needed)
2. Should preprocessing parameters be tunable per-request or only server-wide?
- Current: Server-wide config only (simpler)

View File

@@ -0,0 +1,74 @@
# Change: Add Image Preprocessing for Layout Detection
## Why
PP-StructureV3's layout detection (PP-DocLayout_plus-L) sometimes fails to detect tables with faint lines, low contrast borders, or poor scan quality. This results in missing table elements in the output, even when the table structure recognition models (SLANeXt) are correctly configured.
The root cause is that layout detection happens **before** table structure recognition - if a region isn't identified as a "table" in the layout detection stage, the table recognition models never get invoked.
## What Changes
- **Add image preprocessing module** for layout detection input
- Contrast enhancement (histogram equalization, CLAHE)
- Optional binarization (adaptive thresholding)
- Sharpening for faint lines
- **Preserve original images for extraction**
- Preprocessing ONLY affects layout detection input
- Image element extraction continues to use original (preserves quality)
- Raw OCR continues to use original image
- **Hybrid control mode** (Auto + Manual)
- **Auto mode (default)**: Analyze image quality and auto-select parameters
- Calculate contrast level (standard deviation)
- Detect edge clarity for faint lines
- Apply appropriate preprocessing based on analysis
- **Manual mode**: User can override with specific settings
- Contrast: none / histogram / clahe
- Sharpen: on/off
- Binarize: on/off
- **Frontend preview API**
- Preview endpoint to show original vs preprocessed comparison
- Users can verify settings before processing
## Impact
### Affected Specs
- `ocr-processing` - New preprocessing configuration requirements
### Affected Code
- `backend/app/services/ocr_service.py` - Add preprocessing before PP-Structure
- `backend/app/core/config.py` - New preprocessing configuration options
- `backend/app/services/preprocessing_service.py` - New service (to be created)
- `backend/app/api/v2/endpoints/preview.py` - New preview API endpoint
- `frontend/src/components/PreprocessingSettings.tsx` - New UI component
### Track Impact Analysis
| Track | Impact | Reason |
|-------|--------|--------|
| OCR | Improved layout detection | Preprocessing enhances PP-Structure input |
| Hybrid | Potentially improved | Uses PP-Structure for layout |
| Direct | No impact | Does not use PP-Structure |
| Raw OCR | No impact | Continues using original image |
### Quality Impact
| Component | Impact | Reason |
|-----------|--------|--------|
| Table detection | Improved | Enhanced contrast reveals faint borders |
| Image extraction | No change | Uses original image for quality |
| Text recognition | No change | Raw OCR uses original image |
| Reading order | Improved | Better element detection → better ordering |
## Risks
1. **Performance overhead**: Preprocessing adds compute time per page
- Mitigation: Make preprocessing optional, cache preprocessed images
2. **Over-processing**: Strong enhancement may introduce artifacts
- Mitigation: Configurable intensity levels, default to moderate enhancement
3. **Memory usage**: Keeping both original and preprocessed images
- Mitigation: Preprocessed image is temporary, discarded after layout detection

View File

@@ -0,0 +1,128 @@
## ADDED Requirements
### Requirement: Layout Detection Image Preprocessing
The system SHALL provide optional image preprocessing to enhance layout detection accuracy for documents with faint lines, low contrast, or poor scan quality.
#### Scenario: Preprocessing improves table detection
- **GIVEN** a document with faint table borders that PP-Structure fails to detect
- **WHEN** layout preprocessing is enabled
- **THEN** the system SHALL preprocess the image before layout detection
- **AND** contrast enhancement SHALL make faint lines more visible
- **AND** PP-Structure SHALL receive the preprocessed image for layout detection
#### Scenario: Image element extraction uses original quality
- **GIVEN** an image element detected by PP-Structure from preprocessed input
- **WHEN** the system extracts the image element
- **THEN** the system SHALL crop from the ORIGINAL image, not the preprocessed version
- **AND** the extracted image SHALL maintain original quality and colors
#### Scenario: CLAHE contrast enhancement
- **WHEN** `layout_preprocessing_contrast` is set to "clahe"
- **THEN** the system SHALL apply Contrast Limited Adaptive Histogram Equalization
- **AND** the enhancement SHALL not over-saturate already bright regions
#### Scenario: Sharpening enhances faint lines
- **WHEN** `layout_preprocessing_sharpen` is enabled
- **THEN** the system SHALL apply unsharp masking to enhance edges
- **AND** faint table borders SHALL become more detectable
#### Scenario: Optional binarization for extreme cases
- **WHEN** `layout_preprocessing_binarize` is enabled
- **THEN** the system SHALL apply adaptive thresholding
- **AND** this SHALL be used only for documents with very poor contrast
### Requirement: Preprocessing Hybrid Control Mode
The system SHALL support three preprocessing modes: automatic, manual, and disabled, with automatic as the default.
#### Scenario: Auto mode analyzes image quality
- **GIVEN** preprocessing mode is set to "auto"
- **WHEN** processing begins for a page
- **THEN** the system SHALL analyze image quality metrics (contrast, edge strength)
- **AND** automatically determine optimal preprocessing parameters
- **AND** apply recommended settings without user intervention
#### Scenario: Auto mode detects low contrast
- **GIVEN** preprocessing mode is "auto"
- **WHEN** image contrast (standard deviation) is below 40
- **THEN** the system SHALL automatically enable CLAHE contrast enhancement
#### Scenario: Auto mode detects faint edges
- **GIVEN** preprocessing mode is "auto"
- **WHEN** image edge strength (Sobel gradient mean) is below 15
- **THEN** the system SHALL automatically enable sharpening
#### Scenario: Manual mode uses user-specified settings
- **GIVEN** preprocessing mode is set to "manual"
- **WHEN** processing begins
- **THEN** the system SHALL use the user-provided preprocessing configuration
- **AND** ignore automatic quality analysis
#### Scenario: Disabled mode skips preprocessing
- **GIVEN** preprocessing mode is set to "disabled"
- **WHEN** processing begins
- **THEN** the system SHALL skip all preprocessing
- **AND** PP-Structure SHALL receive the original image directly
### Requirement: Preprocessing Preview API
The system SHALL provide a preview endpoint that allows users to compare original and preprocessed images before processing.
#### Scenario: Preview returns comparison images
- **GIVEN** a task with uploaded document
- **WHEN** user requests preprocessing preview for a specific page
- **THEN** the system SHALL return URLs or data for both original and preprocessed images
- **AND** user can visually compare the difference
#### Scenario: Preview shows auto-detected settings
- **GIVEN** preview is requested with mode "auto"
- **WHEN** the system analyzes the page
- **THEN** the response SHALL include the auto-detected preprocessing configuration
- **AND** include quality metrics (contrast, edge_strength)
#### Scenario: Preview accepts manual configuration
- **GIVEN** preview is requested with mode "manual"
- **WHEN** user provides specific preprocessing settings
- **THEN** the system SHALL apply those settings to generate preview
- **AND** return the preprocessed result for user verification
### Requirement: Preprocessing Track Isolation
The layout preprocessing feature SHALL only affect layout detection input without impacting other processing components.
#### Scenario: Raw OCR is unaffected
- **GIVEN** layout preprocessing is enabled
- **WHEN** Raw OCR processing runs
- **THEN** Raw OCR SHALL use the original image
- **AND** text detection quality SHALL not be affected by preprocessing
#### Scenario: Preprocessed image is temporary
- **GIVEN** an image is preprocessed for layout detection
- **WHEN** layout detection completes
- **THEN** the preprocessed image SHALL NOT be persisted to storage
- **AND** only the original image and element crops SHALL be saved
### Requirement: Preprocessing Frontend UI
The frontend SHALL provide a user interface for configuring and previewing preprocessing settings.
#### Scenario: Mode selection is available
- **GIVEN** the user is configuring OCR track processing
- **WHEN** the preprocessing settings panel is displayed
- **THEN** the user SHALL be able to select mode: Auto (default), Manual, or Disabled
- **AND** Auto mode SHALL be pre-selected
#### Scenario: Manual mode shows configuration options
- **GIVEN** the user selects Manual mode
- **WHEN** the settings panel updates
- **THEN** the user SHALL see options for:
- Contrast enhancement (None / Histogram / CLAHE)
- Sharpen toggle
- Binarize toggle
#### Scenario: Preview button triggers comparison view
- **GIVEN** preprocessing settings are configured
- **WHEN** the user clicks Preview button
- **THEN** the system SHALL display side-by-side comparison of original and preprocessed images
- **AND** show detected quality metrics

View File

@@ -0,0 +1,141 @@
# Tasks: Add Image Preprocessing for Layout Detection
## 1. Configuration
- [x] 1.1 Add preprocessing configuration to `backend/app/core/config.py`
- `layout_preprocessing_mode: str = "auto"` - Options: auto, manual, disabled
- `layout_preprocessing_contrast: str = "clahe"` - Options: none, histogram, clahe
- `layout_preprocessing_sharpen: bool = True` - Enable sharpening for faint lines
- `layout_preprocessing_binarize: bool = False` - Optional binarization (aggressive)
- [x] 1.2 Add preprocessing schema to `backend/app/schemas/task.py`
- `PreprocessingMode` enum: auto, manual, disabled
- `PreprocessingConfig` schema for API request/response
## 2. Preprocessing Service
- [x] 2.1 Create `backend/app/services/layout_preprocessing_service.py`
- Image loading utility (supports PIL, OpenCV)
- Contrast enhancement methods (histogram equalization, CLAHE)
- Sharpening filter for line enhancement
- Optional adaptive binarization
- Return preprocessed image as numpy array or PIL Image
- [x] 2.2 Implement `preprocess()` and `preprocess_to_pil()` functions
- Input: Original image path or PIL Image + config
- Output: Preprocessed image (same format as input) + PreprocessingResult
- Steps: contrast → sharpen → (optional) binarize
- [x] 2.3 Implement `analyze_image_quality()` function (Auto mode)
- Calculate contrast level (standard deviation of grayscale)
- Detect edge clarity (Sobel gradient mean)
- Return ImageQualityMetrics based on analysis
- `get_auto_config()` returns PreprocessingConfig based on thresholds:
- Low contrast < 40: Apply CLAHE
- Faint edges < 15: Apply sharpen
- Very low contrast < 20: Consider binarize
## 3. Integration with OCR Service
- [x] 3.1 Update `backend/app/services/ocr_service.py`
- Import preprocessing service
- Check preprocessing mode (auto/manual/disabled)
- If auto: call `analyze_image_quality()` first
- Before PP-Structure prediction, preprocess image based on config
- Pass preprocessed PIL Image to PP-Structure for layout detection
- Keep original image reference for image extraction
- [x] 3.2 Update `backend/app/services/pp_structure_enhanced.py`
- Add `preprocessed_image` parameter to `analyze_with_full_structure()`
- When preprocessed_image provided, convert to BGR numpy array and pass to PP-Structure
- Bbox coordinates from preprocessed detection applied to original image crop
- [x] 3.3 Update task start API to accept preprocessing options
- Add `preprocessing_mode` parameter to ProcessingOptions
- Add `preprocessing_config` for manual mode overrides
## 4. Preview API
- [x] 4.1 Create preview endpoints in `backend/app/routers/tasks.py`
- `POST /api/v2/tasks/{task_id}/preview/preprocessing`
- Input: page number, preprocessing mode/config
- Output: PreprocessingPreviewResponse with:
- Original image URL
- Preprocessed image URL
- Auto-detected config
- Image quality metrics (contrast, edge_strength)
- `GET /api/v2/tasks/{task_id}/preview/image` - Serve preview images
- [x] 4.2 Add preview router functionality
- Integrated into tasks router
- Uses task authentication/authorization
## 5. Frontend UI
- [x] 5.1 Create `frontend/src/components/PreprocessingSettings.tsx`
- Radio buttons with icons: Auto / Manual / Disabled
- Manual mode shows:
- Contrast dropdown: None / Histogram / CLAHE
- Sharpen checkbox
- Binarize checkbox (with warning)
- Preview button integration (onPreview prop)
- [ ] 5.2 Create `frontend/src/components/PreprocessingPreview.tsx` (optional)
- Side-by-side image comparison (original vs preprocessed)
- Display detected quality metrics
- Note: Preview functionality available via API, UI modal is optional enhancement
- [x] 5.3 Integrate with task start flow
- Added PreprocessingSettings to ProcessingPage.tsx
- Pass selected config to task start API
- Note: localStorage preference storage is optional enhancement
- [x] 5.4 Add i18n translations
- `frontend/src/i18n/locales/zh-TW.json` - Traditional Chinese
## 6. Testing
- [x] 6.1 Unit tests for preprocessing_service
- Validated imports and service creation
- Tested `analyze_image_quality()` with test images
- Tested `get_auto_config()` returns sensible config
- Tested `preprocess()` produces correct output shape
- [ ] 6.2 Integration tests for preview API (optional)
- Manual testing recommended with actual documents
- [ ] 6.3 End-to-end testing
- Test OCR track with preprocessing modes (auto/manual/disabled)
- Test with known problematic documents (faint table borders)
## 7. Documentation
- [x] 7.1 Update API documentation
- Schemas documented in task.py with Field descriptions
- Preview endpoint accessible via /docs
- [ ] 7.2 Add user guide section (optional)
- When to use auto vs manual
- How to interpret quality metrics
---
## Implementation Summary
**Backend commits:**
1. `feat: implement layout preprocessing backend` - Core service, OCR integration, preview API
**Frontend commits:**
1. `feat: add preprocessing UI components and integration` - PreprocessingSettings, i18n, ProcessingPage integration
**Key files created/modified:**
- `backend/app/services/layout_preprocessing_service.py` (new)
- `backend/app/core/config.py` (updated)
- `backend/app/schemas/task.py` (updated)
- `backend/app/services/ocr_service.py` (updated)
- `backend/app/services/pp_structure_enhanced.py` (updated)
- `backend/app/routers/tasks.py` (updated)
- `frontend/src/components/PreprocessingSettings.tsx` (new)
- `frontend/src/types/apiV2.ts` (updated)
- `frontend/src/pages/ProcessingPage.tsx` (updated)
- `frontend/src/i18n/locales/zh-TW.json` (updated)