proposal: add hybrid control mode with auto-detection and preview
Updates add-layout-preprocessing proposal:
- Auto mode: analyze image quality, auto-select parameters
- Manual mode: user override with specific settings
- Preview API: compare original vs preprocessed before processing
- Frontend UI: mode selection, manual controls, preview button

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -27,13 +27,15 @@ Original Image ← ← ← ← Image extraction crops from original (NOT preproc

### Goals

- Improve table detection for documents with faint lines
- Preserve original image quality for element extraction
- Make preprocessing configurable (enable/disable, intensity)
- **Hybrid control**: Auto mode by default, manual override available
- **Preview capability**: Users can verify the preprocessing result before processing starts
- Keep performance impact minimal

### Non-Goals

- Preprocessing for text recognition (Raw OCR handles this separately)
- Modifying how PP-Structure internally processes images
- General image quality improvement (out of scope)
- Real-time preview during processing (the preview is available only before processing starts)

## Decisions

@@ -65,6 +67,72 @@ Original Image ← ← ← ← Image extraction crops from original (NOT preproc
- Helps make faint table borders more detectable
- Configurable strength

### Decision 5: Hybrid Control Mode (Auto + Manual)

**Rationale**:
- Auto mode provides seamless experience for most users
- Manual mode gives power users fine control
- Preview allows verification before committing to processing

**Auto-detection algorithm**:
```python
import cv2
import numpy as np


def analyze_image_quality(image: np.ndarray) -> dict:
    """Measure contrast and edge strength of a BGR image and recommend settings."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Contrast: standard deviation of pixel values.
    # Cast to a Python float so the result is JSON-serializable.
    contrast = float(np.std(gray))

    # Edge strength: mean of Sobel gradient magnitude
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edge_strength = float(np.mean(np.sqrt(sobel_x**2 + sobel_y**2)))

    return {
        "contrast": contrast,
        "edge_strength": edge_strength,
        "recommended": {
            "contrast": "clahe" if contrast < 40 else "none",
            "sharpen": edge_strength < 15,
            "binarize": contrast < 20,
        },
    }
```

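The hybrid control described in Decision 5 could be resolved along the following lines. This is only a sketch under stated assumptions: `PreprocessConfig`, `recommend_config`, and `resolve_config` are hypothetical names that do not appear in the proposal, and the thresholds are copied from `analyze_image_quality` above. It works on precomputed metrics so that it needs nothing beyond the standard library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreprocessConfig:
    """Hypothetical container for the three settings used in this proposal."""
    contrast: str = "none"  # "clahe" or "none"
    sharpen: bool = False
    binarize: bool = False


def recommend_config(contrast: float, edge_strength: float) -> PreprocessConfig:
    """Apply the same thresholds as analyze_image_quality to precomputed metrics."""
    return PreprocessConfig(
        contrast="clahe" if contrast < 40 else "none",
        sharpen=edge_strength < 15,
        binarize=contrast < 20,
    )


def resolve_config(
    mode: str,
    metrics: dict,
    manual: Optional[PreprocessConfig] = None,
) -> PreprocessConfig:
    """Pick the effective config: recommended values in auto mode, user values in manual mode."""
    if mode == "auto":
        return recommend_config(metrics["contrast"], metrics["edge_strength"])
    if mode == "manual":
        if manual is None:
            raise ValueError("manual mode requires an explicit config")
        return manual
    raise ValueError(f"unknown mode: {mode!r}")
```

With the sample metrics from the Decision 6 response (`contrast` 35.2, `edge_strength` 12.8), `resolve_config("auto", ...)` selects CLAHE and sharpening without binarization, which matches the `auto_config` shown there.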
### Decision 6: Preview API Design

**Rationale**:
- Users should see preprocessing effect before full processing
- Reduces trial-and-error cycles
- Builds user confidence in the system

**API Design**:
```
POST /api/v2/tasks/{task_id}/preview/preprocessing

Request:
{
  "page": 1,
  "mode": "auto",   // or "manual"
  "config": {       // only for manual mode
    "contrast": "clahe",
    "sharpen": true,
    "binarize": false
  }
}

Response:
{
  "original_url": "/api/v2/tasks/{task_id}/pages/1/image",
  "preprocessed_url": "/api/v2/tasks/{task_id}/pages/1/image?preprocessed=true",
  "quality_metrics": {
    "contrast": 35.2,
    "edge_strength": 12.8
  },
  "auto_config": {
    "contrast": "clahe",
    "sharpen": true,
    "binarize": false
  }
}
```

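To make the request and response shapes above concrete, here is a framework-free sketch of how a handler might validate the request body and assemble the response. The helper names `validate_preview_request` and `build_preview_response` are hypothetical, and several behaviors are assumptions rather than part of the proposal: pages are treated as 1-indexed, `mode` defaults to `"auto"` when omitted, and any `config` sent in auto mode is ignored.

```python
from typing import Any, Dict, Optional

VALID_MODES = {"auto", "manual"}
CONFIG_KEYS = {"contrast", "sharpen", "binarize"}


def validate_preview_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Check a preview request body and return a normalized copy."""
    page = payload.get("page")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(page, int) or isinstance(page, bool) or page < 1:
        raise ValueError("page must be a positive integer")

    mode = payload.get("mode", "auto")  # assumption: auto is the default
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")

    config: Optional[Dict[str, Any]] = payload.get("config")
    if mode == "manual":
        if not isinstance(config, dict) or set(config) != CONFIG_KEYS:
            raise ValueError(f"manual mode requires config with keys {sorted(CONFIG_KEYS)}")
    else:
        config = None  # assumption: config is ignored in auto mode

    return {"page": page, "mode": mode, "config": config}


def build_preview_response(
    task_id: str,
    page: int,
    quality_metrics: Dict[str, float],
    auto_config: Dict[str, Any],
) -> Dict[str, Any]:
    """Assemble the response body using the URL pattern from the API design."""
    base = f"/api/v2/tasks/{task_id}/pages/{page}/image"
    return {
        "original_url": base,
        "preprocessed_url": f"{base}?preprocessed=true",
        "quality_metrics": quality_metrics,
        "auto_config": auto_config,
    }
```

Returning `auto_config` even for manual requests, as the response example does, lets the frontend show the recommended values next to the user's overrides.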
## Implementation Details

### Preprocessing Pipeline
