Backend changes:
- Add image scaling configuration for PP-Structure processing
- Enhance layout preprocessing service with scaling support
- Update OCR service with improved memory management
- Add PP-Structure enhanced processing improvements

Frontend changes:
- Update preprocessing settings UI
- Fix processing page layout and state management
- Update API types for new parameters

Proposals:
- Archive add-layout-preprocessing proposal (completed)
- Add unify-image-scaling proposal for consistent coordinate handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
ADDED Requirements
Requirement: Layout Detection Image Preprocessing
The system SHALL provide optional image preprocessing to enhance layout detection accuracy for documents with faint lines, low contrast, or poor scan quality.
Scenario: Preprocessing improves table detection
- GIVEN a document with faint table borders that PP-Structure fails to detect
- WHEN layout preprocessing is enabled
- THEN the system SHALL preprocess the image before layout detection
- AND contrast enhancement SHALL make faint lines more visible
- AND PP-Structure SHALL receive the preprocessed image for layout detection
Scenario: Image element extraction uses original quality
- GIVEN an image element detected by PP-Structure from preprocessed input
- WHEN the system extracts the image element
- THEN the system SHALL crop from the ORIGINAL image, not the preprocessed version
- AND the extracted image SHALL maintain original quality and colors
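The "crop from the original" rule can be illustrated with a minimal sketch. It assumes the preprocessed image has the same dimensions as the original, so a bounding box detected on one applies unchanged to the other. The names `crop` and `extract_element`, the `(x0, y0, x1, y1)` box layout, and the nested-list image type are hypothetical stand-ins, not the project's actual code.

```python
from typing import List, Tuple

Image = List[List[int]]           # grayscale rows; stand-in for a real image array
BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), right/bottom exclusive (assumed)


def crop(image: Image, bbox: BBox) -> Image:
    """Return the pixels inside bbox, clamped to the image bounds."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = bbox
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


def extract_element(original: Image, preprocessed: Image, bbox: BBox) -> Image:
    """Crop a detected element from the original, never the preprocessed copy.

    The preprocessed image is only consulted to check that both images share
    dimensions; if they do not, the bbox would first need to be mapped back
    to original coordinates.
    """
    same_size = (len(original), len(original[0])) == (
        len(preprocessed),
        len(preprocessed[0]),
    )
    if not same_size:
        raise ValueError("bbox must be mapped back to original coordinates first")
    return crop(original, bbox)
```

The size check is the part that matters once images are rescaled before layout detection: in that case the box has to be converted back to original coordinates before cropping.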
Scenario: CLAHE contrast enhancement
- WHEN layout_preprocessing_contrast is set to "clahe"
- THEN the system SHALL apply Contrast Limited Adaptive Histogram Equalization
- AND the enhancement SHALL NOT over-saturate already bright regions
Scenario: Sharpening enhances faint lines
- WHEN layout_preprocessing_sharpen is enabled
- THEN the system SHALL apply unsharp masking to enhance edges
- AND faint table borders SHALL become more detectable
Scenario: Optional binarization for extreme cases
- WHEN layout_preprocessing_binarize is enabled
- THEN the system SHALL apply adaptive thresholding
- AND this SHALL be used only for documents with very poor contrast
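To make the sharpening and binarization scenarios concrete, here is a dependency-free sketch of unsharp masking and adaptive (local-mean) thresholding on a grayscale grid. It is an illustration under assumptions, not the service's implementation: a box blur stands in for the Gaussian blur usually used, the function names and default parameters are hypothetical, and CLAHE is omitted because a faithful version is too long for a short example. A real backend would more likely call an image library.

```python
from typing import List

Gray = List[List[int]]  # grayscale image as rows of 0-255 values


def local_mean(image: Gray, y: int, x: int, radius: int) -> float:
    """Mean of the window around (y, x), clipped at the image borders."""
    h, w = len(image), len(image[0])
    window = [
        image[j][i]
        for j in range(max(0, y - radius), min(h, y + radius + 1))
        for i in range(max(0, x - radius), min(w, x + radius + 1))
    ]
    return sum(window) / len(window)


def unsharp_mask(image: Gray, amount: float = 1.0, radius: int = 1) -> Gray:
    """sharpened = original + amount * (original - blurred), clamped to 0-255."""
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, value in enumerate(row):
            blurred = local_mean(image, y, x, radius)  # box blur (assumed stand-in)
            sharpened = value + amount * (value - blurred)
            out_row.append(max(0, min(255, round(sharpened))))
        out.append(out_row)
    return out


def adaptive_threshold(image: Gray, radius: int = 1, offset: float = 5.0) -> Gray:
    """White (255) where a pixel exceeds its local mean minus offset, else black (0)."""
    return [
        [
            255 if value > local_mean(image, y, x, radius) - offset else 0
            for x, value in enumerate(row)
        ]
        for y, row in enumerate(image)
    ]
```

On a page with a faint horizontal rule, such as `[[200, 200, 200], [180, 180, 180], [200, 200, 200]]`, the unsharp mask pushes the rule darker and its surroundings lighter, and the threshold turns the rule fully black against a white background.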
Requirement: Preprocessing Hybrid Control Mode
The system SHALL support three preprocessing modes: automatic, manual, and disabled, with automatic as the default.
Scenario: Auto mode analyzes image quality
- GIVEN preprocessing mode is set to "auto"
- WHEN processing begins for a page
- THEN the system SHALL analyze image quality metrics (contrast, edge strength)
- AND automatically determine optimal preprocessing parameters
- AND apply recommended settings without user intervention
Scenario: Auto mode detects low contrast
- GIVEN preprocessing mode is "auto"
- WHEN image contrast (standard deviation) is below 40
- THEN the system SHALL automatically enable CLAHE contrast enhancement
Scenario: Auto mode detects faint edges
- GIVEN preprocessing mode is "auto"
- WHEN image edge strength (Sobel gradient mean) is below 15
- THEN the system SHALL automatically enable sharpening
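The auto-mode scenarios above can be sketched as two metrics plus a threshold check. The thresholds 40 and 15 come from the scenarios; everything else is an assumption for illustration: the function names, the returned dictionary shape, treating "contrast" as the population standard deviation of 0-255 grayscale values, and using an unnormalized 3x3 Sobel kernel. The spec's threshold of 15 presumably depends on how the real implementation scales the gradient, so the numbers here are not calibrated.

```python
from statistics import pstdev
from typing import Dict, List

Gray = List[List[int]]  # grayscale image as rows of 0-255 values

CONTRAST_THRESHOLD = 40.0  # from the spec: below this, enable CLAHE
EDGE_THRESHOLD = 15.0      # from the spec: below this, enable sharpening


def contrast(image: Gray) -> float:
    """Population standard deviation of all pixel intensities."""
    return pstdev([p for row in image for p in row])


def edge_strength(image: Gray) -> float:
    """Mean Sobel gradient magnitude over interior pixels (unnormalized kernel)."""
    h, w = len(image), len(image[0])
    if h < 3 or w < 3:
        return 0.0
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            total += (gx * gx + gy * gy) ** 0.5
    return total / ((h - 2) * (w - 2))


def auto_config(image: Gray) -> Dict[str, object]:
    """Pick preprocessing settings from the two quality metrics."""
    c, e = contrast(image), edge_strength(image)
    return {
        "contrast": "clahe" if c < CONTRAST_THRESHOLD else "none",
        "sharpen": e < EDGE_THRESHOLD,
        "metrics": {"contrast": c, "edge_strength": e},
    }
```

A uniformly gray page scores zero on both metrics, so this sketch would enable both CLAHE and sharpening for it, while a page with a hard black-to-white boundary would get neither.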
Scenario: Manual mode uses user-specified settings
- GIVEN preprocessing mode is set to "manual"
- WHEN processing begins
- THEN the system SHALL use the user-provided preprocessing configuration
- AND ignore automatic quality analysis
Scenario: Disabled mode skips preprocessing
- GIVEN preprocessing mode is set to "disabled"
- WHEN processing begins
- THEN the system SHALL skip all preprocessing
- AND PP-Structure SHALL receive the original image directly
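One way to read the three modes is as a single resolution step that yields either a configuration or "no preprocessing". The sketch below is a hypothetical shape for that step. `PreprocessingMode`, `PreprocessingConfig`, and `resolve_config` are illustrative names, and the `analyze` callback stands in for whatever quality analysis the service actually performs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class PreprocessingMode(str, Enum):
    AUTO = "auto"          # default per the requirement
    MANUAL = "manual"
    DISABLED = "disabled"


@dataclass(frozen=True)
class PreprocessingConfig:
    contrast: str = "none"  # "none", "histogram", or "clahe"
    sharpen: bool = False
    binarize: bool = False


def resolve_config(
    mode: str,
    manual_config: Optional[PreprocessingConfig],
    analyze: Callable[[], PreprocessingConfig],
) -> Optional[PreprocessingConfig]:
    """Return the config to apply, or None when preprocessing is skipped."""
    resolved = PreprocessingMode(mode)  # raises ValueError on unknown modes
    if resolved is PreprocessingMode.DISABLED:
        return None  # layout detection receives the original image
    if resolved is PreprocessingMode.MANUAL:
        if manual_config is None:
            raise ValueError("manual mode requires a user-provided configuration")
        return manual_config  # quality analysis is never invoked
    return analyze()  # auto: derive settings from image quality metrics
```

Passing the analysis as a callback keeps manual and disabled modes from paying for, or being influenced by, the quality analysis, which matches the "ignore automatic quality analysis" clause.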
Requirement: Preprocessing Preview API
The system SHALL provide a preview endpoint that allows users to compare original and preprocessed images before processing.
Scenario: Preview returns comparison images
- GIVEN a task with an uploaded document
- WHEN the user requests a preprocessing preview for a specific page
- THEN the system SHALL return URLs or data for both original and preprocessed images
- AND the user SHALL be able to visually compare the difference
Scenario: Preview shows auto-detected settings
- GIVEN preview is requested with mode "auto"
- WHEN the system analyzes the page
- THEN the response SHALL include the auto-detected preprocessing configuration
- AND include quality metrics (contrast, edge_strength)
Scenario: Preview accepts manual configuration
- GIVEN preview is requested with mode "manual"
- WHEN the user provides specific preprocessing settings
- THEN the system SHALL apply those settings to generate the preview
- AND return the preprocessed result for user verification
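As a rough picture of what a preview response could carry, the sketch below serializes the pieces the scenarios name: both images, the applied configuration, and the quality metrics. The field names, the URL layout, and the `build_preview_response` helper are all hypothetical; the spec only requires "URLs or data" for the two images and, in auto mode, the detected configuration with `contrast` and `edge_strength`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class PreviewResponse:
    original_url: str                 # hypothetical field name
    preprocessed_url: str             # hypothetical field name
    mode: str                         # "auto" or "manual"
    config: Dict[str, object]         # settings actually applied to the preview
    quality_metrics: Dict[str, float] # contrast and edge_strength


def build_preview_response(
    task_id: str,
    page: int,
    mode: str,
    config: Dict[str, object],
    metrics: Dict[str, float],
) -> str:
    """Serialize a preview response as JSON (URL scheme is illustrative only)."""
    base = f"/tasks/{task_id}/pages/{page}/preview"
    response = PreviewResponse(
        original_url=f"{base}/original",
        preprocessed_url=f"{base}/preprocessed",
        mode=mode,
        config=config,
        quality_metrics=metrics,
    )
    return json.dumps(asdict(response))
```

Returning the applied configuration alongside the metrics lets the client show why auto mode chose its settings, and lets a user copy them into manual mode as a starting point.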
Requirement: Preprocessing Track Isolation
The layout preprocessing feature SHALL only affect layout detection input without impacting other processing components.
Scenario: Raw OCR is unaffected
- GIVEN layout preprocessing is enabled
- WHEN Raw OCR processing runs
- THEN Raw OCR SHALL use the original image
- AND text detection quality SHALL NOT be affected by preprocessing
Scenario: Preprocessed image is temporary
- GIVEN an image is preprocessed for layout detection
- WHEN layout detection completes
- THEN the preprocessed image SHALL NOT be persisted to storage
- AND only the original image and element crops SHALL be saved
Requirement: Preprocessing Frontend UI
The frontend SHALL provide a user interface for configuring and previewing preprocessing settings.
Scenario: Mode selection is available
- GIVEN the user is configuring OCR track processing
- WHEN the preprocessing settings panel is displayed
- THEN the user SHALL be able to select mode: Auto (default), Manual, or Disabled
- AND Auto mode SHALL be pre-selected
Scenario: Manual mode shows configuration options
- GIVEN the user selects Manual mode
- WHEN the settings panel updates
- THEN the user SHALL see options for:
  - Contrast enhancement (None / Histogram / CLAHE)
  - Sharpen toggle
  - Binarize toggle
Scenario: Preview button triggers comparison view
- GIVEN preprocessing settings are configured
- WHEN the user clicks the Preview button
- THEN the system SHALL display a side-by-side comparison of original and preprocessed images
- AND show detected quality metrics