feat: enable document orientation detection for scanned PDFs

- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
egg
2025-12-11 17:13:46 +08:00
parent 57070af307
commit cfe65158a3
58 changed files with 1271 additions and 3048 deletions


@@ -0,0 +1,141 @@
# Design: Simple Text Positioning
## Architecture
### Current Flow (Complex)
```
Raw OCR → PP-Structure Analysis → Table Detection → HTML Parsing →
Column Correction → Cell Positioning → PDF Generation
```
### New Flow (Simple)
```
Raw OCR → Text Region Extraction → Bbox Processing →
Rotation Calculation → Font Size Estimation → PDF Text Rendering
```
## Core Components
### 1. TextRegionRenderer
New service class to handle raw OCR text rendering:
```python
from typing import Dict

from reportlab.pdfgen.canvas import Canvas


class TextRegionRenderer:
    """Render raw OCR text regions to PDF."""

    def render_text_region(
        self,
        canvas: Canvas,
        region: Dict,
        scale_factor: float
    ) -> None:
        """
        Render a single OCR text region.

        Args:
            canvas: ReportLab canvas
            region: Raw OCR region with text and bbox
            scale_factor: Coordinate scaling factor (image pixels to PDF points)
        """
```
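A minimal sketch of how `render_text_region` could work, assuming ReportLab's canvas transform API (`saveState`, `translate`, `rotate`, `setFont`, `drawString`, `restoreState`). The extra `page_height` parameter, the `Helvetica` default font and the `MIN_FONT_SIZE_PT` constant are assumptions for illustration, not part of the design's signature. PDF coordinates have their origin at the bottom-left, so the page height is needed to flip image y values.

```python
import math
from typing import Any, Dict, List

MIN_FONT_SIZE_PT = 6.0  # threshold taken from the design's edge-case list


class TextRegionRenderer:
    """Sketch: draw one raw OCR region onto a ReportLab-style canvas.

    `canvas` only needs saveState, restoreState, translate, rotate,
    setFont and drawString, all of which reportlab.pdfgen.canvas.Canvas provides.
    """

    def __init__(self, font_name: str = "Helvetica") -> None:
        self.font_name = font_name

    def render_text_region(
        self,
        canvas: Any,
        region: Dict,
        scale_factor: float,
        page_height: float,  # hypothetical extra parameter: PDF page height in points
    ) -> None:
        text = region.get("text") or ""
        bbox: List[List[float]] = region.get("bbox") or []
        if not text.strip() or len(bbox) != 4:
            return  # skip empty or malformed regions

        # Baseline origin: bottom-left corner, flipped from image (y down) to PDF (y up)
        bl_x, bl_y = bbox[3]
        origin_x = bl_x * scale_factor
        origin_y = page_height - bl_y * scale_factor

        # Counter-clockwise angle of the top edge; negate dy because image y points down
        dx = bbox[1][0] - bbox[0][0]
        dy = bbox[1][1] - bbox[0][1]
        angle = math.degrees(math.atan2(-dy, dx))

        # ~75% of the average left/right edge height, converted to points and clamped
        height = (math.dist(bbox[0], bbox[3]) + math.dist(bbox[1], bbox[2])) / 2
        font_size = max(height * 0.75 * scale_factor, MIN_FONT_SIZE_PT)

        canvas.saveState()
        canvas.translate(origin_x, origin_y)
        canvas.rotate(angle)
        canvas.setFont(self.font_name, font_size)
        canvas.drawString(0, 0, text)
        canvas.restoreState()
```

Drawing at (0, 0) after `translate` and `rotate` anchors the rotation at the region's bottom-left corner, so tilted lines stay where the OCR found them. Helvetica has no CJK glyphs; a real deployment would register a CJK-capable TTF font instead.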
### 2. Bbox Processing
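Raw OCR bbox format (a quadrilateral of 4 corner points, ordered top-left, top-right, bottom-right, bottom-left, in image pixel coordinates with y increasing downward):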
```json
{
  "text": "LOCTITE",
  "bbox": [[116, 76], [378, 76], [378, 128], [116, 128]],
  "confidence": 0.98
}
```
Processing steps:
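1. **Center point**: Average of the 4 corners
2. **Width/Height**: Mean length of the top and bottom edges (width) and of the left and right edges (height)
3. **Rotation angle**: Angle of the top edge (top-left to top-right) from horizontal
4. **Font size**: Approximated from the bbox height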
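Steps 1 and 2 can be sketched as below, assuming the corner order shown in the JSON example (top-left, top-right, bottom-right, bottom-left). The names `bbox_center` and `bbox_size` are hypothetical; averaging opposite edges is one reasonable reading of "distance between corners" that tolerates slightly skewed quadrilaterals and is unaffected by rotation.

```python
import math
from typing import List, Tuple

Point = List[float]


def bbox_center(bbox: List[Point]) -> Tuple[float, float]:
    """Step 1: center point as the average of the four corners."""
    return (
        sum(p[0] for p in bbox) / 4,
        sum(p[1] for p in bbox) / 4,
    )


def bbox_size(bbox: List[Point]) -> Tuple[float, float]:
    """Step 2: width and height as the mean lengths of opposite edges.

    Corner order is assumed to be top-left, top-right, bottom-right, bottom-left.
    """
    tl, tr, br, bl = bbox
    width = (math.dist(tl, tr) + math.dist(bl, br)) / 2
    height = (math.dist(tl, bl) + math.dist(tr, br)) / 2
    return width, height
```

For the LOCTITE example this gives a center of (247.0, 102.0) and a size of 262 × 52 pixels.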
### 3. Rotation Calculation
```python
import math
from typing import List


def calculate_rotation(bbox: List[List[float]]) -> float:
    """
    Calculate text rotation from bbox quadrilateral.

    Returns angle in degrees (counter-clockwise from horizontal).
    """
    # Top-left to top-right vector
    dx = bbox[1][0] - bbox[0][0]
    dy = bbox[1][1] - bbox[0][1]
    # Image y-axis points down, so negate dy to get a counter-clockwise angle
    # (the convention used by ReportLab's canvas.rotate)
    angle = math.degrees(math.atan2(-dy, dx))
    return angle
```
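Image coordinates put the origin at the top-left with y increasing downward, while PDF coordinates put it at the bottom-left with y increasing upward, so the sign of the top-edge angle depends on which space it is measured in. This sketch (hypothetical helpers `to_pdf_points` and `top_edge_angle`, with an assumed page height and scale factor) maps the corners into PDF space first, where a plain `atan2` yields the counter-clockwise angle that ReportLab's `rotate` expects.

```python
import math
from typing import List, Sequence, Tuple


def to_pdf_points(
    bbox: List[List[float]], scale_factor: float, page_height: float
) -> List[Tuple[float, float]]:
    """Map image-space corners (y down) to PDF-space points (y up)."""
    return [(x * scale_factor, page_height - y * scale_factor) for x, y in bbox]


def top_edge_angle(points: Sequence[Tuple[float, float]]) -> float:
    """Counter-clockwise angle in degrees of the top-left -> top-right edge, y-up space."""
    (x0, y0), (x1, y1) = points[0], points[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


# Text rising to the right in the image: the top-right corner has a smaller image y
tilted = [[0, 100], [100, 0], [150, 50], [50, 150]]
angle = top_edge_angle(to_pdf_points(tilted, 1.0, 200.0))  # about +45 degrees
```

For the tilted example, the raw image-space vector (100, -100) would give -45° with `atan2(dy, dx)`, while the PDF-space measurement gives +45°, matching text that visually rises to the right.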
### 4. Font Size Estimation
```python
import math
from typing import List


def estimate_font_size(bbox: List[List[float]], text: str) -> float:
    """
    Estimate font size from bbox dimensions.

    Uses the bbox height as the only indicator; `text` is not used by this
    heuristic. The result is in bbox units (pixels), so scale it to points
    before rendering.
    """
    # Calculate bbox height (average of left and right edges)
    left_height = math.dist(bbox[0], bbox[3])
    right_height = math.dist(bbox[1], bbox[2])
    avg_height = (left_height + right_height) / 2
    # Font size is roughly 70-80% of bbox height; use the midpoint
    return avg_height * 0.75
```
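One way to take the text itself into account, sketched under stated assumptions: cap the height-based size so the estimated string width fits the bbox width, then apply the 6 pt minimum from the edge-case list. The per-character widths (about 0.55 em for most characters, 1.0 em for wide/fullwidth characters such as CJK) are rough assumptions; with ReportLab, measuring the real font via `pdfmetrics.stringWidth` would be more accurate. `fit_font_size` and `estimate_text_ems` are hypothetical names.

```python
import math
import unicodedata
from typing import List

MIN_FONT_SIZE_PT = 6.0  # minimum from the design's edge-case list


def estimate_text_ems(text: str) -> float:
    """Approximate advance width of `text` in ems.

    Assumption: wide/fullwidth characters (e.g. CJK) take ~1.0 em,
    everything else ~0.55 em. Real fonts vary.
    """
    return sum(
        1.0 if unicodedata.east_asian_width(ch) in ("W", "F") else 0.55
        for ch in text
    )


def fit_font_size(bbox: List[List[float]], text: str, scale_factor: float = 1.0) -> float:
    """Height-based size in points, capped so `text` fits the bbox width, clamped to 6 pt."""
    avg_height = (math.dist(bbox[0], bbox[3]) + math.dist(bbox[1], bbox[2])) / 2
    avg_width = (math.dist(bbox[0], bbox[1]) + math.dist(bbox[3], bbox[2])) / 2
    size_from_height = avg_height * 0.75 * scale_factor
    ems = estimate_text_ems(text)
    # Largest size at which the estimated string width still fits the bbox width
    size_from_width = avg_width * scale_factor / ems if ems else size_from_height
    return max(min(size_from_height, size_from_width), MIN_FONT_SIZE_PT)
```

For a short word in a wide box the height-based size wins; for a long string squeezed into the same box the width cap takes over.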
## Integration Points
### PDFGeneratorService
Modify `draw_ocr_content()` to use simple text positioning:
```python
def draw_ocr_content(self, canvas, content_data, page_info):
    """Draw OCR content using simple text positioning."""
    # Assumption: page_info carries the image-to-PDF scale factor under this key
    scale_factor = page_info.get('scale_factor', 1.0)
    # Use raw OCR regions directly
    raw_regions = content_data.get('raw_ocr_regions', [])
    for region in raw_regions:
        self.text_renderer.render_text_region(
            canvas, region, scale_factor
        )
```
### Configuration
Add config option to enable/disable simple mode:
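```python
from pydantic import BaseModel, Field


# Base class assumed for illustration; use the project's existing settings base
class OCRSettings(BaseModel):
    simple_text_positioning: bool = Field(
        default=True,
        description="Use simple text positioning instead of table reconstruction"
    )
```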
## File Changes
| File | Change |
|------|--------|
| `app/services/text_region_renderer.py` | New - Text rendering logic |
| `app/services/pdf_generator_service.py` | Modify - Integration |
| `app/core/config.py` | Add - Configuration option |
## Edge Cases
1. **Overlapping text**: Regions may overlap slightly; render them in reading order (top-to-bottom, left-to-right)
2. **Very small text**: Clamp font size to a minimum of 6 pt
3. **Rotated pages**: Handle 90°/180°/270° page rotation (swap page width and height for 90° and 270°)
4. **Empty regions**: Skip regions with empty text
5. **Unicode text**: Ensure the font supports CJK characters
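Edge cases 1, 3 and 4 can be sketched together. This assumes regions shaped like the JSON example above, a hypothetical `line_tolerance` (in image pixels) for grouping regions into lines, and integer or float rotation angles such as the `doc_preprocessor_res.angle` value mentioned in the commit message. `reading_order` and `rotated_page_size` are illustrative names, and line grouping by top edge is one simple heuristic among several.

```python
from typing import Dict, List, Tuple


def _top(region: Dict) -> float:
    return min(p[1] for p in region["bbox"])


def _left(region: Dict) -> float:
    return min(p[0] for p in region["bbox"])


def reading_order(regions: List[Dict], line_tolerance: float = 10.0) -> List[Dict]:
    """Drop empty-text regions, then order top-to-bottom and left-to-right."""
    kept = [r for r in regions if (r.get("text") or "").strip()]
    kept.sort(key=_top)
    lines: List[List[Dict]] = []
    for region in kept:
        # Same line if this region's top is within tolerance of the line's first region
        if lines and _top(region) - _top(lines[-1][0]) <= line_tolerance:
            lines[-1].append(region)
        else:
            lines.append([region])
    return [r for line in lines for r in sorted(line, key=_left)]


def rotated_page_size(width: float, height: float, angle: float) -> Tuple[float, float]:
    """Swap page width and height for 90° and 270° rotations."""
    return (height, width) if angle % 180 == 90 else (width, height)
```

Python's `%` returns a non-negative result for a positive modulus, so `-90` is treated the same as `270`.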