Design: Simple Text Positioning
Architecture
Current Flow (Complex)
Raw OCR → PP-Structure Analysis → Table Detection → HTML Parsing →
Column Correction → Cell Positioning → PDF Generation
New Flow (Simple)
Raw OCR → Text Region Extraction → Bbox Processing →
Rotation Calculation → Font Size Estimation → PDF Text Rendering
Core Components
1. TextRegionRenderer
New service class to handle raw OCR text rendering:
from typing import Dict

from reportlab.pdfgen.canvas import Canvas


class TextRegionRenderer:
    """Render raw OCR text regions to PDF."""

    def render_text_region(
        self,
        canvas: Canvas,
        region: Dict,
        scale_factor: float
    ) -> None:
        """
        Render a single OCR text region.

        Args:
            canvas: ReportLab canvas
            region: Raw OCR region with text and bbox
            scale_factor: Coordinate scaling factor
        """
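The method body is not spelled out in this design. As a rough sketch of what it might do, the function below anchors the text at the bbox's bottom-left corner, rotates by the top-edge angle, and draws the string. The `page_height` and `font_name` parameters are assumptions added for illustration and are not part of the signature above. The canvas is duck-typed so the sketch runs without ReportLab; the calls used (`saveState`, `translate`, `rotate`, `setFont`, `drawString`, `restoreState`) are standard ReportLab `Canvas` methods.

```python
import math
from typing import Any, Dict


def render_text_region(canvas: Any, region: Dict, scale_factor: float,
                       page_height: float, font_name: str = "Helvetica") -> None:
    """Draw one region's text at its bbox position (illustrative sketch)."""
    text = region.get("text", "").strip()
    if not text:
        return  # nothing to draw for empty regions

    # Assumed corner order: top-left, top-right, bottom-right, bottom-left.
    (x0, y0), (x1, y1), _, (x3, y3) = region["bbox"]

    # Angle of the top edge; dy is negated because image y points down.
    angle = math.degrees(math.atan2(-(y1 - y0), x1 - x0))

    # Font size from the left edge length, using the 0.75 ratio from this design.
    font_size = math.dist((x0, y0), (x3, y3)) * scale_factor * 0.75

    # Bottom-left corner converted to PDF coordinates (origin bottom-left).
    pdf_x = x3 * scale_factor
    pdf_y = page_height - y3 * scale_factor

    canvas.saveState()
    canvas.translate(pdf_x, pdf_y)
    canvas.rotate(angle)
    canvas.setFont(font_name, font_size)
    canvas.drawString(0, 0, text)
    canvas.restoreState()
```

Wrapping the drawing calls in `saveState()` / `restoreState()` keeps each region's translation and rotation from leaking into the next one.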
2. Bbox Processing
Raw OCR bbox format (quadrilateral - 4 corner points):
{
    "text": "LOCTITE",
    "bbox": [[116, 76], [378, 76], [378, 128], [116, 128]],
    "confidence": 0.98
}
Processing steps:
- Center point: Average of the 4 corners
- Width/Height: Lengths of the top edge (width) and the left/right edges (height)
- Rotation angle: Angle of the top edge from horizontal
- Font size: Approximated from the bbox height
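The first two steps can be sketched as follows, using the sample region above. The helper name `bbox_geometry` is hypothetical, and the sketch assumes the corners are ordered top-left, top-right, bottom-right, bottom-left, as in the sample.

```python
import math
from typing import List, Tuple

Point = List[float]


def bbox_geometry(bbox: List[Point]) -> Tuple[Tuple[float, float], float, float]:
    """Return (center, width, height) of a 4-point quadrilateral.

    Assumes corner order: top-left, top-right, bottom-right, bottom-left.
    """
    cx = sum(p[0] for p in bbox) / 4
    cy = sum(p[1] for p in bbox) / 4
    # Average opposite edges so slightly skewed quads are handled evenly.
    width = (math.dist(bbox[0], bbox[1]) + math.dist(bbox[3], bbox[2])) / 2
    height = (math.dist(bbox[0], bbox[3]) + math.dist(bbox[1], bbox[2])) / 2
    return (cx, cy), width, height


center, width, height = bbox_geometry([[116, 76], [378, 76], [378, 128], [116, 128]])
print(center, width, height)  # (247.0, 102.0) 262.0 52.0
```

Measuring edge lengths, not axis-aligned extents, keeps width and height meaningful when the text is rotated.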
3. Rotation Calculation
import math
from typing import List


def calculate_rotation(bbox: List[List[float]]) -> float:
    """
    Calculate text rotation from bbox quadrilateral.

    Returns angle in degrees (counter-clockwise from horizontal).
    """
    # Top-left to top-right vector
    dx = bbox[1][0] - bbox[0][0]
    dy = bbox[1][1] - bbox[0][1]
    # OCR bboxes use image coordinates (y grows downward), so negate dy
    # to get a counter-clockwise angle in degrees
    return math.degrees(math.atan2(-dy, dx))
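The sign of the result depends on the axis convention. The sample bbox lists its top edge at y=76 and its bottom edge at y=128, so y grows downward, whereas ReportLab's `canvas.rotate()` takes a counter-clockwise angle with y pointing up. The sketch below makes the convention explicit; the function name and the `y_down` flag are hypothetical.

```python
import math
from typing import List


def top_edge_angle(bbox: List[List[float]], y_down: bool = True) -> float:
    """Angle of the top edge in degrees, counter-clockwise positive.

    y_down=True means the bbox is in image coordinates (y grows downward),
    so dy is negated before taking the angle.
    """
    dx = bbox[1][0] - bbox[0][0]
    dy = bbox[1][1] - bbox[0][1]
    if y_down:
        dy = -dy
    return math.degrees(math.atan2(dy, dx))


# A square tilted so its top edge rises toward the right on screen.
tilted = [[0, 10], [10, 0], [20, 10], [10, 20]]
level = [[116, 76], [378, 76], [378, 128], [116, 128]]
```

With image coordinates, `top_edge_angle(tilted)` is 45 degrees; passing `y_down=False` for the same points flips the sign.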
4. Font Size Estimation
import math
from typing import List


def estimate_font_size(bbox: List[List[float]], text: str) -> float:
    """
    Estimate font size from bbox dimensions.

    Uses bbox height as the only indicator; `text` is not used yet.
    """
    # Calculate bbox height (average of left and right edges);
    # math.dist requires Python 3.8+
    left_height = math.dist(bbox[0], bbox[3])
    right_height = math.dist(bbox[1], bbox[2])
    avg_height = (left_height + right_height) / 2
    # Font size is approximately 70-80% of bbox height; 0.75 is the midpoint
    return avg_height * 0.75
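A height-only estimate can produce text that overflows a narrow box. One possible refinement, not part of this design, is to shrink the size until the string fits the box width and then apply a minimum size. In the sketch below, `fit_font_size` and `fake_measure` are hypothetical; in a real renderer the `measure` callable could wrap ReportLab's `pdfmetrics.stringWidth`.

```python
from typing import Callable


def fit_font_size(text: str, box_width: float, box_height: float,
                  measure: Callable[[str, float], float],
                  ratio: float = 0.75, min_size: float = 6.0) -> float:
    """Pick a font size from the box height, shrunk so the text fits the width."""
    size = box_height * ratio
    text_width = measure(text, size)
    if text_width > box_width > 0:
        # Rendered width scales linearly with font size, so one
        # proportional step is enough.
        size *= box_width / text_width
    return max(size, min_size)


# Hypothetical metric for the example: every glyph is 0.6 em wide.
def fake_measure(text: str, size: float) -> float:
    return 0.6 * size * len(text)
```

For the sample region (262 wide, 52 high) the text already fits, so the result is simply 52 * 0.75 = 39.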
Integration Points
PDFGeneratorService
Modify draw_ocr_content() to use simple text positioning:
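def draw_ocr_content(self, canvas, content_data, page_info):
    """Draw OCR content using simple text positioning."""
    # Use raw OCR regions directly
    raw_regions = content_data.get('raw_ocr_regions', [])
    # Assumption: page_info carries the image-to-PDF scale;
    # the 'scale_factor' key name is hypothetical
    scale_factor = page_info.get('scale_factor', 1.0)
    for region in raw_regions:
        self.text_renderer.render_text_region(
            canvas, region, scale_factor
        )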
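How the scale factor is derived is not stated here. A minimal sketch, assuming a uniform scale from image pixels to PDF points and an image origin at the top-left, might look like this (both helper names are hypothetical):

```python
from typing import Tuple


def scale_for_page(image_width_px: float, page_width_pt: float) -> float:
    """Uniform scale that maps the image width onto the page width."""
    return page_width_pt / image_width_px


def to_pdf_point(x: float, y: float, scale_factor: float,
                 page_height_pt: float) -> Tuple[float, float]:
    """Map an image-pixel point (origin top-left) to PDF points (origin bottom-left)."""
    return x * scale_factor, page_height_pt - y * scale_factor


# An A4 page (595 x 842 pt) rendered from a 1190 px wide scan.
scale = scale_for_page(1190, 595)
print(to_pdf_point(116, 128, scale, 842))  # (58.0, 778.0)
```

The y flip is needed because PDF coordinates start at the bottom-left corner of the page.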
Configuration
Add config option to enable/disable simple mode:
class OCRSettings:
    simple_text_positioning: bool = Field(
        default=True,
        description="Use simple text positioning instead of table reconstruction"
    )
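One way the flag could select between the two flows is sketched below. The dataclass is a stand-in for the real settings object, and `draw_simple` / `draw_tables` are placeholder callables, not functions from this codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Renderer = Callable[[List[Dict]], str]


@dataclass
class OCRSettingsStub:
    # Stand-in for the real settings class; only the flag from this design.
    simple_text_positioning: bool = True


def choose_renderer(settings: OCRSettingsStub,
                    draw_simple: Renderer,
                    draw_tables: Renderer) -> Renderer:
    """Return the simple renderer when the flag is on, else the table flow."""
    return draw_simple if settings.simple_text_positioning else draw_tables


def draw_simple(regions: List[Dict]) -> str:
    return "simple"


def draw_tables(regions: List[Dict]) -> str:
    return "tables"
```

Keeping the old flow behind the flag allows a fallback to table reconstruction without a code change.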
File Changes
| File | Change |
|---|---|
| app/services/text_region_renderer.py | New - Text rendering logic |
| app/services/pdf_generator_service.py | Modify - Integration |
| app/core/config.py | Add - Configuration option |
Edge Cases
- Overlapping text: Regions may overlap slightly - render in reading order
- Very small text: Minimum font size threshold (6pt)
- Rotated pages: Handle 90/180/270 degree page rotation
- Empty regions: Skip regions with empty text
- Unicode text: Ensure font supports CJK characters
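Several of these cases can be handled before rendering starts. The sketch below drops empty regions, sorts the rest into an approximate reading order, and swaps page dimensions for 90/270 degree rotations. The function names and the `line_tolerance` bucketing are illustrative assumptions, and the bucketing is deliberately crude: regions near a bucket boundary can still sort out of order.

```python
from typing import Dict, List, Tuple


def prepare_regions(regions: List[Dict], line_tolerance: float = 10.0) -> List[Dict]:
    """Drop empty regions and sort the rest top-to-bottom, then left-to-right."""
    kept = [r for r in regions if r.get("text", "").strip()]

    def reading_key(region: Dict) -> Tuple[int, float]:
        top_left_x, top_left_y = region["bbox"][0]
        # Bucket y so regions on roughly the same line sort by x.
        return (round(top_left_y / line_tolerance), top_left_x)

    return sorted(kept, key=reading_key)


def page_size_for_rotation(width: float, height: float, angle: int) -> Tuple[float, float]:
    """Swap width and height for 90/270 degree page rotations."""
    return (height, width) if angle % 180 == 90 else (width, height)
```

The minimum font size and CJK font support are left to the renderer, since they depend on the fonts registered with the PDF library.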