OCR/openspec/changes/archive/2025-12-11-simple-text-positioning/design.md

Design: Simple Text Positioning

Architecture

Current Flow (Complex)

Raw OCR → PP-Structure Analysis → Table Detection → HTML Parsing →
Column Correction → Cell Positioning → PDF Generation

New Flow (Simple)

Raw OCR → Text Region Extraction → Bbox Processing →
Rotation Calculation → Font Size Estimation → PDF Text Rendering

Core Components

1. TextRegionRenderer

New service class to handle raw OCR text rendering:

from typing import Dict

from reportlab.pdfgen.canvas import Canvas


class TextRegionRenderer:
    """Render raw OCR text regions to PDF."""

    def render_text_region(
        self,
        canvas: Canvas,
        region: Dict,
        scale_factor: float
    ) -> None:
        """
        Render a single OCR text region.

        Args:
            canvas: ReportLab canvas
            region: Raw OCR region with text and bbox
            scale_factor: Coordinate scaling factor
        """

2. Bbox Processing

Raw OCR bbox format (quadrilateral - 4 corner points):

{
  "text": "LOCTITE",
  "bbox": [[116, 76], [378, 76], [378, 128], [116, 128]],
  "confidence": 0.98
}

Processing steps:

  1. Center point: average of the 4 corners
  2. Width/height: length of the top edge (corner 0 to corner 1) and of the left edge (corner 0 to corner 3)
  3. Rotation angle: angle of the top edge relative to horizontal
  4. Font size: approximated from the bbox height
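
The first two steps can be sketched as follows. This is a minimal illustration, not the project's code: `bbox_geometry` is a hypothetical helper, and it assumes the corner order shown in the sample region (top-left, top-right, bottom-right, bottom-left).

```python
import math
from typing import List, Tuple


def bbox_geometry(bbox: List[List[float]]) -> Tuple[Tuple[float, float], float, float]:
    """Return ((cx, cy), width, height) for a 4-corner quadrilateral.

    Assumes corners are ordered top-left, top-right, bottom-right, bottom-left.
    """
    # Center: arithmetic mean of the four corners.
    cx = sum(p[0] for p in bbox) / 4
    cy = sum(p[1] for p in bbox) / 4
    # Width: mean length of the top and bottom edges.
    width = (math.dist(bbox[0], bbox[1]) + math.dist(bbox[3], bbox[2])) / 2
    # Height: mean length of the left and right edges.
    height = (math.dist(bbox[0], bbox[3]) + math.dist(bbox[1], bbox[2])) / 2
    return (cx, cy), width, height


center, width, height = bbox_geometry(
    [[116, 76], [378, 76], [378, 128], [116, 128]]
)
```

Averaging opposite edges keeps the result stable when the quadrilateral is slightly skewed, which is common for scanned pages.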

3. Rotation Calculation

import math
from typing import List


def calculate_rotation(bbox: List[List[float]]) -> float:
    """
    Calculate text rotation from bbox quadrilateral.

    Returns the angle of the top edge in degrees, measured in image
    coordinates (y axis pointing down). A positive value therefore appears
    clockwise on screen; negate it for a y-up canvas such as ReportLab's.
    """
    # Top-left to top-right vector
    dx = bbox[1][0] - bbox[0][0]
    dy = bbox[1][1] - bbox[0][1]

    # Angle in degrees
    return math.degrees(math.atan2(dy, dx))
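
OCR bboxes use image coordinates, where y grows downward, while a ReportLab canvas has y growing upward, so the sign of the angle matters when it is passed to the canvas. The sketch below shows one way to obtain a counter-clockwise angle for a y-up canvas. `rotation_for_pdf` is a hypothetical name, and the corner order (top-left, top-right, bottom-right, bottom-left) is assumed from the sample region.

```python
import math
from typing import List


def rotation_for_pdf(bbox: List[List[float]]) -> float:
    """Counter-clockwise angle (degrees) of the top edge on a y-up canvas.

    Hypothetical helper: assumes bbox corners are in image coordinates
    (y grows downward), ordered top-left, top-right, bottom-right, bottom-left.
    """
    dx = bbox[1][0] - bbox[0][0]
    dy = bbox[1][1] - bbox[0][1]
    # Flip dy: the image y axis points down, the PDF y axis points up.
    return math.degrees(math.atan2(-dy, dx))


# Axis-aligned text: the top edge is horizontal.
horizontal = rotation_for_pdf([[116, 76], [378, 76], [378, 128], [116, 128]])

# Text reading bottom-to-top: its "top" edge runs straight up the image.
vertical = rotation_for_pdf([[50, 200], [50, 100], [80, 100], [80, 200]])
```

Here `horizontal` is 0 and `vertical` is 90 degrees, that is, a quarter turn counter-clockwise.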

4. Font Size Estimation

import math
from typing import List


def estimate_font_size(bbox: List[List[float]], text: str) -> float:
    """
    Estimate font size from bbox dimensions.

    Uses the bbox height only; the `text` argument is currently unused.
    """
    # Calculate bbox height (average of left and right edges)
    left_height = math.dist(bbox[0], bbox[3])
    right_height = math.dist(bbox[1], bbox[2])
    avg_height = (left_height + right_height) / 2

    # Font size is approximately 70-80% of bbox height; 0.75 is the midpoint
    return avg_height * 0.75
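
A possible variant that also applies the coordinate scale factor and the 6pt minimum listed under Edge Cases is sketched below. `scaled_font_size` and the two constants are hypothetical names, and scaling the font size by the same factor as the coordinates is an assumption, not something this design specifies.

```python
import math
from typing import List

MIN_FONT_SIZE_PT = 6.0       # minimum threshold from the edge-case list
HEIGHT_TO_FONT_RATIO = 0.75  # midpoint of the 70-80% rule of thumb


def scaled_font_size(bbox: List[List[float]], scale_factor: float = 1.0) -> float:
    """Estimate a font size in points, clamped to a readable minimum.

    Assumes scale_factor converts OCR pixel units to PDF points.
    """
    left_height = math.dist(bbox[0], bbox[3])
    right_height = math.dist(bbox[1], bbox[2])
    avg_height = (left_height + right_height) / 2
    size = avg_height * HEIGHT_TO_FONT_RATIO * scale_factor
    # Never go below the minimum, even for tiny regions.
    return max(size, MIN_FONT_SIZE_PT)


loctite = [[116, 76], [378, 76], [378, 128], [116, 128]]
tiny = [[0, 0], [10, 0], [10, 4], [0, 4]]

full_size = scaled_font_size(loctite)        # 52 px high box
half_size = scaled_font_size(loctite, 0.5)   # same box at half scale
clamped = scaled_font_size(tiny)             # 4 px high box
```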

Integration Points

PDFGeneratorService

Modify draw_ocr_content() to use simple text positioning:

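def draw_ocr_content(self, canvas, content_data, page_info):
    """Draw OCR content using simple text positioning."""

    # Use raw OCR regions directly
    raw_regions = content_data.get('raw_ocr_regions', [])

    # Assumption: the image-to-PDF scale is carried in page_info; the key
    # name 'scale_factor' is illustrative and may differ in the real code.
    scale_factor = page_info.get('scale_factor', 1.0)

    for region in raw_regions:
        self.text_renderer.render_text_region(
            canvas, region, scale_factor
        )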
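
Putting the pieces together, one possible body for the per-region rendering step is sketched below. It is an illustration under stated assumptions rather than the actual implementation: the extra `page_height` and `font_name` parameters are hypothetical, the text is anchored at the bbox's bottom-left corner as an approximation of the baseline, and a small recording stand-in replaces the canvas so the example runs without ReportLab installed. The canvas methods used (`saveState`, `translate`, `rotate`, `setFont`, `drawString`, `restoreState`) are the standard ReportLab `Canvas` methods.

```python
import math
from typing import Any, Dict


def render_text_region(canvas: Any, region: Dict, scale_factor: float,
                       page_height: float, font_name: str = "Helvetica") -> None:
    """Draw one OCR region, rotated to follow its top edge."""
    text = region.get("text", "").strip()
    bbox = region.get("bbox")
    if not text or not bbox:
        return  # skip empty regions

    # Scale to points and flip y (image y-down -> PDF y-up).
    pts = [(x * scale_factor, page_height - y * scale_factor) for x, y in bbox]
    top_left, top_right, _, bottom_left = pts

    # After the flip, atan2 yields a counter-clockwise angle directly.
    angle = math.degrees(math.atan2(top_right[1] - top_left[1],
                                    top_right[0] - top_left[0]))
    font_size = max(math.dist(top_left, bottom_left) * 0.75, 6.0)

    canvas.saveState()
    canvas.translate(bottom_left[0], bottom_left[1])
    canvas.rotate(angle)
    canvas.setFont(font_name, font_size)
    canvas.drawString(0, 0, text)
    canvas.restoreState()


class RecordingCanvas:
    """Stand-in that records calls instead of drawing."""

    def __init__(self) -> None:
        self.calls = []

    def __getattr__(self, name: str):
        def record(*args):
            self.calls.append((name, args))
        return record


fake = RecordingCanvas()
render_text_region(
    fake,
    {"text": "LOCTITE", "bbox": [[116, 76], [378, 76], [378, 128], [116, 128]]},
    scale_factor=1.0,
    page_height=200.0,
)
```

Wrapping the drawing calls in `saveState`/`restoreState` keeps each region's translation and rotation from leaking into the next one. For CJK text, `font_name` would need to be a registered font that covers those glyphs.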

Configuration

Add config option to enable/disable simple mode:

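from pydantic import BaseModel, Field


# Base class is an assumption; the project may use a settings class instead.
class OCRSettings(BaseModel):
    simple_text_positioning: bool = Field(
        default=True,
        description="Use simple text positioning instead of table reconstruction"
    )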
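
One way the flag could gate the two flows is sketched below. Everything here is hypothetical: `SettingsStub` stands in for the real settings class so the example runs without pydantic, and `draw_page` with its two callables only illustrates the dispatch.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SettingsStub:
    """Stand-in for OCRSettings; only the flag relevant here."""
    simple_text_positioning: bool = True


def draw_page(settings: SettingsStub, content_data: Dict,
              draw_simple: Callable[[Dict], str],
              draw_tables: Callable[[Dict], str]) -> str:
    """Dispatch to the simple flow or the table-reconstruction flow."""
    if settings.simple_text_positioning:
        return draw_simple(content_data)
    return draw_tables(content_data)


simple_result = draw_page(SettingsStub(), {}, lambda d: "simple", lambda d: "tables")
legacy_result = draw_page(SettingsStub(False), {}, lambda d: "simple", lambda d: "tables")
```

Keeping the old flow reachable behind the flag makes it possible to compare output from both paths on the same document.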

File Changes

| File | Change |
|------|--------|
| app/services/text_region_renderer.py | New - Text rendering logic |
| app/services/pdf_generator_service.py | Modify - Integration |
| app/core/config.py | Add - Configuration option |

Edge Cases

  1. Overlapping text: Regions may overlap slightly - render in reading order
  2. Very small text: Clamp to a minimum font size of 6pt
  3. Rotated pages: Handle 90/180/270 degree page rotation
  4. Empty regions: Skip regions with empty text
  5. Unicode text: Ensure font supports CJK characters
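
Edge cases 1 and 4 can be handled in a single pre-pass before rendering. The sketch below is one possible approach, not the project's code: `prepare_regions` and `row_tolerance` are hypothetical names, and grouping regions into rows by bucketing their top coordinate is a simple approximation of reading order that can split a line sitting on a bucket boundary.

```python
from typing import Dict, List


def prepare_regions(regions: List[Dict], row_tolerance: float = 10.0) -> List[Dict]:
    """Drop empty regions and sort the rest top-to-bottom, left-to-right."""
    kept = [r for r in regions if r.get("text", "").strip() and r.get("bbox")]

    def reading_order(region: Dict):
        top = min(p[1] for p in region["bbox"])
        left = min(p[0] for p in region["bbox"])
        # Regions whose tops fall in the same bucket count as one row.
        return (int(top // row_tolerance), left)

    return sorted(kept, key=reading_order)


regions = [
    {"text": "World", "bbox": [[200, 12], [260, 12], [260, 30], [200, 30]]},
    {"text": "Hello", "bbox": [[50, 10], [110, 10], [110, 28], [50, 28]]},
    {"text": "", "bbox": [[0, 0], [5, 0], [5, 5], [0, 5]]},
    {"text": "Next", "bbox": [[20, 60], [70, 60], [70, 78], [20, 78]]},
]
ordered_texts = [r["text"] for r in prepare_regions(regions)]
```

With the sample input, the empty region is dropped and the remaining texts come out as Hello, World, Next.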