# Design: Simple Text Positioning

## Architecture

### Current Flow (Complex)

```
Raw OCR → PP-Structure Analysis → Table Detection → HTML Parsing →
Column Correction → Cell Positioning → PDF Generation
```

### New Flow (Simple)

```
Raw OCR → Text Region Extraction → Bbox Processing →
Rotation Calculation → Font Size Estimation → PDF Text Rendering
```

## Core Components

### 1. TextRegionRenderer

New service class that renders raw OCR text regions:

```python
from typing import Dict

from reportlab.pdfgen.canvas import Canvas


class TextRegionRenderer:
    """Render raw OCR text regions to PDF."""

    def render_text_region(
        self,
        canvas: Canvas,
        region: Dict,
        scale_factor: float
    ) -> None:
        """
        Render a single OCR text region.

        Args:
            canvas: ReportLab canvas
            region: Raw OCR region with text and bbox
            scale_factor: Coordinate scaling factor
        """
```

### 2. Bbox Processing

Raw OCR bbox format (quadrilateral with 4 corner points, ordered top-left, top-right, bottom-right, bottom-left in image coordinates):

```json
{
  "text": "LOCTITE",
  "bbox": [[116, 76], [378, 76], [378, 128], [116, 128]],
  "confidence": 0.98
}
```

Processing steps:

1. **Center point**: Average of the 4 corners
2. **Width/Height**: Distance between adjacent corners (top edge for width, left and right edges for height)
3. **Rotation angle**: Angle of the top edge from horizontal
4. **Font size**: Approximated from the bbox height

### 3. Rotation Calculation

```python
import math
from typing import List


def calculate_rotation(bbox: List[List[float]]) -> float:
    """
    Calculate text rotation from bbox quadrilateral.

    Returns angle in degrees (counter-clockwise from horizontal).
    """
    # Top-left to top-right vector
    dx = bbox[1][0] - bbox[0][0]
    dy = bbox[1][1] - bbox[0][1]

    # OCR bboxes use image coordinates (y grows downward), so dy is negated
    # to make a positive result mean counter-clockwise rotation.
    angle = math.degrees(math.atan2(-dy, dx))
    return angle
```

### 4. Font Size Estimation

```python
import math
from typing import List


def estimate_font_size(bbox: List[List[float]], text: str) -> float:
    """
    Estimate font size from bbox dimensions.

    Uses bbox height as the primary indicator. `text` is accepted but
    not yet used for any aspect-ratio adjustment.
    """
    # Calculate bbox height (average of left and right edges)
    left_height = math.dist(bbox[0], bbox[3])
    right_height = math.dist(bbox[1], bbox[2])
    avg_height = (left_height + right_height) / 2

    # Font size is approximately 70-80% of bbox height
    return avg_height * 0.75
```

## Integration Points

### PDFGeneratorService

Modify `draw_ocr_content()` to use simple text positioning:

```python
def draw_ocr_content(self, canvas, content_data, page_info):
    """Draw OCR content using simple text positioning."""
    # Use raw OCR regions directly
    raw_regions = content_data.get('raw_ocr_regions', [])

    # Assumption: page_info is a dict that carries the image-to-PDF
    # scale factor under this key.
    scale_factor = page_info.get('scale_factor', 1.0)

    for region in raw_regions:
        self.text_renderer.render_text_region(
            canvas, region, scale_factor
        )
```

### Configuration

Add a config option to enable or disable simple mode:

```python
from pydantic import Field


class OCRSettings:
    simple_text_positioning: bool = Field(
        default=True,
        description="Use simple text positioning instead of table reconstruction"
    )
```

## File Changes

| File | Change |
|------|--------|
| `app/services/text_region_renderer.py` | New - Text rendering logic |
| `app/services/pdf_generator_service.py` | Modify - Integration |
| `app/core/config.py` | Add - Configuration option |

## Edge Cases

1. **Overlapping text**: Regions may overlap slightly; render them in reading order
2. **Very small text**: Apply a minimum font size threshold (6pt)
3. **Rotated pages**: Handle 90/180/270 degree page rotation
4. **Empty regions**: Skip regions with empty text
5. **Unicode text**: Ensure the font supports CJK characters
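The bbox processing steps and three of the edge cases above (empty regions, very small text, reading order) can be sketched together as follows. This is a minimal illustration under stated assumptions, not the final implementation. `BboxGeometry`, `process_bbox`, `prepare_regions`, and both constants are hypothetical names. The sketch assumes bbox corners are ordered top-left, top-right, bottom-right, bottom-left in image coordinates (y grows downward), reads the 6pt threshold as a clamp, and approximates reading order by sorting on region centers.

```python
import math
from typing import Dict, List, NamedTuple, Tuple

MIN_FONT_SIZE = 6.0        # assumed clamp, taken from the 6pt edge case
FONT_HEIGHT_RATIO = 0.75   # same ratio as the font size estimate


class BboxGeometry(NamedTuple):
    """Hypothetical container for values derived from one bbox."""
    center: Tuple[float, float]
    width: float
    height: float
    rotation: float   # degrees, counter-clockwise from horizontal
    font_size: float  # points, clamped to MIN_FONT_SIZE


def process_bbox(bbox: List[List[float]]) -> BboxGeometry:
    """Derive center, size, rotation and font size from a 4-corner bbox."""
    # 1. Center point: average of the four corners.
    cx = sum(p[0] for p in bbox) / 4
    cy = sum(p[1] for p in bbox) / 4

    # 2. Width/height: average length of the two opposite edges.
    width = (math.dist(bbox[0], bbox[1]) + math.dist(bbox[3], bbox[2])) / 2
    height = (math.dist(bbox[0], bbox[3]) + math.dist(bbox[1], bbox[2])) / 2

    # 3. Rotation: angle of the top edge. dy is negated because image
    #    y grows downward, so positive means counter-clockwise.
    dx = bbox[1][0] - bbox[0][0]
    dy = bbox[1][1] - bbox[0][1]
    rotation = math.degrees(math.atan2(-dy, dx))

    # 4. Font size: fraction of bbox height, never below the minimum.
    font_size = max(height * FONT_HEIGHT_RATIO, MIN_FONT_SIZE)

    return BboxGeometry((cx, cy), width, height, rotation, font_size)


def prepare_regions(regions: List[Dict]) -> List[Dict]:
    """Drop empty regions and sort the rest in approximate reading order."""
    prepared = []
    for region in regions:
        text = (region.get("text") or "").strip()
        if not text:
            continue  # edge case: skip regions with empty text
        prepared.append({"text": text, "geometry": process_bbox(region["bbox"])})

    # Approximate reading order: top-to-bottom, then left-to-right by center.
    prepared.sort(key=lambda r: (r["geometry"].center[1], r["geometry"].center[0]))
    return prepared
```

For the `LOCTITE` example region, `process_bbox` yields a center of `(247.0, 102.0)`, a size of 262 × 52, a rotation of 0 degrees, and a font size of 39.0. Sorting on centers is only a rough stand-in for reading order; multi-column layouts would need a more careful grouping step.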