OCR/openspec/changes/archive/2025-12-11-simple-text-positioning/tasks.md
egg cfe65158a3 feat: enable document orientation detection for scanned PDFs
- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:13:46 +08:00

Tasks: Simple Text Positioning

Phase 1: Core Implementation

  • Create TextRegionRenderer class in app/services/text_region_renderer.py
    • Implement calculate_rotation() from bbox quadrilateral
    • Implement estimate_font_size() from bbox height
    • Implement render_text_region() main method
    • Handle coordinate system transformation (OCR → PDF)
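
A minimal sketch of the Phase 1 geometry helpers, not the actual TextRegionRenderer code. It assumes each bbox quadrilateral lists its points as top-left, top-right, bottom-right, bottom-left in image coordinates (y grows downward), that the PDF origin is at the bottom-left, and that a font size of 0.75 × bbox height is a reasonable starting point. The function signatures, the `scale` parameter, and the 0.75 factor are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def calculate_rotation(quad: Sequence[Point]) -> float:
    """Return the text angle in degrees, counter-clockwise positive (PDF convention).

    Assumes quad = [top-left, top-right, bottom-right, bottom-left] in image
    coordinates, so dy is negated to account for the downward y axis.
    """
    (x0, y0), (x1, y1) = quad[0], quad[1]
    return math.degrees(math.atan2(-(y1 - y0), x1 - x0))


def estimate_font_size(quad: Sequence[Point], factor: float = 0.75) -> float:
    """Estimate a font size from the bbox height (length of the left edge).

    The 0.75 factor is a hypothetical default, not a value from the task list.
    """
    (x0, y0), (x3, y3) = quad[0], quad[3]
    return math.hypot(x3 - x0, y3 - y0) * factor


def ocr_to_pdf(x: float, y: float, page_height: float, scale: float = 1.0) -> Point:
    """Map an OCR pixel coordinate (top-left origin) to PDF space (bottom-left origin)."""
    return x * scale, page_height - y * scale
```

Measuring height along the left edge rather than taking `max(y) - min(y)` keeps the estimate stable for rotated regions, where the axis-aligned extent overstates the glyph height.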

Phase 2: Integration

  • Add simple_text_positioning_enabled config option
  • Modify PDFGeneratorService._generate_ocr_track_pdf() to use TextRegionRenderer
  • Ensure raw OCR regions are loaded correctly via load_raw_ocr_regions()

Phase 3: Image/Chart/Formula Support

  • Add image element type detection (figure, image, chart, seal, formula)
  • Render image elements from UnifiedDocument to PDF
  • Handle image path resolution (result_dir, imgs/ subdirectory)
  • Apply coordinate transformation for image placement
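
One way the element type check and image path resolution could look, as a sketch only. The element type names and the `result_dir` and `imgs/` locations come from the tasks above. The function names, the lookup order, and the fallback to the bare file name under `imgs/` are assumptions.

```python
from pathlib import Path
from typing import Optional

# Element types rendered as images rather than as text (from the task list).
IMAGE_ELEMENT_TYPES = {"figure", "image", "chart", "seal", "formula"}


def is_image_element(element_type: str) -> bool:
    """Return True if the element should be placed as an image."""
    return element_type.lower() in IMAGE_ELEMENT_TYPES


def resolve_image_path(image_path: str, result_dir: str) -> Optional[Path]:
    """Try the path as given, then under result_dir, then under result_dir/imgs/.

    The lookup order is a hypothetical choice; returns None if nothing exists.
    """
    base = Path(result_dir)
    candidates = [
        Path(image_path),
        base / image_path,
        base / "imgs" / Path(image_path).name,
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` instead of raising lets the caller skip a missing image and still produce the rest of the page.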

Phase 4: Text Straightening & Overlap Avoidance

  • Add rotation straightening threshold (default 10°)
    • Small rotation angles (< 10°) are treated as 0° for clean output
    • Only significant rotations (e.g., 90°) are preserved
  • Add IoA (Intersection over Area) overlap detection
    • IoA threshold default 0.3 (30% overlap triggers skip)
    • Text regions overlapping with images/charts are skipped
  • Collect exclusion zones from image elements
  • Pass exclusion zones to text renderer
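
The straightening and overlap rules above can be sketched as follows. The 10° and 0.3 defaults come from the task list. The function names, the `(x0, y0, x1, y1)` box layout, the choice of the text region's own area as the IoA denominator, and the `>=` comparison at the threshold are assumptions.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def straighten(angle_deg: float, threshold: float = 10.0) -> float:
    """Snap small rotations to 0 degrees; keep significant ones such as 90."""
    normalized = (angle_deg + 180.0) % 360.0 - 180.0  # map into [-180, 180)
    return 0.0 if abs(normalized) < threshold else normalized


def ioa(text_box: Box, zone: Box) -> float:
    """Intersection area divided by the text box's own area (assumed denominator)."""
    x0, y0, x1, y1 = text_box
    zx0, zy0, zx1, zy1 = zone
    inter_w = max(0.0, min(x1, zx1) - max(x0, zx0))
    inter_h = max(0.0, min(y1, zy1) - max(y0, zy0))
    area = (x1 - x0) * (y1 - y0)
    return (inter_w * inter_h) / area if area > 0 else 0.0


def overlaps_exclusion_zone(
    text_box: Box, zones: Iterable[Box], threshold: float = 0.3
) -> bool:
    """Return True if the text box should be skipped because it covers an image zone."""
    return any(ioa(text_box, zone) >= threshold for zone in zones)
```

Dividing by the text box area rather than the union (as IoU does) means a small label sitting fully inside a large chart scores 1.0 and is skipped, even though it covers only a sliver of the chart.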

Phase 5: Chart Axis Label Deduplication

  • Add is_axis_label() method to detect axis labels
    • Y-axis: Vertical text immediately left of chart
    • X-axis: Horizontal text immediately below chart
  • Add is_near_zone() method for proximity checking
  • Position-aware deduplication in render_text_region()
    • Collect texts inside zones + axis labels
    • Skip matching text only if near zone or is axis label
    • Preserve matching text far from zones (e.g., table values)
  • Test results:
    • "Temperature, C" and "Syringe Thaw Time, Minutes" correctly skipped
    • Table values like "10" at top of page correctly rendered
    • Page 2: 128 of 148 text regions rendered (20 skipped: 12 for overlap, 8 as duplicates)
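
A sketch of the position-aware deduplication described in Phase 5, under stated assumptions. Boxes are `(x0, y0, x1, y1)` in image coordinates (y grows downward), "vertical" text is approximated as a box taller than it is wide, and the 50-unit `margin` is a hypothetical value. Only the method names `is_axis_label()` and `is_near_zone()` come from the task list. Their signatures and the `is_duplicate_of_zone_text()` helper are illustrative.

```python
from typing import Set, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), image coordinates


def is_near_zone(box: Box, zone: Box, margin: float = 50.0) -> bool:
    """True if the box intersects the zone expanded by `margin` on every side."""
    x0, y0, x1, y1 = box
    zx0, zy0, zx1, zy1 = zone
    return (
        x0 < zx1 + margin and x1 > zx0 - margin
        and y0 < zy1 + margin and y1 > zy0 - margin
    )


def is_axis_label(box: Box, zone: Box, margin: float = 50.0) -> bool:
    """Detect Y-axis labels (vertical, just left) and X-axis labels (horizontal, just below)."""
    x0, y0, x1, y1 = box
    zx0, zy0, zx1, zy1 = zone
    vertical = (y1 - y0) > (x1 - x0)  # crude orientation test (assumption)
    if vertical:
        # Immediately left of the chart and overlapping its vertical extent.
        return 0 <= zx0 - x1 <= margin and y0 < zy1 and y1 > zy0
    # Immediately below the chart and overlapping its horizontal extent.
    return 0 <= y0 - zy1 <= margin and x0 < zx1 and x1 > zx0


def is_duplicate_of_zone_text(text: str, box: Box, zone: Box, zone_texts: Set[str]) -> bool:
    """Skip matching text only when it is near the zone or is an axis label."""
    if text not in zone_texts:
        return False
    return is_near_zone(box, zone) or is_axis_label(box, zone)
```

With a chart at `(100, 100, 400, 300)`, a matching label at `(150, 310, 350, 330)` is treated as a duplicate, while a table value "10" at `(150, 10, 170, 25)` near the top of the page is kept even though the same string also appears inside the chart.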

Phase 6: Testing

  • Test with scan.pdf task (064e2d67-338c-4e54-b005-204c3b76fe63)
    • Page 2: Chart image rendered, axis labels deduplicated
    • PDF is searchable and selectable
    • Text is properly straightened (no skew artifacts)
  • Compare output quality vs original scan visually
  • Test with documents containing seals/formulas