Files
OCR/openspec/changes/archive/2025-12-11-cleanup-dead-code/specs/document-processing/spec.md
egg cfe65158a3 feat: enable document orientation detection for scanned PDFs
- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:13:46 +08:00

2.2 KiB

REMOVED Requirements

Requirement: Legacy PDF Generator Service

Reason: pdf_generator.py (507 lines) was the original PDF generation implementation using Pandoc/WeasyPrint. It has been completely superseded by pdf_generator_service.py which uses ReportLab for low-level PDF generation with full layout preservation, table rendering, and image support.

Migration: No migration needed. The new pdf_generator_service.py provides all functionality with improved features.

Scenario: Legacy PDF generator file removal

  • WHEN the legacy pdf_generator.py file is removed
  • THEN the system continues to function normally using pdf_generator_service.py
  • AND PDF generation works correctly with layout preservation
  • AND no import errors occur in any service or router

Requirement: Deprecated IoU Configuration Parameters

Reason: gap_filling_iou_threshold and gap_filling_dedup_iou_threshold are deprecated configuration parameters that should be replaced by IoA (Intersection over Area) thresholds for better accuracy.

Migration: Use gap_filling_dedup_ioa_threshold instead.

Scenario: Deprecated config removal

  • WHEN the deprecated IoU configuration parameters are removed from config.py
  • THEN gap filling service uses IoA-based thresholds
  • AND the system starts without configuration errors

ADDED Requirements

Requirement: Unified Bbox Utility Module

The system SHALL provide a centralized bbox utility module (backend/app/utils/bbox_utils.py) for consistent bounding box normalization across all services.

Scenario: Bbox normalization from polygon format

  • WHEN a bbox in polygon format [[x1,y1], [x2,y2], [x3,y3], [x4,y4]] is provided
  • THEN the utility returns normalized tuple (x0, y0, x1, y1) representing min/max coordinates

Scenario: Bbox normalization from flat array

  • WHEN a bbox in flat array format [x0, y0, x1, y1] is provided
  • THEN the utility returns normalized tuple (x0, y0, x1, y1)

Scenario: Bbox normalization from 8-point polygon

  • WHEN a bbox in 8-point format [x1, y1, x2, y2, x3, y3, x4, y4] is provided
  • THEN the utility calculates and returns normalized tuple (min_x, min_y, max_x, max_y)