Files
OCR/openspec/changes/archive/2025-12-11-cleanup-dead-code/specs/document-processing/spec.md
egg cfe65158a3 feat: enable document orientation detection for scanned PDFs
- Enable PP-StructureV3's use_doc_orientation_classify feature
- Detect rotation angle from doc_preprocessor_res.angle
- Swap page dimensions (width <-> height) for 90°/270° rotations
- Output PDF now correctly displays landscape-scanned content

Also includes:
- Archive completed openspec proposals
- Add simplify-frontend-ocr-config proposal (pending)
- Code cleanup and frontend simplification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 17:13:46 +08:00

43 lines
2.2 KiB
Markdown

## REMOVED Requirements
### Requirement: Legacy PDF Generator Service
**Reason**: `pdf_generator.py` (507 lines) was the original PDF generation implementation using Pandoc/WeasyPrint. It has been completely superseded by `pdf_generator_service.py` which uses ReportLab for low-level PDF generation with full layout preservation, table rendering, and image support.
**Migration**: No migration needed. The new `pdf_generator_service.py` provides all functionality with improved features.
#### Scenario: Legacy PDF generator file removal
- **WHEN** the legacy `pdf_generator.py` file is removed
- **THEN** the system continues to function normally using `pdf_generator_service.py`
- **AND** PDF generation works correctly with layout preservation
- **AND** no import errors occur in any service or router
### Requirement: Deprecated IoU Configuration Parameters
**Reason**: `gap_filling_iou_threshold` and `gap_filling_dedup_iou_threshold` are deprecated configuration parameters that should be replaced by IoA (Intersection over Area) thresholds for better accuracy.
**Migration**: Use `gap_filling_dedup_ioa_threshold` instead.
#### Scenario: Deprecated config removal
- **WHEN** the deprecated IoU configuration parameters are removed from config.py
- **THEN** gap filling service uses IoA-based thresholds
- **AND** the system starts without configuration errors
## ADDED Requirements
### Requirement: Unified Bbox Utility Module
The system SHALL provide a centralized bbox utility module (`backend/app/utils/bbox_utils.py`) for consistent bounding box normalization across all services.
#### Scenario: Bbox normalization from polygon format
- **WHEN** a bbox in polygon format `[[x1,y1], [x2,y2], [x3,y3], [x4,y4]]` is provided
- **THEN** the utility returns normalized tuple `(x0, y0, x1, y1)` representing min/max coordinates
#### Scenario: Bbox normalization from flat array
- **WHEN** a bbox in flat array format `[x0, y0, x1, y1]` is provided
- **THEN** the utility returns normalized tuple `(x0, y0, x1, y1)`
#### Scenario: Bbox normalization from 8-point polygon
- **WHEN** a bbox in 8-point format `[x1, y1, x2, y2, x3, y3, x4, y4]` is provided
- **THEN** the utility calculates and returns normalized tuple `(min_x, min_y, max_x, max_y)`