OCR/openspec/changes/archive/2025-11-25-fix-pdf-coordinate-system/proposal.md
egg 2312b4cd66 feat: add frontend-adjustable PP-StructureV3 parameters with comprehensive testing
Implement user-configurable PP-StructureV3 parameters to allow fine-tuning OCR behavior
from the frontend. This addresses issues with over-merging, missing small text, and
document-specific optimization needs.

Backend:
- Add PPStructureV3Params schema with 7 adjustable parameters
- Update OCR service to accept custom parameters with smart caching
- Modify /tasks/{task_id}/start endpoint to receive params in request body
- Parameter priority: custom > settings default
- Conditional caching (no cache for custom params to avoid pollution)

Frontend:
- Create PPStructureParams component with collapsible UI
- Add 3 presets: default, high-quality, fast
- Implement localStorage persistence for user parameters
- Add import/export JSON functionality
- Integrate into ProcessingPage with conditional rendering

Testing:
- Unit tests: 7/10 passing (core functionality verified)
- API integration tests for schema validation
- E2E tests with authentication support
- Performance benchmarks for memory and initialization
- Test runner script with venv activation

Environment:
- Remove duplicate backend/venv (use root venv only)
- Update test runner to use correct virtual environment

OpenSpec:
- Archive fix-pdf-coordinate-system proposal
- Archive frontend-adjustable-ppstructure-params proposal
- Create ocr-processing spec
- Update result-export spec

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 14:39:19 +08:00

Change: Fix PDF Layout Restoration Coordinate System and Dimension Calculation

Why

During OCR track validation, the generated PDF (img1_layout.pdf) exhibits significant layout discrepancies compared to the original image (img1.png). Specific issues include:

  • Element position misalignment: Text elements appear at incorrect vertical positions
  • Abnormal vertical flipping: Coordinate transformation errors cause content to be inverted
  • Incorrect scaling: Content is stretched or compressed due to wrong page dimension calculations

Code review identified two critical logic defects in backend/app/services/pdf_generator_service.py:

  1. Page dimension calculation error: The system ignores the explicit page dimensions in the OCR results and instead infers them from the extent of the bounding boxes, so the coordinate transformation works from the wrong page size
  2. Missing multi-page support: The PDF generator applies the first page's dimensions to every page, so it cannot handle mixed orientations (portrait/landscape) or pages of different sizes

These issues violate the requirement "Enhanced PDF Export with Layout Preservation" in the result-export specification, making PDF exports unreliable for production use.
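
To make the first defect concrete, here is a small worked example. It assumes OCR boxes use a top-left origin while the PDF canvas uses a bottom-left origin, and that the true page height is available separately from the boxes. The function name `to_pdf_y` and all numbers are hypothetical and are not taken from `pdf_generator_service.py`.

```python
def to_pdf_y(y_top: float, box_height: float, page_height: float) -> float:
    """Map a box given in top-left-origin coordinates to the y of its
    lower edge in bottom-left-origin (PDF) coordinates."""
    return page_height - (y_top + box_height)


# Hypothetical page: really 1000 units tall, but the lowest box ends at y=900.
true_height = 1000.0
boxes = [(100.0, 50.0), (850.0, 50.0)]  # (y_top, box_height)

# Inferring the page height from the boxes loses the bottom margin.
inferred_height = max(y_top + h for y_top, h in boxes)

correct = [to_pdf_y(y, h, true_height) for y, h in boxes]
skewed = [to_pdf_y(y, h, inferred_height) for y, h in boxes]

# Every element lands lower on the page by the size of the lost margin.
offsets = [c - s for c, s in zip(correct, skewed)]
print(inferred_height, correct, skewed, offsets)
```

In this sketch every element is displaced by the same 100 units, which is the kind of vertical misalignment described above.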

What Changes

1. Fix calculate_page_dimensions Logic

  • MODIFIED: backend/app/services/pdf_generator_service.py::calculate_page_dimensions()
  • Change priority order: Check the explicit dimensions field first; fall back to the bbox calculation only when it is unavailable
  • Ensure Y-axis coordinate transformation uses correct page height
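
A minimal sketch of the intended priority order follows. The page layout used here (a `dimensions` dict with `width`/`height`, and `elements` carrying `bbox` as `[x0, y0, x1, y1]`) is an assumption made for illustration, and `resolve_page_dimensions` is a hypothetical name, not the actual `calculate_page_dimensions()` signature.

```python
from typing import Any, Optional


def resolve_page_dimensions(page: dict[str, Any]) -> Optional[tuple[float, float]]:
    """Return (width, height), preferring explicit dimensions over bbox extents."""
    # 1. Explicit dimensions win whenever both values are present.
    dims = page.get("dimensions") or {}
    width, height = dims.get("width"), dims.get("height")
    if width and height:
        return float(width), float(height)

    # 2. Fallback: estimate from the furthest right/bottom bbox edges.
    boxes = [el["bbox"] for el in page.get("elements", []) if el.get("bbox")]
    if not boxes:
        return None  # nothing to infer from; the caller picks a default
    return float(max(b[2] for b in boxes)), float(max(b[3] for b in boxes))


page_explicit = {
    "dimensions": {"width": 1240, "height": 1754},
    "elements": [{"bbox": [100, 100, 900, 1500]}],
}
page_bbox_only = {
    "elements": [{"bbox": [100, 100, 900, 1500]}, {"bbox": [50, 1600, 1000, 1700]}],
}
print(resolve_page_dimensions(page_explicit))
print(resolve_page_dimensions(page_bbox_only))
```

The bbox fallback can only underestimate the page, since it cannot see margins, which is why it is kept as a last resort.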

2. Implement Dynamic Per-Page Sizing

  • MODIFIED: backend/app/services/pdf_generator_service.py::_generate_direct_track_pdf()
  • MODIFIED: backend/app/services/pdf_generator_service.py::_generate_ocr_track_pdf()
  • Call pdf_canvas.setPageSize() for each page to support varying page dimensions
  • Pass current page height to coordinate transformation functions
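
The per-page loop could look roughly like the sketch below. To stay runnable without extra dependencies it uses a small recording stand-in whose method names (`setPageSize`, `drawString`, `showPage`) mirror ReportLab's `canvas.Canvas`. The page and element fields, and the name `render_pages`, are assumptions for illustration rather than the real `_generate_*_track_pdf()` code.

```python
class RecordingCanvas:
    """Stand-in that records calls; method names mirror ReportLab's canvas.Canvas."""

    def __init__(self):
        self.calls = []

    def setPageSize(self, size):
        self.calls.append(("setPageSize", size))

    def drawString(self, x, y, text):
        self.calls.append(("drawString", x, y, text))

    def showPage(self):
        self.calls.append(("showPage",))


def render_pages(pdf_canvas, pages):
    for page in pages:
        page_height = page["height"]
        # Set the size for this page before drawing anything on it.
        pdf_canvas.setPageSize((page["width"], page_height))
        for el in page["elements"]:
            # Flip Y using the current page's height, not the first page's.
            y = page_height - (el["y_top"] + el["height"])
            pdf_canvas.drawString(el["x"], y, el["text"])
        pdf_canvas.showPage()  # finish this page and start the next


pages = [
    {"width": 595, "height": 842,
     "elements": [{"x": 50, "y_top": 42, "height": 20, "text": "portrait"}]},
    {"width": 842, "height": 595,
     "elements": [{"x": 50, "y_top": 42, "height": 20, "text": "landscape"}]},
]
pdf_canvas = RecordingCanvas()
render_pages(pdf_canvas, pages)
```

With the same `y_top` on both pages, the text ends up at a different PDF y on the landscape page, because each page's own height drives the transformation.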

3. Update OCR Data Converter

  • MODIFIED: backend/app/services/ocr_to_unified_converter.py::convert_unified_document_to_ocr_data()
  • Add page_dimensions mapping to output: {page_index: {width, height}}
  • Ensure OCR track has per-page dimension information
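
As an illustration of the `{page_index: {width, height}}` shape, the sketch below builds such a mapping from a list of pages. It assumes 0-based page indices and a per-page `dimensions` dict; both are guesses about the data model, and `build_page_dimensions` is a hypothetical helper rather than part of `convert_unified_document_to_ocr_data()`.

```python
def build_page_dimensions(pages):
    """Collect per-page sizes as {page_index: {"width": w, "height": h}}."""
    mapping = {}
    for index, page in enumerate(pages):  # assumption: 0-based page indices
        dims = page.get("dimensions")
        if dims and dims.get("width") and dims.get("height"):
            mapping[index] = {"width": dims["width"], "height": dims["height"]}
        # Pages without usable dimensions are left out so that the consumer
        # can apply its own fallback for them.
    return mapping


pages = [
    {"dimensions": {"width": 595, "height": 842}},
    {},  # dimensions missing for this page
    {"dimensions": {"width": 842, "height": 595}},
]
page_dimensions = build_page_dimensions(pages)
print(page_dimensions)
```

Omitting pages with unknown sizes, rather than writing placeholder values, keeps the absence visible to the PDF generator.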

Impact

Affected specs: result-export (MODIFIED requirement: "Enhanced PDF Export with Layout Preservation")

Affected code:

  • backend/app/services/pdf_generator_service.py (core fix)
  • backend/app/services/ocr_to_unified_converter.py (data structure enhancement)

Breaking changes: None - this is a bug fix that makes existing functionality work correctly

Benefits:

  • Accurate layout restoration for single-page documents
  • Support for mixed-orientation multi-page documents
  • Correct coordinate transformation without vertical flipping errors
  • Improved reliability of the PDF export feature