OCR/openspec/changes/archive/2025-11-25-fix-pdf-coordinate-system/tasks.md
egg 2312b4cd66 feat: add frontend-adjustable PP-StructureV3 parameters with comprehensive testing
Implement user-configurable PP-StructureV3 parameters to allow fine-tuning OCR behavior
from the frontend. This addresses issues with over-merging, missing small text, and
document-specific optimization needs.

Backend:
- Add PPStructureV3Params schema with 7 adjustable parameters
- Update OCR service to accept custom parameters with smart caching
- Modify /tasks/{task_id}/start endpoint to receive params in request body
- Parameter priority: custom > settings default
- Conditional caching (no cache for custom params to avoid pollution)

Frontend:
- Create PPStructureParams component with collapsible UI
- Add 3 presets: default, high-quality, fast
- Implement localStorage persistence for user parameters
- Add import/export JSON functionality
- Integrate into ProcessingPage with conditional rendering

Testing:
- Unit tests: 7/10 passing (core functionality verified)
- API integration tests for schema validation
- E2E tests with authentication support
- Performance benchmarks for memory and initialization
- Test runner script with venv activation

Environment:
- Remove duplicate backend/venv (use root venv only)
- Update test runner to use correct virtual environment

OpenSpec:
- Archive fix-pdf-coordinate-system proposal
- Archive frontend-adjustable-ppstructure-params proposal
- Create ocr-processing spec
- Update result-export spec

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 14:39:19 +08:00


Implementation Tasks

1. Fix Page Dimension Calculation

  • 1.1 Modify calculate_page_dimensions() in pdf_generator_service.py
    • Check the ocr_dimensions field first (highest priority)
    • Fall back to the dimensions field when ocr_dimensions is missing
    • Keep bbox calculation as final fallback only
    • Add logging to show which dimension source is used
  • 1.2 Add unit tests for dimension calculation logic
    • Test with explicit dimensions provided
    • Test with missing dimensions (fallback to bbox)
    • Test edge cases (empty content, single element)
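
The lookup order in 1.1 could be sketched roughly as follows. This is a minimal illustration, not the actual calculate_page_dimensions() code: the function name resolve_page_dimensions, the "elements" key, the [x0, y0, x1, y1] bbox layout, and the A4 default are all assumptions made for the example.

```python
import logging
from typing import Any, Dict, Optional, Tuple

logger = logging.getLogger(__name__)

# Assumed default (A4 in points) for content with no usable size information.
DEFAULT_SIZE = (595.0, 842.0)


def _size_from(field: Any) -> Optional[Tuple[float, float]]:
    """Return (width, height) if field is a dict holding positive numbers."""
    if not isinstance(field, dict):
        return None
    width, height = field.get("width"), field.get("height")
    if (isinstance(width, (int, float)) and isinstance(height, (int, float))
            and width > 0 and height > 0):
        return float(width), float(height)
    return None


def resolve_page_dimensions(ocr_data: Dict[str, Any]) -> Tuple[float, float]:
    """Hypothetical helper: ocr_dimensions, then dimensions, then bbox extent."""
    # Explicit fields win, in priority order.
    for key in ("ocr_dimensions", "dimensions"):
        size = _size_from(ocr_data.get(key))
        if size is not None:
            logger.info("Page dimensions taken from %r: %s", key, size)
            return size

    # Final fallback only: the largest right/bottom edge over all bboxes.
    max_x = max_y = 0.0
    for element in ocr_data.get("elements", []):
        bbox = element.get("bbox")
        if bbox and len(bbox) == 4:
            max_x = max(max_x, float(bbox[2]))
            max_y = max(max_y, float(bbox[3]))
    if max_x > 0 and max_y > 0:
        logger.info("Page dimensions estimated from bboxes: %s", (max_x, max_y))
        return max_x, max_y

    logger.warning("No dimension source found; using default %s", DEFAULT_SIZE)
    return DEFAULT_SIZE
```

The three cases listed under 1.2 (explicit dimensions, bbox fallback, empty content) map directly onto the three return paths, so each can be covered by one small test.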

2. Implement Dynamic Per-Page Sizing for Direct Track

  • 2.1 Refactor _generate_direct_track_pdf() loop
    • Extract current page dimensions inside loop
    • Call pdf_canvas.setPageSize() for each page
    • Pass current page_height to all drawing functions
  • 2.2 Update drawing helper functions
    • Ensure _draw_text_element_direct() receives page_height parameter
    • Ensure _draw_image_element() receives page_height parameter
    • Ensure _draw_table_element() receives page_height parameter
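
The per-page loop in 2.1 and 2.2 might look something like the sketch below. It assumes a ReportLab-style canvas exposing setPageSize(), drawString(), and showPage(). RecordingCanvas is a stand-in invented for this example so the loop can be exercised without generating a file, and the page and element dict layout is likewise hypothetical.

```python
from typing import Any, Dict, List


class RecordingCanvas:
    """Stand-in for a ReportLab-style canvas: records calls instead of drawing."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def setPageSize(self, size) -> None:
        self.calls.append(("setPageSize", size))

    def drawString(self, x, y, text) -> None:
        self.calls.append(("drawString", x, y, text))

    def showPage(self) -> None:
        self.calls.append(("showPage",))


def draw_text_element(pdf_canvas: Any, element: Dict[str, Any], page_height: float) -> None:
    """Draw text using the height of the page it belongs to (hypothetical helper)."""
    x0, _y0, _x1, y1 = element["bbox"]  # assumed top-left origin, y grows downward
    # Flip against this page's height, not a document-wide one.
    pdf_canvas.drawString(x0, page_height - y1, element["text"])


def generate_pages(pdf_canvas: Any, pages: List[Dict[str, Any]]) -> None:
    """Size each page individually before drawing its elements."""
    for page in pages:
        page_width, page_height = page["width"], page["height"]
        pdf_canvas.setPageSize((page_width, page_height))
        for element in page["elements"]:
            draw_text_element(pdf_canvas, element, page_height)
        pdf_canvas.showPage()  # finish this page before resizing for the next one
```

Passing page_height explicitly, rather than reading it from shared state, keeps each drawing helper correct when portrait and landscape pages alternate.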

3. Implement Dynamic Per-Page Sizing for OCR Track

  • 3.1 Enhance convert_unified_document_to_ocr_data()
    • Add page_dimensions field to output dict
    • Map each page index to its dimensions: {0: {width: X, height: Y}, ...}
    • Include ocr_dimensions field for backward compatibility
  • 3.2 Refactor _generate_ocr_track_pdf() loop
    • Read dimensions from page_dimensions[page_num]
    • Call pdf_canvas.setPageSize() for each page
    • Pass current page_height to coordinate transformation
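
As a rough illustration of 3.1 and 3.2, the output dict could carry a per-page map plus the legacy field. The helper names below are hypothetical, and filling ocr_dimensions with the first page's size is an assumption of this sketch; the task list only says the field is kept for backward compatibility.

```python
from typing import Any, Dict, List, Optional


def build_ocr_data(pages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: attach per-page sizes to the OCR output dict."""
    page_dimensions = {
        index: {"width": page["width"], "height": page["height"]}
        for index, page in enumerate(pages)
    }
    ocr_data: Dict[str, Any] = {"page_dimensions": page_dimensions}
    if page_dimensions:
        # Assumption: the legacy single-size field mirrors the first page.
        ocr_data["ocr_dimensions"] = dict(page_dimensions[0])
    return ocr_data


def dimensions_for_page(ocr_data: Dict[str, Any], page_num: int) -> Optional[Dict[str, Any]]:
    """Prefer the per-page entry; fall back to the legacy field if absent."""
    per_page = ocr_data.get("page_dimensions", {})
    if page_num in per_page:
        return per_page[page_num]
    return ocr_data.get("ocr_dimensions")
```

A generation loop could then call dimensions_for_page(ocr_data, page_num) before each setPageSize() call, and older consumers that only read ocr_dimensions would keep working.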

4. Testing & Validation

  • 4.1 Single-page layout verification
    • Process img1.png through OCR track
    • Verify generated PDF text positions match original image
    • Confirm no vertical flipping or offset issues
    • Check "D" header appears at correct top position
  • 4.2 Multi-page mixed orientation test
    • Create test PDF with portrait and landscape pages
    • Process through both OCR and Direct tracks
    • Verify each page uses correct dimensions
    • Confirm no content clipping or misalignment
  • 4.3 Regression testing
    • Run existing PDF generation tests
    • Verify Direct track StyleInfo preservation
    • Check table rendering still works correctly
    • Ensure image extraction positions are correct
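
One way to automate the "no content clipping" check from 4.2 is to flag any element whose bbox extends past its own page. This is only a sketch: find_out_of_bounds, the page and element dict layout, and the half-unit tolerance are assumptions, not part of the existing test suite.

```python
from typing import Any, Dict, List, Tuple


def find_out_of_bounds(pages: List[Dict[str, Any]],
                       tolerance: float = 0.5) -> List[Tuple[int, int]]:
    """Return (page_index, element_index) pairs whose bbox leaves the page."""
    problems: List[Tuple[int, int]] = []
    for page_index, page in enumerate(pages):
        width, height = page["width"], page["height"]
        for element_index, element in enumerate(page["elements"]):
            x0, y0, x1, y1 = element["bbox"]
            inside = (
                x0 >= -tolerance and y0 >= -tolerance
                and x1 <= width + tolerance and y1 <= height + tolerance
            )
            if not inside:
                problems.append((page_index, element_index))
    return problems
```

A wide element that fits a landscape page but is reported on a portrait page of the same document is a sign that one page size was applied to every page.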

5. Documentation

  • 5.1 Update code comments in pdf_generator_service.py
  • 5.2 Document coordinate transformation logic
  • 5.3 Add inline examples for multi-page handling
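
For 5.2 and 5.3, the core of the coordinate transformation could be documented with an inline example along these lines. It assumes OCR bboxes use a top-left origin with y growing downward and share units with the page, while PDF space uses a bottom-left origin; to_pdf_rect is a hypothetical name.

```python
from typing import Sequence, Tuple


def to_pdf_rect(bbox: Sequence[float], page_height: float) -> Tuple[float, float, float, float]:
    """Convert a top-left-origin bbox to bottom-left-origin PDF coordinates.

    Input is (x0, y0, x1, y1) with y0 as the top edge.
    Output is (x0, y_bottom, x1, y_top) measured from the bottom of the page.
    """
    x0, y0, x1, y1 = bbox
    return (x0, page_height - y1, x1, page_height - y0)


# Multi-page handling: the same bbox lands at different PDF heights
# depending on the height of the page it belongs to.
portrait = to_pdf_rect((10, 20, 110, 50), page_height=842)   # (10, 792, 110, 822)
landscape = to_pdf_rect((10, 20, 110, 50), page_height=595)  # (10, 545, 110, 575)
```

Applying the portrait height to the landscape page would shift every element upward by 247 units (842 minus 595), which is the kind of offset the per-page sizing is meant to prevent.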