Implement user-configurable PP-StructureV3 parameters to allow fine-tuning OCR behavior
from the frontend. This addresses issues with over-merging, missing small text, and
document-specific optimization needs.
Backend:
- Add PPStructureV3Params schema with 7 adjustable parameters
- Update OCR service to accept custom parameters with smart caching
- Modify /tasks/{task_id}/start endpoint to receive params in request body
- Parameter priority: custom > settings default
- Conditional caching (engines built with custom params are not cached, to avoid polluting the default engine cache)
Frontend:
- Create PPStructureParams component with collapsible UI
- Add 3 presets: default, high-quality, fast
- Implement localStorage persistence for user parameters
- Add import/export JSON functionality
- Integrate into ProcessingPage with conditional rendering
Testing:
- Unit tests: 7/10 passing (core functionality verified)
- API integration tests for schema validation
- E2E tests with authentication support
- Performance benchmarks for memory and initialization
- Test runner script with venv activation
Environment:
- Remove duplicate backend/venv (use root venv only)
- Update test runner to use correct virtual environment
OpenSpec:
- Archive fix-pdf-coordinate-system proposal
- Archive frontend-adjustable-ppstructure-params proposal
- Create ocr-processing spec
- Update result-export spec
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implementation Tasks
1. Backend Schema (✅ COMPLETED)
- 1.1 Define `PPStructureV3Params` schema in `backend/app/schemas/task.py`
  - Add 7 parameter fields with validation
  - Set appropriate constraints (ge, le, gt, pattern)
  - Add descriptive documentation
- 1.2 Update `ProcessingOptions` schema
  - Add optional `pp_structure_params` field
  - Ensure backward compatibility
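The constraints in 1.1 can be sketched with the standard library alone. This is a minimal stand-in for the Pydantic schema, not the project's actual code: the field names and the `union`/`large`/`small` modes come from the frontend interface in Section 4, while the numeric bounds (thresholds in [0, 1], unclip ratios greater than 0) and the `to_dict()` helper are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional

# Pattern constraint for the merge mode (values taken from the frontend type).
_MERGE_MODE = re.compile(r"^(union|large|small)$")

# Assumed bounds, not confirmed by the spec: thresholds in [0, 1], ratios > 0.
_THRESHOLD_FIELDS = (
    "layout_detection_threshold",
    "layout_nms_threshold",
    "text_det_thresh",
    "text_det_box_thresh",
)
_RATIO_FIELDS = ("layout_unclip_ratio", "text_det_unclip_ratio")


@dataclass
class PPStructureV3Params:
    """Stdlib stand-in for the schema; every field is optional."""

    layout_detection_threshold: Optional[float] = None
    layout_nms_threshold: Optional[float] = None
    layout_merge_bboxes_mode: Optional[str] = None
    layout_unclip_ratio: Optional[float] = None
    text_det_thresh: Optional[float] = None
    text_det_box_thresh: Optional[float] = None
    text_det_unclip_ratio: Optional[float] = None

    def __post_init__(self) -> None:
        for name in _THRESHOLD_FIELDS:  # ge=0, le=1 (assumed)
            value = getattr(self, name)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0, 1], got {value}")
        for name in _RATIO_FIELDS:  # gt=0 (assumed)
            value = getattr(self, name)
            if value is not None and not value > 0:
                raise ValueError(f"{name} must be > 0, got {value}")
        mode = self.layout_merge_bboxes_mode
        if mode is not None and not _MERGE_MODE.match(mode):
            raise ValueError(f"invalid layout_merge_bboxes_mode: {mode!r}")

    def to_dict(self) -> Dict[str, Any]:
        """Keep only the fields the user set, like model_dump(exclude_none=True)."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }
```

Because every field defaults to `None`, an empty `PPStructureV3Params()` is valid and produces an empty dict, which is one way to keep requests without parameters backward compatible.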
2. Backend OCR Service Implementation
- 2.1 Modify `backend/app/services/ocr_service.py`
  - Update `_ensure_structure_engine()` method signature
    - Add `custom_params: Optional[Dict[str, Any]] = None` parameter
    - Implement parameter priority logic (custom > settings)
    - Conditional caching (skip cache for custom params)
  - Update `process_image()` method
    - Add `pp_structure_params` parameter
    - Pass params to `_ensure_structure_engine()`
  - Update `process_with_dual_track()` method
    - Add `pp_structure_params` parameter
    - Forward params to OCR track processing
  - Update main `process()` method
    - Add `pp_structure_params` parameter
    - Ensure params flow through all code paths
- 2.2 Add parameter logging
- Log when custom params are used
- Log parameter values for debugging
- Add performance metrics for custom vs default
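The priority, caching, and logging rules in this section can be sketched as follows. `StructureEngineProvider`, its `factory` argument, and the log message are hypothetical names used only for illustration; the real logic lives in `_ensure_structure_engine()` and may differ.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


class StructureEngineProvider:
    """Hypothetical holder for the engine; not the project's OCR service."""

    def __init__(self, defaults: Dict[str, Any], factory: Callable[..., Any]) -> None:
        self._defaults = dict(defaults)   # values that would come from settings
        self._factory = factory           # builds an engine from keyword args
        self._cached_engine: Optional[Any] = None

    def ensure_structure_engine(
        self, custom_params: Optional[Dict[str, Any]] = None
    ) -> Any:
        if custom_params:
            # Priority: custom values override the settings defaults.
            merged = {**self._defaults, **custom_params}
            logger.info("Using custom PP-StructureV3 params: %s", custom_params)
            # Conditional caching: a custom engine is never stored, so it
            # cannot replace the shared default engine.
            return self._factory(**merged)
        if self._cached_engine is None:
            self._cached_engine = self._factory(**self._defaults)
        return self._cached_engine
```

The trade-off in this sketch is that every request with custom parameters pays the engine initialization cost again, which is what the performance metrics in 2.2 would make visible.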
3. Backend API Endpoint Updates
- 3.1 Modify `backend/app/routers/tasks.py`
  - Update `start_task` endpoint
    - Accept `ProcessingOptions` as request body (not query params)
    - Extract `pp_structure_params` from options
    - Convert to dict using `model_dump(exclude_none=True)`
    - Pass to OCR service
  - Update `analyze_document` endpoint (if needed)
    - Support PP-StructureV3 params for analysis
- 3.2 Update API documentation
- Add OpenAPI schema for new parameters
- Include parameter descriptions and ranges
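The body handling described in 3.1 can be illustrated without the web framework. The helper below is a hypothetical, standard-library approximation of what the endpoint does once the request body is parsed: it tolerates a missing body (older clients), pulls out `pp_structure_params`, and drops unset values in the same spirit as `model_dump(exclude_none=True)`. It performs no validation, which the schema would handle in the real endpoint.

```python
import json
from typing import Any, Dict, Optional


def extract_pp_structure_params(raw_body: Optional[str]) -> Optional[Dict[str, Any]]:
    """Return the custom params from a ProcessingOptions JSON body, or None.

    None means "use the settings defaults", so requests without a body
    keep working unchanged.
    """
    if not raw_body:
        return None  # legacy client: no request body at all
    options = json.loads(raw_body) or {}  # a JSON `null` body is treated as empty
    params = options.get("pp_structure_params")
    if not params:
        return None
    # Drop fields the client left unset so they cannot override defaults.
    cleaned = {key: value for key, value in params.items() if value is not None}
    return cleaned or None
```

For example, a body of `{"pp_structure_params": {"text_det_thresh": 0.2}}` yields a one-entry dict to pass to the OCR service, while `{}` yields `None`.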
4. Frontend TypeScript Types
- 4.1 Update `frontend/src/types/apiV2.ts`
  - Define `PPStructureV3Params` interface

        export interface PPStructureV3Params {
          layout_detection_threshold?: number
          layout_nms_threshold?: number
          layout_merge_bboxes_mode?: 'union' | 'large' | 'small'
          layout_unclip_ratio?: number
          text_det_thresh?: number
          text_det_box_thresh?: number
          text_det_unclip_ratio?: number
        }

  - Update `ProcessingOptions` interface
    - Add `pp_structure_params?: PPStructureV3Params`
5. Frontend API Client Updates
- 5.1 Modify `frontend/src/services/apiV2.ts`
  - Update `startTask()` method
    - Change from query params to request body
    - Send full `ProcessingOptions` object

          async startTask(taskId: string, options?: ProcessingOptions): Promise<Task> {
            const response = await this.client.post<Task>(
              `/tasks/${taskId}/start`,
              options // Send as body, not query params
            )
            return response.data
          }
6. Frontend UI Implementation
- 6.1 Create parameter adjustment component
- Create `frontend/src/components/PPStructureParams.tsx`
  - Slider components for numeric parameters
  - Select dropdown for merge mode
  - Help tooltips for each parameter
  - Reset to defaults button
- 6.2 Add preset configurations
- Default mode (use backend defaults)
- High Quality mode (lower thresholds)
- Fast mode (higher thresholds)
- Custom mode (show all sliders)
- 6.3 Integrate into task processing flow
- Add to `ProcessingPage.tsx`
- Show only when task is pending
- Store params in component state
- Pass params to `startTask()` API call
7. Frontend UI/UX Polish
- 7.1 Add visual feedback
- Loading state while processing with custom params
- Success/error notifications with save confirmation
- Parameter value display (current vs default with highlight)
- 7.2 Add parameter persistence
- Save last used params to localStorage (auto-save on change)
- Create preset configurations (default, high-quality, fast)
- Import/export parameter configurations (JSON format)
- 7.3 Add help documentation
- Inline help text for each parameter with tooltips
- Descriptive labels explaining parameter effects
- Info panel explaining OCR track requirement
8. Testing
- 8.1 Backend unit tests
- Test schema validation (min/max, types, patterns)
- Test parameter passing through service layers
- Test caching behavior with custom params (no caching)
- Test parameter priority (custom > settings)
- Test fallback to defaults on error
- Test parameter flow through processing pipeline
- Test logging of custom parameters
- 8.2 API integration tests
- Test endpoint with various parameter combinations
- Test backward compatibility (no params)
- Test validation errors for invalid params (422 responses)
- Test partial parameter sets
- Test OpenAPI schema documentation
- Test parameter serialization/deserialization
- 8.3 Frontend component tests
- Test slider value changes
- Test preset selection
- Test API call generation
- 8.4 End-to-end tests
- Upload document → adjust params → process → verify results
- Test with different document types
- Compare results: default vs custom params
- 8.5 Performance tests
- Ensure no memory leaks with custom params
- Verify engine cleanup after processing
- Benchmark processing time impact
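Two of the backend unit tests listed in 8.1, parameter priority and fallback to defaults on error, could look roughly like this. `build_engine` is a hypothetical stand-in for the service method under test, and the "error" is simulated by a fake factory that rejects negative thresholds; the project's real failure modes are not specified here.

```python
import unittest
from typing import Any, Callable, Dict, Optional


def build_engine(
    factory: Callable[..., Any],
    defaults: Dict[str, Any],
    custom_params: Optional[Dict[str, Any]] = None,
) -> Any:
    """Hypothetical helper: try custom params first, fall back to defaults."""
    if custom_params:
        try:
            return factory(**{**defaults, **custom_params})
        except Exception:
            pass  # fall through to the default configuration
    return factory(**defaults)


class CustomParamsTest(unittest.TestCase):
    def setUp(self) -> None:
        self.defaults = {"text_det_thresh": 0.3}
        self.calls = []

    def fake_factory(self, **kwargs: Any) -> Dict[str, Any]:
        # Records each call and simulates an engine that rejects bad input.
        self.calls.append(kwargs)
        if kwargs["text_det_thresh"] < 0:
            raise ValueError("rejected by engine")
        return kwargs

    def test_custom_overrides_default(self) -> None:
        engine = build_engine(self.fake_factory, self.defaults, {"text_det_thresh": 0.1})
        self.assertEqual(engine["text_det_thresh"], 0.1)

    def test_falls_back_to_defaults_on_error(self) -> None:
        engine = build_engine(self.fake_factory, self.defaults, {"text_det_thresh": -1})
        self.assertEqual(engine, self.defaults)
        self.assertEqual(len(self.calls), 2)  # one failed attempt, one fallback
```

Injecting the factory keeps these tests independent of the real OCR engine, so they run quickly and need no model files.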
9. Documentation
- 9.1 Update API documentation
- Document new request body format
- Add parameter reference guide
- Include example requests
- 9.2 Create user guide
- When to adjust each parameter
- Common scenarios and recommended settings
- Troubleshooting guide
- 9.3 Update README
- Add feature description
- Include screenshots of UI
- Add configuration examples
10. Deployment & Rollout
- 10.1 Database migration (if needed)
- Store user parameter preferences
- Log parameter usage statistics
- 10.2 Feature flag (optional)
- Add feature toggle for gradual rollout
- Default to enabled
- 10.3 Monitoring
- Add metrics for parameter usage
- Track processing success rates by param config
- Monitor performance impact
Critical Path for Testing
Minimum required for frontend testing:
- ✅ Backend Schema (Section 1) - DONE
- Backend OCR Service (Section 2) - REQUIRED
- Backend API Endpoint (Section 3) - REQUIRED
- Frontend Types (Section 4) - REQUIRED
- Frontend API Client (Section 5) - REQUIRED
- Basic UI Component (Section 6.1-6.3) - REQUIRED
Nice to have but not blocking:
- UI Polish (Section 7)
- Full test suite (Section 8)
- Documentation (Section 9)
- Deployment features (Section 10)