egg/OCR

egg 2312b4cd66 feat: add frontend-adjustable PP-StructureV3 parameters with comprehensive testing

Implement user-configurable PP-StructureV3 parameters to allow fine-tuning OCR behavior
from the frontend. This addresses issues with over-merging, missing small text, and
document-specific optimization needs.

Backend:
- Add PPStructureV3Params schema with 7 adjustable parameters
- Update OCR service to accept custom parameters with smart caching
- Modify /tasks/{task_id}/start endpoint to receive params in request body
- Parameter priority: custom > settings default
- Conditional caching (no cache for custom params to avoid pollution)

Frontend:
- Create PPStructureParams component with collapsible UI
- Add 3 presets: default, high-quality, fast
- Implement localStorage persistence for user parameters
- Add import/export JSON functionality
- Integrate into ProcessingPage with conditional rendering

Testing:
- Unit tests: 7/10 passing (core functionality verified)
- API integration tests for schema validation
- E2E tests with authentication support
- Performance benchmarks for memory and initialization
- Test runner script with venv activation

Environment:
- Remove duplicate backend/venv (use root venv only)
- Update test runner to use correct virtual environment

OpenSpec:
- Archive fix-pdf-coordinate-system proposal
- Archive frontend-adjustable-ppstructure-params proposal
- Create ocr-processing spec
- Update result-export spec

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-25 14:39:19 +08:00


result-export Specification

Purpose

TBD - created by archiving change fix-v2-api-ui-issues. Update Purpose after archive.

Requirements

Requirement: Export Interface

The Export page SHALL support downloading OCR results in multiple formats using V2 task APIs, with processing track information and enhanced structure data.

Scenario: Export page uses V2 download endpoints

WHEN user selects a format and clicks export button
THEN frontend SHALL call V2 endpoint /api/v2/tasks/{task_id}/download/{format}
AND frontend SHALL NOT call the legacy V1-style /api/v2/export endpoint (which returns 404)
AND file SHALL download successfully
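
A minimal sketch of how a client might assemble the V2 download URL for this scenario. Only the path pattern comes from the spec; the `build_download_url` helper, the `base_url` argument, and the lowercase format slugs are assumptions made for illustration.

```python
from urllib.parse import quote

# Hypothetical slugs: the spec names the formats but not their URL spelling.
SUPPORTED_FORMATS = {"txt", "json", "excel", "markdown", "pdf"}


def build_download_url(base_url: str, task_id: str, fmt: str) -> str:
    """Return the V2 URL /api/v2/tasks/{task_id}/download/{format}."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    # Percent-encode the task id so it cannot alter the path structure.
    safe_id = quote(task_id, safe="")
    return f"{base_url.rstrip('/')}/api/v2/tasks/{safe_id}/download/{fmt}"
```

Rejecting unknown formats before the request is one way to avoid falling back to the removed export endpoint by accident.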

Scenario: Export supports multiple formats

WHEN user exports a completed task
THEN system SHALL support downloading as TXT, JSON, Excel, Markdown, and PDF
AND each format SHALL use correct V2 download endpoint
AND downloaded files SHALL contain task OCR results

Scenario: Export includes processing track metadata

WHEN user exports a task processed through dual-track system
THEN exported JSON SHALL include "processing_track" field indicating "ocr" or "direct"
AND SHALL include "processing_metadata" with track-specific information
AND SHALL maintain backward compatibility for clients not expecting these fields
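
The additive nature of these fields can be sketched as follows. This is an illustration under assumptions, not the project's exporter: `with_track_metadata` and the shape of the input dictionary are hypothetical, while the two field names and the two track values come from the scenario above.

```python
from typing import Any

VALID_TRACKS = {"ocr", "direct"}


def with_track_metadata(
    result: dict[str, Any], track: str, metadata: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of `result` with the two track fields added."""
    if track not in VALID_TRACKS:
        raise ValueError(f"unknown processing track: {track!r}")
    # Copy first so existing keys are untouched; clients that ignore
    # the new fields keep seeing the payload they already understand.
    exported = dict(result)
    exported["processing_track"] = track
    exported["processing_metadata"] = dict(metadata)
    return exported
```

Because the fields are only added and nothing is renamed or removed, a client that reads the pre-existing keys is unaffected.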

Scenario: Export UnifiedDocument format

WHEN user requests JSON export with unified=true parameter
THEN system SHALL return UnifiedDocument structure
AND include complete element hierarchy with coordinates
AND preserve all PP-StructureV3 element types for OCR track

Requirement: Multi-Task Export Selection

The Export page SHALL allow users to select and export multiple tasks.

Scenario: Select multiple tasks for export

WHEN Export page loads
THEN page SHALL display list of user's completed tasks
AND page SHALL provide checkboxes to select multiple tasks
AND page SHALL NOT require batch ID from upload store (legacy V1 behavior)

Scenario: Export selected tasks

WHEN user selects multiple tasks and clicks export
THEN system SHALL download each selected task's results in chosen format
AND downloaded files SHALL be named distinctly (e.g., {task_id}_result.{ext})
AND system MAY provide option to download as ZIP archive for multiple files
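
A sketch of the naming rule and the optional ZIP bundle, using only the Python standard library. The function names and the `results` mapping (task id to already-exported bytes) are assumptions; the `{task_id}_result.{ext}` pattern is the example given in the scenario.

```python
import io
import zipfile


def result_filename(task_id: str, ext: str) -> str:
    """Build a distinct per-task name, e.g. 'abc_result.txt'."""
    return f"{task_id}_result.{ext.lstrip('.')}"


def bundle_results(results: dict[str, bytes], ext: str) -> bytes:
    """Pack each task's exported bytes into one in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for task_id, payload in results.items():
            archive.writestr(result_filename(task_id, ext), payload)
    return buffer.getvalue()
```

Since task ids are unique, deriving the file name from the task id keeps entries from colliding inside the archive.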

Requirement: Export Configuration Persistence

Export settings (format, thresholds, templates) SHALL apply consistently to V2 task downloads.

Scenario: Apply confidence threshold to export

WHEN user sets confidence threshold to 0.7 and exports
THEN downloaded results SHALL only include OCR text with confidence >= 0.7
AND threshold SHALL apply via V2 download endpoint query parameters
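
The threshold rule amounts to an inclusive filter. The sketch below assumes each OCR line is a dictionary with `text` and `confidence` keys; those key names and the `filter_by_confidence` helper are hypothetical, since the spec does not define the result schema here.

```python
def filter_by_confidence(lines: list[dict], threshold: float) -> list[dict]:
    """Keep only OCR lines whose confidence is >= threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # The comparison is inclusive: a line at exactly the threshold is kept.
    return [line for line in lines if line["confidence"] >= threshold]
```

For example, with a threshold of 0.7 a line scored 0.7 is exported and a line scored 0.69 is dropped.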

Scenario: Apply CSS template to PDF export

WHEN user selects CSS template for PDF format
THEN downloaded PDF SHALL use selected styling
AND template SHALL be passed to V2 /tasks/{id}/download/pdf endpoint

Requirement: Enhanced PDF Export with Layout Preservation

The PDF export SHALL accurately preserve document layout from both OCR and direct extraction tracks with correct coordinate transformation and multi-page support.

Scenario: Export PDF from direct extraction track

WHEN exporting PDF from a direct-extraction processed document
THEN the PDF SHALL maintain exact text positioning from source
AND preserve original fonts and styles where possible
AND include extracted images at correct positions

Scenario: Export PDF from OCR track with full structure

WHEN exporting PDF from OCR-processed document
THEN the PDF SHALL use all 23 PP-StructureV3 element types
AND render tables with proper cell boundaries
AND maintain reading order from parsing_res_list

Scenario: Handle coordinate transformations correctly

WHEN generating PDF from UnifiedDocument
THEN system SHALL use explicit page dimensions from OCR results (not inferred from bounding boxes)
AND correctly transform Y-axis coordinates from top-left (OCR) to bottom-left (PDF/ReportLab) origin
AND prevent vertical flipping or position misalignment errors
AND handle page size variations accurately
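
The Y-axis flip described above can be sketched as a pure function. It assumes bounding boxes are `(x0, y0, x1, y1)` tuples in the same units as the page height, with any pixel-to-point scaling already applied; the `ocr_bbox_to_pdf` name and the tuple layout are illustrative assumptions.

```python
def ocr_bbox_to_pdf(
    bbox: tuple[float, float, float, float], page_height: float
) -> tuple[float, float, float, float]:
    """Convert a top-left-origin box to a bottom-left-origin box.

    Input:  (x0, y0, x1, y1) with y growing downward (OCR convention).
    Output: (x0, y_bottom, x1, y_top) with y growing upward (PDF convention).
    """
    x0, y0, x1, y1 = bbox
    # The OCR bottom edge (y1) becomes the lower PDF edge, and the
    # OCR top edge (y0) becomes the upper PDF edge.
    return (x0, page_height - y1, x1, page_height - y0)
```

Passing the explicit page height, rather than the largest Y value seen in the bounding boxes, is what keeps top-aligned text from drifting when the content does not reach the bottom of the page.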

Scenario: Support multi-page documents with varying dimensions

WHEN generating PDF from multi-page document with mixed orientations
THEN system SHALL apply correct page size for each page independently
AND support both portrait and landscape pages in same document
AND NOT use first page dimensions for all subsequent pages
AND call setPageSize() for each new page before rendering content
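
A sketch of the per-page loop this scenario describes. It is written against a small protocol so it runs without ReportLab installed; a ReportLab `Canvas` provides `setPageSize()` and `showPage()` with these shapes. The `render_pages` function, the `draw_page` callback, and the `width`/`height` page keys are assumptions for illustration.

```python
from typing import Callable, Iterable, Protocol


class PageCanvas(Protocol):
    """The two canvas methods this sketch relies on."""

    def setPageSize(self, size: tuple[float, float]) -> None: ...
    def showPage(self) -> None: ...


def render_pages(
    canvas: PageCanvas,
    pages: Iterable[dict],
    draw_page: Callable[[PageCanvas, dict], None],
) -> None:
    """Render every page at its own size, never reusing the first page's."""
    for page in pages:
        # Set the size before drawing so coordinates map onto this page.
        canvas.setPageSize((page["width"], page["height"]))
        draw_page(canvas, page)
        # Finish the current page; the next iteration starts a new one.
        canvas.showPage()
```

Setting the size inside the loop is what allows portrait and landscape pages to coexist in one output file.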

Scenario: Single-page layout verification

WHEN user exports OCR-processed single-page document (e.g., img1.png)
THEN generated PDF text positions SHALL match original image coordinates
AND top-aligned text (e.g., headers) SHALL appear at correct vertical position
AND no content SHALL be vertically flipped or offset from expected position

Requirement: Structure Data Export

The system SHALL provide export formats that preserve document structure for downstream processing.

Scenario: Export structured JSON with hierarchy

Scenario: Export for translation preparation

Scenario: Export with layout analysis
