Result Export - Delta Changes
ADDED Requirements
Requirement: Image Extraction and Persistence
The OCR system SHALL save extracted images to disk during layout analysis for later use in PDF generation.
Scenario: Images extracted by PP-StructureV3 are saved to disk
- WHEN OCR processes a document containing images (charts, tables, figures)
- THEN system SHALL extract image objects from `markdown_images` dictionary
- AND system SHALL create `imgs/` subdirectory in result folder
- AND system SHALL save each image object to disk using PIL Image.save()
- AND saved file paths SHALL match paths recorded in JSON `images_metadata`
- AND system SHALL log warnings for failed image saves but continue processing
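The save step in this scenario could look roughly like the sketch below. It assumes `markdown_images` maps a relative path (for example `imgs/...`) to a PIL-style image object exposing `save(path)`. The function name `save_markdown_images` and its return value are hypothetical and not taken from the codebase.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def save_markdown_images(markdown_images, result_dir):
    """Save each image under result_dir and return the relative paths written.

    Assumption: markdown_images maps a relative path (e.g. "imgs/x.jpg") to an
    object with a PIL-compatible save(path) method.
    """
    result_dir = Path(result_dir)
    # Create the imgs/ subdirectory even if no image ends up being saved.
    (result_dir / "imgs").mkdir(parents=True, exist_ok=True)

    saved = []
    for rel_path, image in markdown_images.items():
        target = result_dir / rel_path
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            image.save(str(target))
            saved.append(rel_path)
        except Exception as exc:
            # A failed save is logged as a warning; processing continues.
            logger.warning("Failed to save image %s: %s", rel_path, exc)
    return saved
```

Returning the list of saved paths makes it straightforward to check them against the paths recorded in `images_metadata`.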
Scenario: Multi-page documents with images on different pages
- WHEN OCR processes multi-page PDF with images on multiple pages
- THEN system SHALL save images from all pages to same `imgs/` folder
- AND image filenames SHALL include bbox coordinates for uniqueness
- AND images SHALL be available for PDF generation after OCR completes
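One possible naming scheme for the requirement above is sketched here. It assumes each bbox is `[x0, y0, x1, y1]` in pixels. The helper name `image_filename`, the `img_p` prefix, and the page index in the name are illustrative additions of this sketch; the scenario itself only requires the bbox coordinates.

```python
def image_filename(page_index, bbox, ext="jpg"):
    """Build a filename embedding the page index and rounded bbox coordinates.

    Assumption: bbox is [x0, y0, x1, y1] in pixels. The page index is added
    because two pages can contain images at identical coordinates.
    """
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    return f"img_p{page_index}_{x0}_{y0}_{x1}_{y1}.{ext}"
```

Because all pages share one `imgs/` folder, a bbox-only name could collide when two pages place an image at the same position, which is why the sketch also includes the page index.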
Requirement: Layout-Preserving PDF Generation
The system SHALL generate PDF files that preserve the original document layout using OCR JSON data.
Scenario: PDF generated from JSON with accurate layout
- WHEN user requests PDF download for a completed task
- THEN system SHALL parse OCR JSON result file
- AND system SHALL extract bounding box coordinates for each text region
- AND system SHALL determine page dimensions from source file or bbox maximum values
- AND system SHALL generate PDF with text positioned at precise coordinates
- AND system SHALL use Chinese-compatible font (e.g., Noto Sans CJK)
- AND system SHALL embed images from `imgs/` folder using paths in `images_metadata`
- AND generated PDF SHALL visually resemble original document layout with images
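The bbox-based fallback for page dimensions and the placement of regions at their coordinates can be sketched as follows. This assumes each region is a dict with a `bbox` key holding `[x0, y0, x1, y1]` measured from the top-left corner; the key name and both function names are hypothetical. The vertical flip is only needed when the PDF library uses the native PDF coordinate system with its origin at the bottom-left, as ReportLab's canvas does.

```python
def page_size_from_regions(regions):
    """Fallback page size: the largest x1 and y1 over all region bboxes."""
    width = max(r["bbox"][2] for r in regions)
    height = max(r["bbox"][3] for r in regions)
    return width, height


def to_pdf_coords(bbox, page_height):
    """Convert a top-left-origin bbox to bottom-left-origin placement.

    Returns (x, y, width, height), where (x, y) is the lower-left corner
    of the region in PDF user space.
    """
    x0, y0, x1, y1 = bbox
    return x0, page_height - y1, x1 - x0, y1 - y0
```

For example, a region with bbox `[10, 20, 200, 50]` on a page 820 units tall would be placed with its lower-left corner at `(10, 770)` and a size of `190 x 30`.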
Scenario: PDF download works correctly
- WHEN user clicks PDF download button
- THEN system SHALL return cached PDF if already generated
- OR system SHALL generate new PDF from JSON on first request
- AND system SHALL NOT return 403 Forbidden error
- AND downloaded PDF SHALL contain task OCR results with layout preserved
Scenario: Multi-page PDF generation
- WHEN OCR JSON contains results for multiple pages
- THEN generated PDF SHALL contain same number of pages
- AND each page SHALL display text regions for that page only
- AND page dimensions SHALL match original document pages
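A minimal sketch of the per-page grouping implied by this scenario, assuming each region carries a 0-based `page` key and the page count is known from the OCR JSON. Both the key name and the function name are hypothetical.

```python
def group_regions_by_page(regions, page_count):
    """Return one list of regions per page, in page order.

    Assumption: each region is a dict with a 0-based "page" index.
    Pages without any text region still get an (empty) entry so the
    generated PDF keeps the same number of pages as the OCR result.
    """
    pages = [[] for _ in range(page_count)]
    for region in regions:
        pages[region["page"]].append(region)
    return pages
```

Pre-allocating one list per page, rather than grouping only the pages that appear in the regions, keeps blank pages from being dropped.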
MODIFIED Requirements
Requirement: Export Interface
The Export page SHALL support downloading OCR results in multiple formats using V2 task APIs.
Scenario: PDF caching improves performance
- WHEN user downloads same PDF multiple times
- THEN system SHALL serve cached PDF file on subsequent requests
- AND system SHALL NOT regenerate PDF unless JSON changes
- AND download response time SHALL be faster than initial generation
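The caching rule in this scenario could be implemented along these lines. The sketch treats "JSON changes" as "the JSON file's modification time is newer than the cached PDF's", which is an assumption; a content hash would be an alternative. The function name `get_or_generate_pdf` and the `generate` callback are hypothetical.

```python
from pathlib import Path


def get_or_generate_pdf(json_path, pdf_path, generate):
    """Return (pdf_path, served_from_cache).

    Assumption: the cached PDF is valid while it is at least as new as the
    OCR JSON file. `generate(json_path, pdf_path)` is a caller-supplied
    function that writes the PDF to pdf_path.
    """
    json_path, pdf_path = Path(json_path), Path(pdf_path)
    if (
        pdf_path.exists()
        and pdf_path.stat().st_mtime_ns >= json_path.stat().st_mtime_ns
    ):
        return pdf_path, True  # cached copy is still up to date
    generate(json_path, pdf_path)
    return pdf_path, False
```

The boolean in the return value is there only to make the cache behavior observable, for example in logs or tests.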