- Removed all test files and directories
- Deleted outdated documentation (will be rewritten)
- Cleaned up temporary files, logs, and uploads
- Archived 5 completed OpenSpec proposals
- Created new dual-track-document-processing proposal with complete OpenSpec structure
- Dual-track architecture: OCR track (PaddleOCR) + Direct track (PyMuPDF)
- UnifiedDocument model for consistent output
- Support for structure-preserving translation
- Updated .gitignore to prevent future test/temp files

This is a major cleanup preparing for the complete refactoring of the document processing pipeline.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
File Processing Specification Delta
ADDED Requirements
Requirement: Office Document Support
The system SHALL support processing of Microsoft Office document formats including Word documents (.doc, .docx) and PowerPoint presentations (.ppt, .pptx).
Scenario: Upload and Process Word Document
Given a user has a Word document containing text and tables
When the user uploads the .docx file
Then the system converts it to PDF format
And extracts all text using OCR
And preserves table structure in the output
Scenario: Upload and Process PowerPoint
Given a user has a PowerPoint presentation with multiple slides
When the user uploads the .pptx file
Then the system converts each slide to an image
And performs OCR on each slide
And maintains slide order in the results
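One detail the "maintains slide order" step has to get right is numeric ordering: exported slide images sorted as plain strings would put slide 10 before slide 2. The sketch below is only illustrative. The file naming pattern (`slide-<n>.png`), the helper names, and the pluggable `ocr` callable are all assumptions, not part of this specification.

```python
import re
from typing import Callable, List, Tuple


def slide_number(image_name: str) -> int:
    """Extract the trailing slide index from a name such as 'slide-10.png'."""
    match = re.search(r"(\d+)(?=\.\w+$)", image_name)
    if match is None:
        raise ValueError(f"no slide number in {image_name!r}")
    return int(match.group(1))


def ocr_slides_in_order(
    image_names: List[str],
    ocr: Callable[[str], str],
) -> List[Tuple[int, str]]:
    """Run the (hypothetical) OCR callable on each slide, in slide order."""
    ordered = sorted(image_names, key=slide_number)  # numeric, not lexicographic
    return [(slide_number(name), ocr(name)) for name in ordered]
```

Passing the OCR step in as a callable keeps the ordering logic independent of whichever OCR engine is actually used.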
Requirement: Document Conversion Pipeline
The system SHALL implement a multi-stage conversion pipeline for Office documents using LibreOffice or equivalent tools.
Scenario: Conversion Error Handling
Given an Office document with unsupported features
When the conversion process encounters an error
Then the system logs the specific error details
And returns a user-friendly error message
And marks the file as failed with the failure reason
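The conversion and error-handling behavior above could look roughly like the following. This is a minimal sketch that assumes LibreOffice is installed and reachable as `soffice`; the `ConversionResult` type, the function names, the timeout, and the message wording are hypothetical and not defined by this specification.

```python
import logging
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional

logger = logging.getLogger("office_conversion")

OFFICE_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx"}
USER_MESSAGE = "The document could not be converted. Please check the file and try again."


@dataclass
class ConversionResult:
    ok: bool
    pdf_path: Optional[Path] = None
    user_message: str = ""    # safe to show to the user
    failure_reason: str = ""  # stored with the file record when ok is False


def build_convert_command(src: Path, out_dir: Path, soffice: str = "soffice") -> List[str]:
    """Build the headless LibreOffice command that converts src to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", str(out_dir), str(src)]


def _failed(src: Path, reason: str) -> ConversionResult:
    logger.error("conversion failed for %s: %s", src, reason)  # specific details go to the log
    return ConversionResult(ok=False, user_message=USER_MESSAGE, failure_reason=reason)


def convert_to_pdf(
    src: Path,
    out_dir: Path,
    runner: Callable = subprocess.run,  # injectable so the logic can be tested without LibreOffice
    timeout: int = 120,
) -> ConversionResult:
    if src.suffix.lower() not in OFFICE_EXTENSIONS:
        return _failed(src, f"unsupported extension {src.suffix!r}")
    try:
        proc = runner(
            build_convert_command(src, out_dir),
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return _failed(src, str(exc))
    pdf_path = out_dir / (src.stem + ".pdf")
    if proc.returncode != 0:
        return _failed(src, proc.stderr.strip() or f"exit code {proc.returncode}")
    if not pdf_path.exists():
        return _failed(src, "converter exited cleanly but produced no PDF")
    return ConversionResult(ok=True, pdf_path=pdf_path)
```

Checking for the output file as well as the exit code is a deliberate choice here, since a converter can exit cleanly without writing a PDF.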
MODIFIED Requirements
Requirement: File Validation
The file validation module SHALL accept Office document formats in addition to existing image and PDF formats, including .doc, .docx, .ppt, and .pptx extensions.
Scenario: Validate Office File Upload
Given a user attempts to upload a file
When the file extension is .docx or .pptx
Then the system accepts the file for processing
And validates the MIME type matches the extension
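A validation check along these lines might pair each Office extension with its registered MIME type. The sketch below covers only the four Office extensions named in this requirement. The function name is hypothetical, and the existing image and PDF checks are assumed to live elsewhere.

```python
from pathlib import Path

# Registered media types for the Office formats listed in the requirement.
OFFICE_MIME_TYPES = {
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".ppt": "application/vnd.ms-powerpoint",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def validate_office_upload(filename: str, declared_mime: str) -> bool:
    """Return True only if the extension is an Office type and the MIME type matches it."""
    extension = Path(filename).suffix.lower()
    expected = OFFICE_MIME_TYPES.get(extension)
    if expected is None:
        return False
    # Ignore parameters such as "; charset=binary" and compare case-insensitively.
    media_type = declared_mime.split(";", 1)[0].strip().lower()
    return media_type == expected
```

A declared MIME type comes from the client and can be spoofed, so a production check would likely also inspect the file content itself.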
Requirement: JWT Token Validity
The JWT token validity period SHALL be extended from 30 minutes to 1440 minutes (24 hours) to improve user experience.
Scenario: Extended Token Usage
Given a user authenticates successfully
When they receive a JWT token
Then the token remains valid for 24 hours
And allows continuous API access without re-authentication
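In terms of token claims, the change amounts to moving the `exp` claim from 30 minutes to 1440 minutes after issuance. The following sketch shows only the expiry arithmetic with the standard library. The constant name and helper functions are assumptions, and signing the token (for example with a JWT library) is deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Union

ACCESS_TOKEN_EXPIRE_MINUTES = 1440  # previously 30; 1440 minutes = 24 hours


def build_token_claims(subject: str, now: Optional[datetime] = None) -> Dict[str, Union[str, int]]:
    """Build the registered claims (sub, iat, exp) for a new access token."""
    issued_at = now or datetime.now(timezone.utc)
    expires_at = issued_at + timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
    return {
        "sub": subject,
        "iat": int(issued_at.timestamp()),
        "exp": int(expires_at.timestamp()),
    }


def is_token_valid(claims: Dict[str, Union[str, int]], now: Optional[datetime] = None) -> bool:
    """A token is valid strictly before its exp timestamp."""
    current = now or datetime.now(timezone.utc)
    return int(current.timestamp()) < int(claims["exp"])
```

With these helpers, a token issued at a given instant is accepted 23 hours 59 minutes later and rejected at the 24-hour mark.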