21bc2f92f12d4e9920ebb8068f93621e570602a4
Complete redesign of frontend interface with focus on usability, visual hierarchy, and professional appearance:

**Design System:**
- Implemented clean blue color theme (#3B82F6) with professional palette
- Created consistent spacing, shadows, and typography system
- Added reusable utility classes (page-header, section, status-badge-*)
- Removed excessive gradients and decorative effects

**Layout Architecture:**
- Redesigned main layout with 256px sidebar navigation
- Sidebar includes logo, navigation with descriptions, and user profile
- Main content area with search bar and scrollable content
- Replaced horizontal navigation with vertical sidebar pattern

**Page Redesigns:**
1. LoginPage: Split-screen design with branding (left) and clean form (right)
   - Feature highlights with icons and statistics
   - Mobile responsive design
   - Professional gradient background with subtle pattern
2. UploadPage: Added 3-step visual progress indicator
   - Better file organization with summary and status badges
   - Clear action bar with confirmation message
   - Improved file list presentation
3. ProcessingPage: Enhanced progress visualization
   - Large progress bar with percentage display
   - 4-column stats grid (Completed, Processing, Failed, Total)
   - Clean file status list with processing times
4. ResultsPage: Improved 5-column layout (2 for list, 3 for preview)
   - Added stats cards for accuracy, processing time, and text blocks
   - Better preview panel with detailed metrics
   - Export and translate action buttons
5. ExportPage: Better organization with 2-column layout
   - Visual format selection with icons (TXT, JSON, Excel, Markdown, PDF)
   - Improved form controls and option organization
   - Sticky preview sidebar showing current configuration

**Component Updates:**
- Updated Button component with proper variants
- Enhanced Card component with hover effects
- Maintained FileUpload component functionality
- Added lucide-react for modern iconography

**Technical Improvements:**
- Fixed Tailwind CSS v4 compatibility issues with @apply
- Removed decorative animations in favor of functional ones
- Improved accessibility with proper labels and ARIA attributes
- Better color contrast and readability

This redesign transforms the interface from a basic layout to a professional, enterprise-ready application with clear visual hierarchy and excellent usability.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Tool_OCR
OCR Batch Processing System with Structure Extraction
A web-based solution to extract text, images, and document structure from multiple files efficiently using PaddleOCR-VL.
Features
- 🔍 Multi-Language OCR: Support for 109 languages (Chinese, English, Japanese, Korean, etc.)
- 📄 Document Structure Analysis: Intelligent layout analysis with PP-StructureV3
- 🖼️ Image Extraction: Preserve document images alongside text content
- 📑 Batch Processing: Process multiple files concurrently with progress tracking
- 📤 Multiple Export Formats: TXT, JSON, Excel, Markdown with images, searchable PDF
- 🔧 Flexible Configuration: Rule-based output formatting
- 🌐 Translation Ready: Reserved architecture for future translation features
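The batch-processing feature can be pictured with a small sketch. This is not the project's implementation: `process_batch` and `fake_ocr` are hypothetical names, and a thread pool is only one plausible way to run files concurrently while reporting progress.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def process_batch(
    paths: Iterable[str],
    ocr_one: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, object]:
    """Run ocr_one over every path concurrently and collect per-file outcomes."""
    paths = list(paths)
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ocr_one, p): p for p in paths}
        # as_completed yields each future as soon as its file is finished.
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file should not abort the batch
                errors[path] = str(exc)
            percent = 100 * done // len(paths)
            print(f"[{percent:3d}%] {path}")
    return {
        "total": len(paths),
        "completed": len(results),
        "failed": len(errors),
        "results": results,
        "errors": errors,
    }


def fake_ocr(path: str) -> str:
    """Placeholder standing in for a real OCR call (hypothetical)."""
    if path.endswith(".bad"):
        raise ValueError("unsupported file")
    return f"text from {path}"


summary = process_batch(["a.png", "b.pdf", "c.bad"], fake_ocr)
```

The returned counts (total, completed, failed) are the kind of numbers a progress view could display; a failure in one file is recorded without stopping the rest.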
Tech Stack
Backend
- Framework: FastAPI 0.115.0
- OCR Engine: PaddleOCR 3.0+ with PaddleOCR-VL
- Database: MySQL via SQLAlchemy
- PDF Generation: Pandoc + WeasyPrint
- Image Processing: OpenCV, Pillow, pdf2image
Frontend
- Framework: React 18 with Vite
- Styling: TailwindCSS + shadcn/ui
- HTTP Client: Axios with React Query
Prerequisites
- macOS: Apple Silicon (M1/M2/M3) or Intel
- Python: 3.10+
- Conda: Miniconda or Anaconda (setup_conda.sh installs it automatically if it is missing)
- Homebrew: For system dependencies
- MySQL: External database server (provided)
Installation
1. Automated Setup (Recommended)
# Enter the cloned repository (adjust the path to your checkout)
cd /Users/egg/Projects/Tool_OCR
# Run automated setup script
chmod +x setup_conda.sh
./setup_conda.sh
# If Conda was just installed, reload your shell
source ~/.zshrc # or source ~/.bash_profile
# Run the script again to create environment
./setup_conda.sh
2. Install Dependencies
# Activate Conda environment
conda activate tool_ocr
# Install Python dependencies
pip install -r requirements.txt
# Install system dependencies (Pandoc for PDF generation)
brew install pandoc
# Install Chinese fonts for PDF generation (optional)
brew install --cask font-noto-sans-cjk
# Note: macOS built-in fonts work fine, so this step is optional
3. Download PaddleOCR Models
# Create models directory
mkdir -p models/paddleocr
# Models will be automatically downloaded on first run
# (~900MB total, includes PaddleOCR-VL 0.9B model)
4. Configure Environment
# Copy environment template
cp .env.example .env
# Edit .env with your settings
# Database credentials are pre-configured
nano .env
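To illustrate what the `.env` file provides, here is a minimal sketch of parsing `KEY=VALUE` lines with the standard library only. The variable names in the sample (`MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_DATABASE`) are assumptions for illustration; check `.env.example` for the real keys. The project may well load the file with a dedicated library instead.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical variable names; the real ones live in .env.example.
sample = """
# Database
MYSQL_HOST=mysql.theaken.com
MYSQL_PORT=33306
MYSQL_DATABASE="db_A060"
"""
settings = parse_env(sample)
```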
5. Initialize Database
# Database schema will be created automatically on first run
# Using: mysql.theaken.com:33306/db_A060
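Since the backend reaches MySQL through SQLAlchemy, the connection is typically described by a database URL. The sketch below shows how such a URL could be assembled from the host, port, and database name given above. The `pymysql` driver, the user name, and the password are assumptions, not values taken from the project.

```python
from urllib.parse import quote_plus


def build_mysql_url(
    user: str,
    password: str,
    host: str,
    port: int,
    database: str,
    driver: str = "pymysql",  # assumed driver; the project may use another
) -> str:
    """Assemble a SQLAlchemy-style MySQL URL, escaping special characters in credentials."""
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder credentials for illustration only.
url = build_mysql_url("ocr_user", "p@ss/word", "mysql.theaken.com", 33306, "db_A060")
print(url)
```

Escaping the credentials matters because characters such as `@` or `/` in a password would otherwise be read as URL delimiters.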
Usage
Start Backend Server
# Activate environment
conda activate tool_ocr
# Start FastAPI server
cd backend
python -m app.main
# Server runs at: http://localhost:12010
# API docs: http://localhost:12010/docs
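Once the server is started, a quick way to confirm it is reachable is to request the API docs page. The following is a minimal sketch using only the standard library; the helper names `docs_url` and `is_server_up` are hypothetical, and the port comes from the lines above.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:12010"  # backend address stated above


def docs_url(base_url: str = BASE_URL) -> str:
    """Return the address of the interactive API docs page."""
    return base_url.rstrip("/") + "/docs"


def is_server_up(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET /docs answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(docs_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


print("backend reachable:", is_server_up())
```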
Start Frontend (Coming Soon)
# Install frontend dependencies
cd frontend
npm install
# Start development server
npm run dev
# Frontend runs at: http://localhost:12011
Project Structure
Tool_OCR/
├── backend/
│ ├── app/
│ │ ├── api/v1/ # API endpoints
│ │ ├── core/ # Configuration, database
│ │ ├── models/ # Database models
│ │ ├── services/ # Business logic
│ │ ├── utils/ # Utilities
│ │ └── main.py # Application entry point
│ └── tests/ # Test suite
├── frontend/
│ └── src/ # React application
├── uploads/
│ ├── temp/ # Temporary uploads
│ ├── processed/ # Processed files
│ └── images/ # Extracted images
├── storage/
│ ├── markdown/ # Markdown outputs
│ ├── json/ # JSON results
│ └── exports/ # Export files
├── models/
│ └── paddleocr/ # PaddleOCR models
├── config/ # Configuration files
├── templates/ # PDF templates
├── logs/ # Application logs
├── requirements.txt # Python dependencies
├── setup_conda.sh # Environment setup script
├── .env.example # Environment template
└── README.md
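The runtime directories in the tree above (uploads, storage, models, logs) may not exist in a fresh checkout. A small sketch of creating them with `pathlib` follows; `ensure_runtime_dirs` is a hypothetical helper, and the application may already create these folders itself on startup.

```python
import tempfile
from pathlib import Path
from typing import List

# Directory names copied from the project tree above.
RUNTIME_DIRS = [
    "uploads/temp",
    "uploads/processed",
    "uploads/images",
    "storage/markdown",
    "storage/json",
    "storage/exports",
    "models/paddleocr",
    "logs",
]


def ensure_runtime_dirs(root: Path) -> List[Path]:
    """Create every runtime directory under root; existing ones are left untouched."""
    created = []
    for rel in RUNTIME_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demonstrated against a throwaway directory; use the repository root in practice.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_runtime_dirs(Path(tmp))
    all_exist = all(p.is_dir() for p in created)
```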
API Endpoints (Planned)
- POST /api/v1/ocr/upload - Upload files for OCR processing
- GET /api/v1/ocr/tasks - List all OCR tasks
- GET /api/v1/ocr/tasks/{task_id} - Get task details
- POST /api/v1/ocr/batch - Create batch processing task
- GET /api/v1/export/{task_id} - Export results (TXT/JSON/Excel/MD/PDF)
- POST /api/v1/translate/document - Translate document (reserved, returns 501)
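As an illustration of how a client might address the planned endpoints, the sketch below maps each one to its HTTP method and path and builds the full URL. The short names (`upload`, `get_task`, and so on) and the `build_request` helper are hypothetical; only the methods and paths come from the list above, and the base URL assumes the local backend port.

```python
from typing import Dict, Tuple

API_PREFIX = "/api/v1"

# (HTTP method, path template) for each planned endpoint.
ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "upload": ("POST", "/ocr/upload"),
    "list_tasks": ("GET", "/ocr/tasks"),
    "get_task": ("GET", "/ocr/tasks/{task_id}"),
    "create_batch": ("POST", "/ocr/batch"),
    "export": ("GET", "/export/{task_id}"),
    "translate": ("POST", "/translate/document"),  # reserved, returns 501
}


def build_request(
    name: str,
    base_url: str = "http://localhost:12010",
    **params: object,
) -> Tuple[str, str]:
    """Return (method, full URL) for a named endpoint, filling in path parameters."""
    method, template = ENDPOINTS[name]
    return method, base_url.rstrip("/") + API_PREFIX + template.format(**params)


method, url = build_request("get_task", task_id=42)
print(method, url)
```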
Development
Run Tests
cd backend
pytest tests/ -v --cov=app
Code Quality
# Format code
black app/
# Lint code
pylint app/
OpenSpec Workflow
This project follows OpenSpec for specification-driven development:
# View current changes
openspec list
# Validate specifications
openspec validate add-ocr-batch-processing
# View implementation tasks
cat openspec/changes/add-ocr-batch-processing/tasks.md
Roadmap
- Phase 0: Environment setup and configuration
- Phase 1: Core OCR with structure extraction
- Phase 2: Frontend development
- Phase 3: Testing & optimization
- Phase 4: Deployment
- Phase 5: Translation feature (future)
License
[To be determined]
Contributors
- Development environment: macOS Apple Silicon
- Database: MySQL external server
- OCR Engine: PaddleOCR-VL 0.9B with PP-StructureV3
Support
For issues and questions, refer to:
- OpenSpec documentation: openspec/AGENTS.md
- Task breakdown: openspec/changes/add-ocr-batch-processing/tasks.md
- Specifications: openspec/changes/add-ocr-batch-processing/specs/
Languages
- Python: 84.1%
- TypeScript: 14.1%
- Shell: 1.4%
- CSS: 0.3%