egg/OCR

beabigegg 57cf91271c feat: modernize frontend UI with Tailwind v4 and professional design system

BREAKING CHANGE: Migrated to Tailwind CSS v4 configuration system

Key Changes:
- Migrated from Tailwind v3 to v4 configuration system
  - Removed tailwind.config.js (incompatible with v4)
  - Updated index.css with @theme directive and oklch color space
  - Defined all custom animations directly in CSS using @keyframes

- Redesigned LoginPage with modern, enterprise-grade UI:
  - Full-screen gradient background (blue → purple → pink)
  - Floating animated orbs with blur effects
  - Glass morphism white card with backdrop-blur
  - Gradient buttons with shadow effects
  - 7 custom animations: fade-in, slide-in-right, slide-in-left, scale-in, shimmer, pulse, float

- Added shadcn/ui components:
  - alert.tsx, dialog.tsx, input.tsx, label.tsx, select.tsx, tabs.tsx

- Updated dependencies:
  - Added class-variance-authority ^0.7.0
  - Added react-markdown ^9.0.1

- Updated frontend documentation:
  - Comprehensive README.md with feature list, tech stack, project structure
  - Quick start guide and deployment instructions

Technical Details:
- Tailwind v4 uses @import "tailwindcss" instead of @tailwind directives
- All theme customization now in @theme block with CSS variables
- Color system migrated to oklch for better perceptual uniformity
- Animation definitions moved from config to CSS @layer utilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-13 08:55:01 +08:00

| Name | Last commit | Date |
|---|---|---|
| .claude | update FRONTEND documentation | 2025-11-12 23:55:21 +08:00 |
| backend | first | 2025-11-12 22:53:17 +08:00 |
| demo_docs | first | 2025-11-12 22:53:17 +08:00 |
| frontend | feat: modernize frontend UI with Tailwind v4 and professional design system | 2025-11-13 08:55:01 +08:00 |
| models | first | 2025-11-12 22:53:17 +08:00 |
| openspec | first | 2025-11-12 22:53:17 +08:00 |
| .env | 2nd | 2025-11-12 22:54:56 +08:00 |
| .env.example | first | 2025-11-12 22:53:17 +08:00 |
| .gitignore | 2nd | 2025-11-12 22:54:56 +08:00 |
| AGENTS.md | first | 2025-11-12 22:53:17 +08:00 |
| API_FIX_SUMMARY.md | fix: resolve 7 frontend-backend API inconsistencies and add comprehensive documentation | 2025-11-13 08:54:37 +08:00 |
| API_REFERENCE.md | fix: resolve 7 frontend-backend API inconsistencies and add comprehensive documentation | 2025-11-13 08:54:37 +08:00 |
| CLAUDE.md | first | 2025-11-12 22:53:17 +08:00 |
| FRONTEND_ANALYSIS.md | update FRONTEND documentation | 2025-11-12 23:55:21 +08:00 |
| FRONTEND_API.md | fix: resolve 7 frontend-backend API inconsistencies and add comprehensive documentation | 2025-11-13 08:54:37 +08:00 |
| FRONTEND_CODE_EXAMPLES.md | update FRONTEND documentation | 2025-11-12 23:55:21 +08:00 |
| FRONTEND_QUICK_REFERENCE.md | update FRONTEND documentation | 2025-11-12 23:55:21 +08:00 |
| FRONTEND_QUICK_START.md | fix: resolve 7 frontend-backend API inconsistencies and add comprehensive documentation | 2025-11-13 08:54:37 +08:00 |
| FRONTEND_README.md | update FRONTEND documentation | 2025-11-12 23:55:21 +08:00 |
| FRONTEND_UPGRADE_SUMMARY.md | fix: resolve 7 frontend-backend API inconsistencies and add comprehensive documentation | 2025-11-13 08:54:37 +08:00 |
| README.md | first | 2025-11-12 22:53:17 +08:00 |
| requirements.txt | first | 2025-11-12 22:53:17 +08:00 |
| setup_conda.sh | first | 2025-11-12 22:53:17 +08:00 |
| SETUP.md | first | 2025-11-12 22:53:17 +08:00 |

README.md

Tool_OCR

OCR Batch Processing System with Structure Extraction

A web-based system that extracts text, images, and document structure from multiple files in batch using PaddleOCR-VL.

Features

🔍 Multi-Language OCR: Supports 109 languages, including Chinese, English, Japanese, and Korean
📄 Document Structure Analysis: Intelligent layout analysis with PP-StructureV3
🖼️ Image Extraction: Preserve document images alongside text content
📑 Batch Processing: Process multiple files concurrently with progress tracking
📤 Multiple Export Formats: TXT, JSON, Excel, Markdown with images, searchable PDF
🔧 Flexible Configuration: Rule-based output formatting
🌐 Translation Ready: Reserved architecture for future translation features
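
As an illustration of the batch-processing feature, the sketch below runs a set of files through a worker pool and reports progress after each one. It is not the project's implementation: `fake_ocr`, `process_batch`, and the `on_progress` callback are hypothetical names, and the real backend would call PaddleOCR where the stub returns placeholder text.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable


def fake_ocr(path: Path) -> str:
    """Hypothetical stand-in for the real OCR call; returns placeholder text."""
    return f"text from {path.name}"


def process_batch(
    paths: Iterable[Path],
    ocr: Callable[[Path], str] = fake_ocr,
    max_workers: int = 4,
    on_progress: Callable[[int, int], None] = lambda done, total: None,
) -> Dict[str, str]:
    """Run `ocr` over all paths concurrently and report progress per file."""
    files = list(paths)
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its file so results can be keyed by name.
        futures = {pool.submit(ocr, p): p for p in files}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path.name] = future.result()
            except Exception as exc:  # one failed file should not stop the batch
                results[path.name] = f"ERROR: {exc}"
            on_progress(done, len(files))
    return results


if __name__ == "__main__":
    batch = [Path("a.png"), Path("b.pdf"), Path("c.jpg")]
    print(process_batch(batch, on_progress=lambda d, t: print(f"{d}/{t} done")))
```

A thread pool is used here only to keep the sketch short. A CPU- or GPU-bound OCR engine may be better served by a process pool or a task queue.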

Tech Stack

Backend

Framework: FastAPI 0.115.0
OCR Engine: PaddleOCR 3.0+ with PaddleOCR-VL
Database: MySQL via SQLAlchemy
PDF Generation: Pandoc + WeasyPrint
Image Processing: OpenCV, Pillow, pdf2image

Frontend

Framework: React 18 with Vite
Styling: TailwindCSS + shadcn/ui
HTTP Client: Axios with React Query

Prerequisites

macOS: Apple Silicon (M1/M2/M3) or Intel
Python: 3.10+
Conda: Miniconda or Anaconda (setup_conda.sh installs it automatically if it is missing)
Homebrew: For system dependencies
MySQL: External database server (provided)

Installation

1. Automated Setup (Recommended)

# Change into the project directory (clone the repository first if needed)
cd /Users/egg/Projects/Tool_OCR

# Run automated setup script
chmod +x setup_conda.sh
./setup_conda.sh

# If Conda was just installed, reload your shell
source ~/.zshrc  # or source ~/.bash_profile

# Run the script again to create environment
./setup_conda.sh

2. Install Dependencies

# Activate Conda environment
conda activate tool_ocr

# Install Python dependencies
pip install -r requirements.txt

# Install system dependencies (Pandoc for PDF generation)
brew install pandoc

# Install Chinese fonts for PDF generation (optional)
brew install --cask font-noto-sans-cjk
# Note: macOS built-in fonts work fine, so this step is optional

3. Download PaddleOCR Models

# Create models directory
mkdir -p models/paddleocr

# Models will be automatically downloaded on first run
# (~900MB total, includes PaddleOCR-VL 0.9B model)

4. Configure Environment

# Copy environment template
cp .env.example .env

# Edit .env with your settings
# Database credentials are pre-configured
nano .env
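
To make the role of the .env file concrete, here is a minimal sketch of reading KEY=VALUE settings with only the standard library. How the backend actually loads its settings is not stated in this README, so `parse_env` and `load_env` are hypothetical helpers and the keys in the usage note are invented for illustration.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: Path = Path(".env")) -> Dict[str, str]:
    """Read the file if present; real environment variables take precedence."""
    file_values = parse_env(path.read_text(encoding="utf-8")) if path.exists() else {}
    overrides = {k: os.environ[k] for k in file_values if k in os.environ}
    return {**file_values, **overrides}
```

For example, `parse_env("BACKEND_PORT=12010\n# comment\n")` yields a dictionary with the single hypothetical key `BACKEND_PORT`.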

5. Initialize Database

# Database schema will be created automatically on first run
# Using: mysql.theaken.com:33306/db_A060
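
The host, port, and database name above can be combined into a SQLAlchemy-style connection URL. The sketch below only builds the string; the `pymysql` driver, the function name, and the credentials in the usage note are assumptions, since the README does not say which MySQL driver the backend uses.

```python
from urllib.parse import quote_plus


def build_mysql_url(
    user: str,
    password: str,
    host: str,
    port: int,
    database: str,
    driver: str = "pymysql",  # assumed driver; not confirmed by the README
) -> str:
    """Build a mysql+<driver>:// URL, escaping special characters in credentials."""
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
```

Calling `build_mysql_url("user", "p@ss", "mysql.theaken.com", 33306, "db_A060")` escapes the `@` in the placeholder password so it cannot be confused with the host separator.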

Usage

Start Backend Server

# Activate environment
conda activate tool_ocr

# Start FastAPI server
cd backend
python -m app.main

# Server runs at: http://localhost:12010
# API docs: http://localhost:12010/docs
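
Once the server is running, a quick reachability check can confirm it is listening. This is a small sketch using only the standard library; `is_up` is a hypothetical helper, and it assumes the `/docs` page returns a 2xx status when the backend is healthy.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors, and malformed URLs.
        return False


if __name__ == "__main__":
    print("backend reachable:", is_up("http://localhost:12010/docs"))
```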

Start Frontend (Coming Soon)

# Install frontend dependencies
cd frontend
npm install

# Start development server
npm run dev

# Frontend runs at: http://localhost:12011

Project Structure

Tool_OCR/
├── backend/
│   ├── app/
│   │   ├── api/v1/          # API endpoints
│   │   ├── core/            # Configuration, database
│   │   ├── models/          # Database models
│   │   ├── services/        # Business logic
│   │   ├── utils/           # Utilities
│   │   └── main.py          # Application entry point
│   └── tests/               # Test suite
├── frontend/
│   └── src/                 # React application
├── uploads/
│   ├── temp/                # Temporary uploads
│   ├── processed/           # Processed files
│   └── images/              # Extracted images
├── storage/
│   ├── markdown/            # Markdown outputs
│   ├── json/                # JSON results
│   └── exports/             # Export files
├── models/
│   └── paddleocr/           # PaddleOCR models
├── config/                  # Configuration files
├── templates/               # PDF templates
├── logs/                    # Application logs
├── requirements.txt         # Python dependencies
├── setup_conda.sh           # Environment setup script
├── .env.example             # Environment template
└── README.md
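
The tree above includes several runtime directories for uploads, outputs, models, and logs. Whether the backend creates them on startup is not stated here, so the following is only a sketch of how they could be created up front. `RUNTIME_DIRS` and `ensure_runtime_dirs` are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import List

# Runtime directories taken from the project structure listing.
RUNTIME_DIRS = [
    "uploads/temp",
    "uploads/processed",
    "uploads/images",
    "storage/markdown",
    "storage/json",
    "storage/exports",
    "models/paddleocr",
    "logs",
]


def ensure_runtime_dirs(root: Path) -> List[Path]:
    """Create each runtime directory under `root` if it does not exist yet."""
    created: List[Path] = []
    for rel in RUNTIME_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate against a throwaway directory rather than the real project.
    with tempfile.TemporaryDirectory() as tmp:
        for p in ensure_runtime_dirs(Path(tmp)):
            print(p.relative_to(tmp))
```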

API Endpoints (Planned)

POST /api/v1/ocr/upload - Upload files for OCR processing
GET /api/v1/ocr/tasks - List all OCR tasks
GET /api/v1/ocr/tasks/{task_id} - Get task details
POST /api/v1/ocr/batch - Create batch processing task
GET /api/v1/export/{task_id} - Export results (TXT/JSON/Excel/MD/PDF)
POST /api/v1/translate/document - Translate document (reserved, returns 501)
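
Because these endpoints are still planned, the following is only a sketch of how a client might build request URLs for them. The base URL comes from the backend section above. The `format` query parameter, its accepted values, and the helper names are assumptions and may not match the final API.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:12010"
# Assumed format identifiers, derived from the export list (TXT/JSON/Excel/MD/PDF).
EXPORT_FORMATS = {"txt", "json", "excel", "md", "pdf"}


def task_url(task_id: str, base: str = BASE_URL) -> str:
    """URL for GET /api/v1/ocr/tasks/{task_id}."""
    return f"{base}/api/v1/ocr/tasks/{quote(str(task_id), safe='')}"


def export_url(task_id: str, fmt: str, base: str = BASE_URL) -> str:
    """URL for GET /api/v1/export/{task_id}; `format` is a hypothetical parameter."""
    fmt = fmt.lower()
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    query = urlencode({"format": fmt})
    return f"{base}/api/v1/export/{quote(str(task_id), safe='')}?{query}"
```

Validating the format on the client side gives an early error instead of a round trip to the server, though the server would still need its own check.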

Development

Run Tests

cd backend
pytest tests/ -v --cov=app

Code Quality

# Format code
black app/

# Lint code
pylint app/

OpenSpec Workflow

This project follows OpenSpec for specification-driven development:

# View current changes
openspec list

# Validate specifications
openspec validate add-ocr-batch-processing

# View implementation tasks
cat openspec/changes/add-ocr-batch-processing/tasks.md

Roadmap

Phase 0: Environment setup and configuration
Phase 1: Core OCR with structure extraction
Phase 2: Frontend development
Phase 3: Testing & optimization
Phase 4: Deployment
Phase 5: Translation feature (future)

License

[To be determined]

Contributors

Development environment: macOS Apple Silicon
Database: MySQL external server
OCR Engine: PaddleOCR-VL 0.9B with PP-StructureV3

Support

For issues and questions, refer to:

OpenSpec documentation: openspec/AGENTS.md
Task breakdown: openspec/changes/add-ocr-batch-processing/tasks.md
Specifications: openspec/changes/add-ocr-batch-processing/specs/