Added `tool_ocr_` prefix to all database tables for clear separation
from other systems in the same database.
Changes:
- All tables now use `tool_ocr_` prefix
- Added tool_ocr_sessions table for token management
- Created complete SQL schema file with:
- Full table definitions with comments
- Indexes for performance
- Views for common queries
- Stored procedures for maintenance
- Audit log table (optional)
New files:
- database_schema.sql: Ready-to-use SQL script for deployment
Configuration:
- Added DATABASE_TABLE_PREFIX environment variable
- Updated all references to use prefixed table names
Benefits:
- Clear namespace separation in shared databases
- Easier identification of Tool_OCR tables
- Prevent conflicts with other applications
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major updates based on feedback:
1. Remove Azure AD ID storage - use email as primary identifier
2. Complete database redesign - no backward compatibility needed
3. Add comprehensive user task isolation and history features
Database changes:
- Simplified users table (email-based)
- New ocr_tasks table with user association
- New task_files table for file tracking
- Proper indexes for performance
New features:
- User task isolation (A cannot see B's tasks)
- Task history with status tracking (pending/processing/completed/failed)
- Historical query capabilities with filters
- Download support for completed tasks
- Task management UI with search and filters
Security enhancements:
- User context validation in all endpoints
- File access control based on ownership
- Row-level security in database queries
- API-level authorization checks
Implementation approach:
- Clean migration without rollback concerns
- Drop old tables and start fresh
- Simplified deployment process
- Comprehensive task management system
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Create OpenSpec proposal for migrating from local database authentication
to external API authentication using Microsoft Azure AD.
Changes proposed:
- Replace local username/password auth with external API
- Integrate with https://pj-auth-api.vercel.app/api/auth/login
- Use Azure AD tokens instead of local JWT
- Display user 'name' from API response in UI
- Maintain backward compatibility with feature flag
Benefits:
- Single Sign-On (SSO) capability
- Leverage enterprise identity management
- Reduce local user management overhead
- Consistent authentication across applications
Database changes:
- Add external_user_id for Azure AD user mapping
- Add display_name for UI display
- Keep existing schema for rollback capability
Implementation includes:
- Detailed migration plan with phased rollout
- Comprehensive task list for implementation
- Test script for API validation
- Risk assessment and mitigation strategies
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
PaddleOCR-VL chart recognition model requires `fused_rms_norm_ext` API
which is not available in PaddlePaddle 3.0.0 stable release.
Changes:
- Set use_chart_recognition=False in PP-StructureV3 initialization
- Remove unsupported show_log parameter from PaddleOCR 3.x API calls
- Document known limitation in openspec proposal
- Add limitation documentation to README
- Update tasks.md with documentation task for known issues
Impact:
- Layout analysis still detects/extracts charts as images ✓
- Tables, formulas, and text recognition work normally ✓
- Deep chart understanding (type detection, data extraction) disabled ✗
- Chart to structured data conversion disabled ✗
Workaround: Charts saved as image files for manual review
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>