egg/OCR - OCR

Author	SHA1	Message	Date
egg	bbd68a2162	feat: enable audit logging for authentication and task operations Add audit_service.log_event() calls to track key user activities: - auth_login: successful and failed login attempts with IP/user agent - auth_logout: single session and all sessions logout - task_delete: task deletion with user context - file_upload: file upload with filename, size, and type - admin_cleanup: manual cleanup trigger with statistics Each event captures client IP (from X-Forwarded-For/X-Real-IP headers), user agent, and relevant metadata for compliance and debugging. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-14 12:46:20 +08:00
egg	73112db055	feat: add storage cleanup mechanism with soft delete and auto scheduler - Add soft delete (deleted_at column) to preserve task records for statistics - Implement cleanup service to delete old files while keeping DB records - Add automatic cleanup scheduler (configurable interval, default 24h) - Add admin endpoints: storage stats, cleanup trigger, scheduler status - Update task service with admin views (include deleted/files_deleted) - Add frontend storage management UI in admin dashboard - Add i18n translations for storage management 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-14 12:41:01 +08:00
egg	81a0a3ab0f	feat: complete i18n support for all frontend pages and components Add comprehensive bilingual (zh-TW/en-US) support across the entire frontend: Pages updated: - AdminDashboardPage: All 63+ strings translated - TaskHistoryPage: All 80+ strings translated - TaskDetailPage: All 90+ strings translated - AuditLogsPage: All audit log UI translated - ResultsPage/ProcessingPage: Fixed i18n integration - UploadPage: Step indicators and file list UI translated Components updated: - TaskNotFound: Task deletion messages - FileUpload: Prompts and file size limits - ProcessingTrackSelector: Processing mode options with analysis info - Layout: Navigation descriptions - ProtectedRoute: Loading and access denied messages - PDFViewer: Page navigation and error messages Locale files: Added ~200 new translation keys to both zh-TW.json and en-US.json 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-14 11:56:18 +08:00
egg	3876477bda	feat: add multilingual font support for translated PDFs - Add NotoSansKR and NotoSansThai fonts for Korean and Thai language support - Update download_fonts.sh to download all required fonts - Add LANGUAGE_FONT_MAP for language-to-font mapping in pdf_generator_service.py - Add get_font_for_language() method to select appropriate font based on target language - Update _get_reflow_styles() to accept target_lang parameter - Pass target_lang through generate_translated_pdf() to PDF generation methods - Fix garbled characters (亂碼) issue for Korean and Thai translations Supported languages: - Chinese (zh-CN/zh-TW), Japanese (ja): NotoSansSC - Korean (ko): NotoSansKR - Thai (th): NotoSansThai - Russian, Vietnamese, Latin languages: NotoSansSC 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 19:18:58 +08:00
egg	efa7e4175c	feat: optimize task file generation and add visualization download Backend changes: - Disable PP-Structure debug file generation by default - Separate raw_ocr_regions.json generation from debug flag (critical file) - Add visualization folder download endpoint as ZIP - Add has_visualization field to TaskDetailResponse - Stop generating Markdown files - Save translated PDFs to task folder with caching Frontend changes: - Replace JSON/MD download buttons with PDF buttons in TaskHistoryPage - Add visualization download button in TaskDetailPage - Fix Processing page task switching issue (reset isNotFound) Archives two OpenSpec proposals: - optimize-task-files-and-visualization - simplify-frontend-add-billing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 19:11:50 +08:00
egg	65abd51d60	feat: add translation billing stats and remove Export/Settings pages - Add TranslationLog model to track translation API usage per task - Integrate Dify API actual price (total_price) into translation stats - Display translation statistics in admin dashboard with per-task costs - Remove unused Export and Settings pages to simplify frontend - Add GET /api/v2/admin/translation-stats endpoint 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 17:38:12 +08:00
egg	d20751d56b	feat: add batch processing for multiple file uploads - Add BatchState management in taskStore with progress tracking - Implement batch processing service with concurrency control - Direct Track: max 5 parallel tasks - OCR Track: sequential processing (GPU VRAM limit) - Refactor ProcessingPage to support batch mode with BatchProcessingPanel - Update UploadPage to initialize batch state for multi-file uploads - Add i18n translations for batch processing (zh-TW, en-US) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 17:05:16 +08:00
egg	d5bc311757	feat: simplify login page UX and add i18n English support - Redesign LoginPage with minimal professional style - Remove animated gradient backgrounds and floating orbs - Remove marketing claims (99% accuracy, enterprise-grade) - Center login form with clean card design - Add multi-language support (zh-TW, en-US) - Create LanguageSwitcher component in sidebar - Add en-US.json translation file - Persist language preference in localStorage - Remove unused top header bar with search - Move language switcher to sidebar user section 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 12:49:48 +08:00
egg	1f18010040	fix: OCR Track reflow PDF and translation with image text filtering - Add OCR Track support for reflow PDF generation using raw_ocr_regions.json - Add OCR Track translation extraction from raw_ocr_regions instead of elements - Add raw_ocr_translations output format for OCR Track documents - Add exclusion zone filtering to remove text overlapping with images - Update API validation to accept both translations and raw_ocr_translations - Add page_number field to TranslatedItem for proper tracking 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 11:02:35 +08:00
egg	24253ac15e	feat: unify Direct Track PDF rendering and simplify export options Backend changes: - Apply background image + invisible text layer to all Direct Track PDFs - Add CHART to regions_to_avoid for text extraction - Improve visual fidelity for native PDFs and Office documents Frontend changes: - Remove JSON, UnifiedDocument, Markdown download buttons - Simplify to 2-column layout with only Layout PDF and Reflow PDF - Remove translation JSON download and Layout PDF option - Keep only Reflow PDF for translated document downloads - Clean up unused imports (FileJson, Database, FileOutput) Archives two OpenSpec proposals: - unify-direct-track-pdf-rendering - simplify-frontend-export-options 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 07:50:43 +08:00
egg	53bfa88773	docs: archive simplify-frontend-ocr-config proposal 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 17:17:07 +08:00
egg	63ffa8f0e3	docs: archive enable-doc-orientation-detection proposal Feature implementation completed and tested successfully. - PP-StructureV3 orientation detection enabled - Page dimensions correctly swapped for 90°/270° rotations - Output PDF now displays landscape content correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 17:15:05 +08:00
egg	cfe65158a3	feat: enable document orientation detection for scanned PDFs - Enable PP-StructureV3's use_doc_orientation_classify feature - Detect rotation angle from doc_preprocessor_res.angle - Swap page dimensions (width <-> height) for 90°/270° rotations - Output PDF now correctly displays landscape-scanned content Also includes: - Archive completed openspec proposals - Add simplify-frontend-ocr-config proposal (pending) - Code cleanup and frontend simplification 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 17:13:46 +08:00
egg	57070af307	docs: mark remove-unused-code tasks as completed All cleanup tasks have been completed: - Backend: 3 service files removed (~1,200 lines) - Frontend: 2 components + 2 API files removed (~758 lines) - Total: 1,958 lines of redundant code removed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 12:03:41 +08:00
egg	5d962ca97c	refactor: remove unused code and migrate legacy API Backend cleanup: - Remove ocr_service_original.py (legacy OCR service, replaced by ocr_service.py) - Remove preprocessor.py (unused, functionality absorbed by layout_preprocessing_service.py) - Remove pdf_font_manager.py (unused, never referenced by any service) Frontend cleanup: - Remove MarkdownPreview.tsx (unused component) - Remove ResultsTable.tsx (unused, replaced by TaskHistoryPage) - Remove services/api.ts (legacy API client, migrated to apiV2) - Remove types/api.ts (legacy types, migrated to apiV2.ts) API migration: - Add export rules CRUD methods to apiClientV2 - Update SettingsPage.tsx to use apiClientV2 - Update Layout.tsx to use only apiClientV2 for logout This reduces ~1,500 lines of redundant code and unifies the API client. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 12:03:09 +08:00
egg	940a406dce	chore: backup before code cleanup Backup commit before executing remove-unused-code proposal. This includes all pending changes and new features. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 11:55:39 +08:00
egg	eff9b0bcd5	feat: refactor dual-track architecture (Phase 1-5) ## Backend Changes - Service Layer Refactoring: - Add ProcessingOrchestrator for unified document processing - Add PDFTableRenderer for table rendering extraction - Add PDFFontManager for font management with CJK support - Add MemoryPolicyEngine (73% code reduction from MemoryGuard) - Bug Fixes: - Fix Direct Track table row span calculation - Fix OCR Track image path handling - Add cell_boxes coordinate validation - Filter out small decorative images - Add covering image detection ## Frontend Changes - State Management: - Add TaskStore for centralized task state management - Add localStorage persistence for recent tasks - Add processing state tracking - Type Consolidation: - Merge shared types from api.ts to apiV2.ts - Update imports in authStore, uploadStore, ResultsTable, SettingsPage - Page Integration: - Integrate TaskStore in ProcessingPage and TaskDetailPage - Update useTaskValidation hook with cache sync ## Testing - Direct Track: edit.pdf (3 pages, 1.281s), edit3.pdf (2 pages, 0.203s) - Cell boxes validation: 43 valid, 0 invalid - Table merging: 12 merged cells verified 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-07 07:18:27 +08:00
egg	8265be1741	test	2025-12-04 18:00:37 +08:00
egg	9437387ef1	fix: add IoU text coverage check and page boundary validation Vector rectangles: - Add page boundary check (skip out-of-bounds rectangles) - Clip rectangles to page boundaries Covering images: - Add page boundary check (skip out-of-bounds images) - Add IoU-based text coverage verification - Only report images that actually cover text (>= 50% word coverage) - Add covered_text_count to detection results This reduces false positives from black logos or decorative images that don't actually cover any text content. Test results (edit3.pdf): - Before: 10 covering images detected - After: 6 covering images detected (4 filtered - no text coverage) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-04 07:48:38 +08:00
egg	1c3c37bce0	test: add covering images to preprocessing test output Updates test script to display covering images count in quality report. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-04 07:43:19 +08:00
egg	d6387adbd1	feat: add black/white covering image detection Implements detection of embedded images used for redaction/covering: - Analyzes embedded images for mostly black (avg RGB <= 30) or white (>= 245) - Uses PIL to efficiently sample image colors - Gets image position on page via get_image_rects() - Integrates with existing preprocessing pipeline - Adds covering_images to page metadata and quality report Detection results: - demo_docs/edit3.pdf: 10 black covering images detected (7 on P1, 3 on P2) Quality report now includes: - total_covering_images count - Per-page covering_images details with bbox, color_type, size 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-04 07:42:55 +08:00
egg	3903bcf77d	fix: tighten covering detection thresholds to avoid false positives - Increase white threshold from 0.95 to 0.98 (pure white only) - Decrease black threshold from 0.05 to 0.02 (pure black only) - Remove "other solid" detection (caused false positives on gray backgrounds) This prevents light gray table cell backgrounds (RGB ~0.93) from being incorrectly detected as covering/redaction rectangles. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-04 07:36:07 +08:00
egg	bc66f72352	feat: extend covering detection to include black/redaction rectangles Expands whiteout detection to handle: - White rectangles (RGB >= 0.95) - correction tape / white-out - Black rectangles (RGB <= 0.05) - redaction / censoring - Other solid fills (very dark or very light) - potential covering Adds color_type to covered text results for better logging. Logs now show breakdown by cover type (white, black, other). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-04 07:34:35 +08:00
egg	63b474f93a	test: add preprocessing pipeline test script Adds test script for validating PDF preprocessing pipeline: - Garble rate detection unit tests - Page number pattern detection unit tests - Integration tests with demo_docs/edit*.pdf files - Quality report generation verification Usage: PYTHONPATH=backend python3 scripts/run_preprocessing_tests.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 16:51:12 +08:00
egg	6a65c7617d	feat: add PDF preprocessing pipeline for Direct track Implement multi-stage preprocessing pipeline to improve extraction quality: Phase 1 - Object-level Cleaning: - Content stream sanitization via clean_contents(sanitize=True) - Hidden OCG layer detection - White-out detection with IoU 80% threshold Phase 2 - Layout Analysis: - Column-aware sorting (sort=True) - Page number pattern detection and filtering - Position-based element classification Phase 3 - Enhanced Extraction: - Garble rate detection (cid:xxxx, U+FFFD, PUA characters) - OCR fallback recommendation when garble >10% - Quality report generation interface Phase 4 - GS Distillation (Exception Handler): - Ghostscript PDF repair for severely damaged files - Auto-triggered on high garble or mupdf errors - Graceful fallback when GS unavailable 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 16:11:00 +08:00
egg	1b5c7f39a8	fix: improve PDF layout generation for Direct track Key fixes: - Skip large vector_graphics charts (>50% page coverage) that cover text - Fix font fallback to use NotoSansSC for CJK support instead of Helvetica - Improve translated table rendering with dynamic font sizing - Add merged cell (row_span/col_span) support for reflow tables - Skip text elements inside table bboxes to avoid duplication Archive openspec proposal: fix-pdf-table-rendering 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 14:55:00 +08:00
egg	08adf3d01d	feat: add translated PDF format selection (layout/reflow) - Add generate_translated_layout_pdf() method for layout-preserving translated PDFs - Add generate_translated_pdf() method for reflow translated PDFs - Update translate router to accept format parameter (layout/reflow) - Update frontend with dropdown to select translated PDF format - Fix reflow PDF table cell extraction from content dict - Add embedded images handling in reflow PDF tables - Archive improve-translated-text-fitting openspec proposal 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-03 10:10:28 +08:00
egg	0dcea4a7e7	fix: use task.files relationship to get source file path Task model doesn't have file_path attribute directly. Use the files relationship to access TaskFile.stored_path for source file path. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 18:12:22 +08:00
egg	bed473cd30	fix: properly stop child processes and orphaned services - Kill entire process tree (parent + children) when stopping - Add port-based cleanup as fallback for orphaned processes - Remove 'set -e' to allow graceful failure handling - Pass port numbers to stop_service for cleanup 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 18:01:24 +08:00
egg	7916c75768	fix: allow extra environment variables in pydantic-settings Add extra='ignore' to Settings Config to prevent ValidationError when .env files contain deprecated variables (e.g., PADDLEOCR_MODEL_DIR). This ensures backwards compatibility with Docker deployments. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 17:53:34 +08:00
egg	c006905b6f	refactor: centralize DIFY settings in config.py and cleanup env files - Update config.py to read both .env and .env.local (with .env.local priority) - Move DIFY API settings from hardcoded values to environment configuration - Remove unused PADDLEOCR_MODEL_DIR setting (models stored in ~/.paddleocr/) - Remove deprecated argostranslate translation settings - Add DIFY settings: base_url, api_key, timeout, max_retries, batch limits - Update dify_client.py to use settings from config.py - Update translation_service.py to use settings instead of constants - Fix frontend env files to use correct variable name VITE_API_BASE_URL - Update setup_dev_env.sh with correct PaddlePaddle version (3.2.0) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 17:50:47 +08:00
egg	d7f7166a2d	feat: unify environment scripts with start.sh - Add unified start.sh script with subcommands (all/backend/frontend) - Add process management (--stop, --status) - Remove separate start_backend.sh and start_frontend.sh - Update setup_dev_env.sh with pre-flight checks and --cpu-only/--skip-db options - Update .env.example to remove sensitive data and add DIFY translation config - Add .pid/ to .gitignore for process management 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 12:48:52 +08:00
egg	a07aad96b3	feat: add translated PDF export with layout preservation Adds the ability to download translated documents as PDF files while preserving the original document layout. Key changes: - Add apply_translations() function to merge translation JSON with UnifiedDocument - Add generate_translated_pdf() method to PDFGeneratorService - Add POST /api/v2/translate/{task_id}/pdf endpoint - Add downloadTranslatedPdf() method and PDF button in frontend - Add comprehensive unit tests (52 tests: merge, PDF generation, API endpoints) - Archive add-translated-pdf-export proposal 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 12:33:31 +08:00
egg	8d9b69ba93	feat: add document translation via DIFY AI API Implement document translation feature using DIFY AI API with batch processing: Backend: - Add DIFY client with batch translation support (5000 chars, 20 items per batch) - Add translation service with element extraction and result building - Add translation router with start/status/result/list/delete endpoints - Add translation schemas (TranslationRequest, TranslationStatus, etc.) Frontend: - Enable translation UI in TaskDetailPage - Add translation API methods to apiV2.ts - Add translation types Features: - Batch translation with numbered markers [1], [2], [3]... - Support for text, title, header, footer, paragraph, footnote, table cells - Translation result JSON with statistics (tokens, latency, batch_count) - Background task processing with progress tracking 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 11:57:02 +08:00
egg	87dc97d951	fix: improve Office document processing with Direct track - Force Office documents (PPTX, DOCX, XLSX) to use Direct track after LibreOffice conversion, since converted PDFs always have extractable text - Fix PDF generator to not exclude text in image regions for Direct track, allowing text to render on top of background images (critical for PPT) - Increase file_type column from VARCHAR(50) to VARCHAR(100) to support long MIME types like PPTX - Remove reference to non-existent total_images metadata attribute This significantly improves processing time for Office documents (from ~170s OCR to ~10s Direct) while preserving text quality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-30 16:22:04 +08:00
egg	6806fff1d5	chore: archive extract-table-cell-boxes proposal Archived the extract-table-cell-boxes proposal which implemented: - Table cell boxes extraction from PP-StructureV3 table_res_list - Layered rendering for tables with cell borders - CV-based table line detection (disabled) - Scan artifact removal preprocessing - PDF orientation detection for rotated documents 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-30 14:22:29 +08:00
egg	6252be6c6f	fix: correct scale factor calculation for rotated documents When rotation is detected, the OCR coordinate system needs to be swapped: - Original OCR dimensions: 1242 x 1755 (portrait image) - Content coordinates: up to x=1593 (exceeds image width, indicates rotation) - Rotated OCR dimensions: 1755 x 1242 (matching content coordinate system) Previously, page_dimensions was incorrectly set to target PDF dimensions, causing scale factors to be ~1.0 instead of ~0.48. Now correctly: - original_page_sizes[0] = target PDF dimensions (842.4 x 595.68) - page_dimensions[0] = swapped OCR dimensions (1755 x 1242) - Scale = 842.4/1755 ≈ 0.48 for both x and y 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-30 13:42:48 +08:00
egg	f27b4d9710	fix: correct orientation detection to use OCR pixel coordinates Fixed two issues in PDF orientation detection: 1. Unit mismatch: Orientation detection was comparing content bboxes (in pixels) against PDF page dimensions (in points). Now correctly uses OCR dimensions (pixels) for detection. 2. Priority override: When orientation adjustment is needed, now also updates original_page_sizes dict so per-page processing uses the adjusted dimensions instead of the original PDF dimensions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-30 13:37:03 +08:00
egg	c65e4f98d4	fix: detect and handle rotated document content in PDF generation Add orientation detection to handle cases where scanned documents have content in a different orientation than the image dimensions suggest. When PP-StructureV3 processes rotated documents, it may return bounding boxes in the "corrected" orientation while the image remains in its scanned orientation. This causes content to extend beyond page boundaries. The fix: - Add _detect_content_orientation() method to detect when content bbox exceeds page dimensions significantly - Automatically swap page dimensions when landscape content is detected in portrait-oriented images (and vice versa) - Apply orientation detection for both single-page and multi-page documents Fixes issue where horizontal delivery slips scanned vertically were generating PDFs with content cut off or incorrectly positioned. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-30 13:27:01 +08:00
egg	95ae1f1bdb	feat: add table detection options and scan artifact removal - Add TableDetectionSelector component for wired/wireless/region detection - Add CV-based table line detector module (disabled due to poor performance) - Add scan artifact removal preprocessing step (removes faint horizontal lines) - Add PreprocessingConfig schema with remove_scan_artifacts option - Update frontend PreprocessingSettings with scan artifact toggle - Integrate table detection config into ProcessingPage - Archive extract-table-cell-boxes proposal 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-30 13:21:50 +08:00
egg	f5a2c8a750	feat: extract cell_box_list from table_res_list Based on pp_demo analysis, PPStructureV3 returns table_res_list containing cell_box_list which was previously ignored. This commit: - Extract table_res_list from PPStructureV3 result alongside parsing_res_list - Add table_res_list parameter to _process_parsing_res_list() - Prioritize cell_box_list from table_res_list over SLANeXt extraction - Match tables by HTML content or use first available Priority order for cell boxes: 1. table_res_list.cell_box_list (native, already absolute coords) 2. res_data['boxes'] (unlikely in PaddleX 3.x) 3. Direct SLANeXt model call (fallback) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 12:41:18 +08:00
egg	5ddccbf5a2	docs: update tasks.md with Phase 1-3 completion status Mark completed tasks in extract-table-cell-boxes proposal: - Phase 1: Config and model caching ✓ - Phase 2: Cell boxes extraction ✓ - Phase 3: PDF generation optimization ✓ Remaining: Phase 4 (testing) and Phase 5 (cleanup) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 12:20:59 +08:00
egg	715805b3b8	feat: implement table cell boxes extraction with SLANeXt Phase 1-3 implementation of extract-table-cell-boxes proposal: - Add enable_table_cell_boxes_extraction config option - Implement lazy-loaded SLANeXt model caching in PPStructureEnhanced - Add _extract_cell_boxes_with_slanet() method for direct model invocation - Supplement PPStructureV3 table processing with SLANeXt cell boxes - Add _compute_table_grid_from_cell_boxes() for column width calculation - Modify draw_table_region() to use cell_boxes for accurate layout Key features: - Auto-detect table type (wired/wireless) using PP-LCNet classifier - Convert 8-point polygon bbox to 4-point rectangle - Graceful fallback to equal distribution when cell_boxes unavailable - Proper coordinate transformation with scaling support 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 12:20:32 +08:00
egg	801ee9c4b6	feat: create extract-table-cell-boxes proposal and archive old proposal - Archive unify-image-scaling proposal to archive/2025-11-28 - Create new extract-table-cell-boxes proposal for supplementing PPStructureV3 with direct SLANeXt model calls to extract table cell bounding boxes - Add debug logging to pp_structure_enhanced.py for table cell boxes investigation - Discovered that PPStructureV3 high-level API filters out cell bbox data, but paddlex.create_model() can directly invoke underlying models 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 12:15:06 +08:00
egg	dda9621e17	feat: enhance layout preprocessing and unify image scaling proposal Backend changes: - Add image scaling configuration for PP-Structure processing - Enhance layout preprocessing service with scaling support - Update OCR service with improved memory management - Add PP-Structure enhanced processing improvements Frontend changes: - Update preprocessing settings UI - Fix processing page layout and state management - Update API types for new parameters Proposals: - Archive add-layout-preprocessing proposal (completed) - Add unify-image-scaling proposal for consistent coordinate handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 09:23:19 +08:00
egg	86bbea6fbf	fix: improve OCR track table rendering with Paragraph wrapping Changes: - Remove PDF caching to ensure code changes take effect - Add PDF rotation handling (90/270 degree swap) - Add dict bbox format support for UnifiedDocument - Use Paragraph objects for table cells to enable text auto-wrapping - Align OCR track table rendering logic with Direct track (no fixed rowHeights) Known issue: PP-StructureV3 does not provide cell bbox in output (block_content only contains HTML string, no res['boxes'] like old PPStructure) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 09:22:07 +08:00
egg	2861f54838	fix: prevent preview infinite loop and add document type filtering - Remove onAutoConfigReceived callback that caused state update loop - Add document analysis to check if file needs OCR track - Only show preprocessing options for OCR-eligible files (images, scanned PDFs) - Show informative message for editable PDFs that use direct text extraction - Display text coverage percentage for editable documents 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 17:31:05 +08:00
egg	894d18b432	feat: add real-time preprocessing preview with side-by-side comparison - Create PreprocessingPreview component with debounced config updates - Show original vs preprocessed images side-by-side - Display image quality metrics (contrast, sharpness) with quality indicators - Add zoom controls and fullscreen view for detailed inspection - Show auto-detected configuration when in auto mode - Integrate preview toggle with PreprocessingSettings component - Add i18n translations for preview panel UI 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 17:25:52 +08:00
egg	5982fff71c	feat: add contrast/sharpen strength controls, disable binarization Major improvements to preprocessing controls: Backend: - Add contrast_strength (0.5-3.0) and sharpen_strength (0.5-2.0) to PreprocessingConfig - Auto-detection now calculates optimal strength based on image quality metrics: - Lower contrast → Higher contrast_strength - Lower edge strength → Higher sharpen_strength - Disable binarization in auto mode (rarely beneficial) - CLAHE clipLimit now scales with contrast_strength - Sharpening uses unsharp mask with variable strength Frontend: - Add strength sliders for contrast and sharpen in manual mode - Sliders show current value and strength level (輕微/正常/強/最強, i.e. slight/normal/strong/strongest) - Move binarize option to collapsible "進階選項" (Advanced Options) section - Updated i18n translations for strength labels 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 17:18:44 +08:00
egg	f6d2957592	fix: pass preprocessing parameters from start_task to OCR service The preprocessing_mode and preprocessing_config parameters were not being passed from the start_task endpoint through to the OCR service: - Add preprocessing_mode and preprocessing_config to process_task_ocr() - Extract preprocessing options from ProcessingOptions in start_task() - Convert string/dict to proper PreprocessingModeEnum/PreprocessingConfig - Pass converted parameters to ocr_service.process() and process_image() 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 16:13:32 +08:00
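
The scale-factor arithmetic described in commit 6252be6c6f can be checked with a short sketch. This is not the repository's code: the function names `looks_rotated` and `scale_factors` are hypothetical, and the rotation test (content extending past the image width) is an assumption based only on the wording of that commit message. The numbers are the ones quoted in the commit.

```python
def looks_rotated(ocr_size, max_content_x):
    """Assumed heuristic: content wider than the image suggests a rotated frame."""
    ocr_w, _ = ocr_size
    return max_content_x > ocr_w


def scale_factors(pdf_size, ocr_size, rotated):
    """Return (scale_x, scale_y) mapping OCR pixel coordinates to PDF points."""
    ocr_w, ocr_h = ocr_size
    if rotated:
        # Content coordinates live in the rotated frame, so swap width and height.
        ocr_w, ocr_h = ocr_h, ocr_w
    pdf_w, pdf_h = pdf_size
    return pdf_w / ocr_w, pdf_h / ocr_h


# Values quoted in the commit message: portrait OCR image, landscape target PDF.
ocr_size = (1242, 1755)
pdf_size = (842.4, 595.68)
rotated = looks_rotated(ocr_size, max_content_x=1593)
sx, sy = scale_factors(pdf_size, ocr_size, rotated)
print(rotated, round(sx, 2), round(sy, 2))
```

With the swap applied, both factors come out at roughly 0.48, which matches the figure given in the commit message. Without the swap the two axes would get different factors, and content would be stretched unevenly.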

156 Commits