test: add unit tests for DocumentTypeDetector

- Create test directory structure for backend
- Add pytest fixtures for test files (PDF, images, Office docs)
- Add 20 unit tests covering:
  - PDF type detection (editable, scanned, mixed)
  - Image file detection (PNG, JPG)
  - Office document detection (DOCX)
  - Text file detection
  - Edge cases (file not found, unknown types)
  - Batch processing and statistics
- Mark tasks 1.1.4 and 1.3.5 as completed in tasks.md
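
For illustration only, a minimal sketch of the kind of checks listed above. The real DocumentTypeDetector API is not shown in this commit message, so `detect_type`, `SIGNATURES`, and the returned labels are hypothetical stand-ins built on well-known file signatures, not the project's implementation.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the detector under test (an assumption, not the
# project's code). It classifies files by their leading "magic" bytes.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "image"),   # PNG
    (b"\xff\xd8\xff", "image"),        # JPEG
    (b"PK\x03\x04", "office"),         # DOCX is a ZIP container (simplified)
]


def detect_type(path):
    """Return a coarse type label, or raise FileNotFoundError if missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(path)
    data = path.read_bytes()
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    # Fall back to "text" when the content decodes cleanly as UTF-8.
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown"


def run_checks():
    """Write small sample files to a temp directory and classify each one."""
    samples = {
        "a.pdf": b"%PDF-1.7\n",
        "b.png": b"\x89PNG\r\n\x1a\n" + b"\x00" * 4,
        "c.jpg": b"\xff\xd8\xff\xe0",
        "d.docx": b"PK\x03\x04rest",
        "e.txt": b"plain text\n",
        "f.bin": b"\x00\xff\xfe\x80",
    }
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, payload in samples.items():
            target = Path(tmp) / name
            target.write_bytes(payload)
            results[name] = detect_type(target)
    return results
```

In a pytest suite, the sample files would more naturally come from fixtures (for example the built-in `tmp_path` fixture) rather than a helper like `run_checks`.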

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
egg
2025-11-19 12:14:59 +08:00
parent 1d0b63854a
commit 0fcb2492c9
6 changed files with 486 additions and 2 deletions


@@ -5,7 +5,7 @@
- [x] 1.1.1 Add PyMuPDF>=1.23.0
- [x] 1.1.2 Add pdfplumber>=0.10.0
- [x] 1.1.3 Add python-magic-bin>=0.4.14
-- [ ] 1.1.4 Test dependency installation
+- [x] 1.1.4 Test dependency installation
- [x] 1.2 Create UnifiedDocument model in backend/app/models/
- [x] 1.2.1 Define UnifiedDocument dataclass
- [x] 1.2.2 Add DocumentElement model
@@ -17,7 +17,7 @@
- [x] 1.3.2 Add PDF editability checking logic
- [x] 1.3.3 Add Office document detection
- [x] 1.3.4 Create routing logic to determine processing track
-- [ ] 1.3.5 Add unit tests for detector
+- [x] 1.3.5 Add unit tests for detector
## 2. Direct Extraction Track
- [x] 2.1 Create DirectExtractionEngine service