OCR/demo_docs/office_tests/test_document.html
beabigegg da700721fa first
2025-11-12 22:53:17 +08:00

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Office Document OCR Test</title>
</head>
<body>
<h1>Office Document OCR Test</h1>
<h2>About This Test Document</h2>
<p>This document is used to test the Office document support of the Tool_OCR system.</p>
<p>The system now supports the following Office formats:</p>
<ul>
<li>Microsoft Word: DOC, DOCX</li>
<li>Microsoft PowerPoint: PPT, PPTX</li>
</ul>
<h2>Processing Pipeline</h2>
<p>Office documents are processed in the following steps:</p>
<ol>
<li>Convert the Office document to PDF with LibreOffice</li>
<li>Convert the PDF to images (one image per page)</li>
<li>Run PaddleOCR on each image</li>
<li>Merge the OCR results from all pages</li>
</ol>
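<p>The four steps above can be sketched in Python as follows. This is a minimal illustration, not the Tool_OCR implementation: the function names are hypothetical, <code>soffice</code> is assumed to be on the PATH, and page rendering and text recognition are passed in as callables so that no particular PaddleOCR call signature is assumed.</p>
<pre>
```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List

# Extensions listed as supported in this document.
OFFICE_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx"}


def build_convert_command(office_file: Path, out_dir: Path) -> List[str]:
    """Build the LibreOffice command line for step 1 (Office to PDF)."""
    if office_file.suffix.lower() not in OFFICE_EXTENSIONS:
        raise ValueError(f"unsupported Office format: {office_file.suffix}")
    # Assumption: the LibreOffice binary is reachable as 'soffice'.
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(office_file)]


def convert_to_pdf(office_file: Path, out_dir: Path) -> Path:
    """Step 1: run LibreOffice and return the expected PDF path."""
    subprocess.run(build_convert_command(office_file, out_dir), check=True)
    return out_dir / (office_file.stem + ".pdf")


def merge_pages(page_texts: Iterable[str]) -> str:
    """Step 4: join per-page OCR text, labelling each page."""
    return "\n\n".join(
        f"[page {number}]\n{text.strip()}"
        for number, text in enumerate(page_texts, start=1)
    )


def ocr_office_file(
    office_file: Path,
    out_dir: Path,
    render_pages: Callable[[Path], Iterable[object]],
    recognize: Callable[[object], str],
) -> str:
    """Run all four steps; steps 2 and 3 are supplied by the caller."""
    pdf_path = convert_to_pdf(office_file, out_dir)   # step 1
    images = render_pages(pdf_path)                   # step 2
    texts = (recognize(image) for image in images)    # step 3
    return merge_pages(texts)                         # step 4
```
</pre>
<p>In this sketch, <code>render_pages</code> could be <code>pdf2image.convert_from_path</code>, and <code>recognize</code> could be a small wrapper that runs PaddleOCR on one image and returns its text. Injecting both keeps the flow testable without either library installed.</p>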
<h2>Test Data Table</h2>
<table border="1" cellpadding="5">
<tr>
<th>Format</th>
<th>File extension</th>
<th>Support status</th>
</tr>
<tr>
<td>Word (new format)</td>
<td>.docx</td>
<td>✓ Supported</td>
</tr>
<tr>
<td>Word (legacy format)</td>
<td>.doc</td>
<td>✓ Supported</td>
</tr>
<tr>
<td>PowerPoint (new format)</td>
<td>.pptx</td>
<td>✓ Supported</td>
</tr>
<tr>
<td>PowerPoint (legacy format)</td>
<td>.ppt</td>
<td>✓ Supported</td>
</tr>
</table>
<h2>Mixed Chinese and English Test</h2>
<p>This is a test for mixed Chinese and English OCR recognition.</p>
<p>測試中英文混合識別能力1234567890</p>
<h2>Special Character Test</h2>
<p>Symbol test: !@#$%^&amp;*()_+-=[]{}|;:',.&lt;&gt;?/</p>
<p>Math symbols: ± × ÷ √ ∞ ≈ ≠ ≤ ≥</p>
</body>
</html>