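DonaldFang 方士碩 f524713cb6 Initial commit: HBR article crawler project

- Scrapy crawler framework that scrapes HBR Traditional Chinese articles
- Flask web application that provides an article search interface
- SQL Server database integration
- Automated scheduling and email notifications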

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-03 17:19:56 +08:00

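HBR Crawler System - Startup Guide

Main Entry Point

🚀 `run_crawler.py` - Main launch script (recommended)

This is the main launch script that integrates all features. It runs the following steps in order:

Run the Scrapy crawler
Check whether the CSV file was generated
Send the email (if Gmail is configured)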
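
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `run_crawler.py`: the function name `run_pipeline`, the status strings, and the idea of passing the commands in as arguments are all assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path


def run_pipeline(crawl_cmd, csv_path, mail_cmd, cwd=None):
    """Run crawl -> CSV check -> mail, stopping at the first failure.

    Returns "crawl_failed", "csv_missing", "mail_failed", or "done".
    All names here are hypothetical; the real script may differ.
    """
    # Step 1: run the crawler as a child process.
    if subprocess.run(crawl_cmd, cwd=cwd).returncode != 0:
        return "crawl_failed"
    # Step 2: make sure the crawler produced a non-empty CSV file.
    csv_path = Path(csv_path)
    if not csv_path.is_file() or csv_path.stat().st_size == 0:
        return "csv_missing"
    # Step 3: hand the CSV path to the mail script as its only argument.
    if subprocess.run(list(mail_cmd) + [str(csv_path)]).returncode != 0:
        return "mail_failed"
    return "done"


# Example call (not executed here; the CSV file name is an assumption):
# run_pipeline(
#     crawl_cmd=["scrapy", "crawl", "hbr"],
#     csv_path="hbr_articles.csv",
#     mail_cmd=[sys.executable, "send_mail.py"],
#     cwd="hbr_crawler",
# )
```

Returning a status string instead of raising keeps the sketch easy to call from a scheduler, which only needs to map "done" to exit code 0 and anything else to a non-zero code.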

Usage:

python run_crawler.py

Use cases:

Running the crawler manually
Scheduled jobs (crontab)
Automated workflows

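Other Python Files

📧 `send_mail.py` - Email script

Sends the email only; it does not run the crawler.

Usage:

python send_mail.py [csv_file_path]

What it does:

Reads the CSV file
Sends the email through Gmail SMTP (if configured)
If Gmail is not configured, skips sending and prints a notice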
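
A minimal sketch of the "send if configured, otherwise skip" behavior described above, using only the standard library. The environment variable names (`GMAIL_USER`, `GMAIL_APP_PASSWORD`, `MAIL_TO`), the subject line, and both function names are assumptions for illustration; the real `send_mail.py` may read its settings from somewhere else entirely.

```python
import os
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(csv_path, sender, recipient):
    """Build an email with the CSV file attached."""
    path = Path(csv_path)
    msg = EmailMessage()
    msg["Subject"] = f"HBR crawl results: {path.name}"  # hypothetical subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The latest crawl results are attached.")
    msg.add_attachment(
        path.read_bytes(), maintype="text", subtype="csv", filename=path.name
    )
    return msg


def send_if_configured(csv_path, env=os.environ):
    """Send the CSV by Gmail SMTP, or skip with a notice if unconfigured."""
    user = env.get("GMAIL_USER")              # assumed variable name
    password = env.get("GMAIL_APP_PASSWORD")  # assumed variable name
    if not user or not password:
        print("Gmail is not configured; skipping email.")
        return False
    recipient = env.get("MAIL_TO", user)      # assumed variable name
    msg = build_message(csv_path, user, recipient)
    # Gmail accepts implicit-TLS SMTP connections on port 465.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(user, password)
        smtp.send_message(msg)
    return True
```

Keeping message construction separate from sending means the attachment logic can be checked without network access or credentials.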

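🧪 `test_db_connection.py` - Database connection test

Tests the database connection and creates the table schema.

Usage:

python test_db_connection.py

What it does:

Tests the database connection
Creates the HBR_scraper database (if it does not exist)
Creates the table schema
Verifies that the tables were created successfully

Recommendation: run this once before first use.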
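
The four steps above might look something like the sketch below, written against a generic DB-API cursor. Only the database name `HBR_scraper` comes from this README; the table name `articles`, its columns, and the function name `setup_database` are hypothetical. The `?` placeholder assumes a driver with the qmark parameter style (pyodbc uses it); other SQL Server drivers may need a different placeholder.

```python
DB_NAME = "HBR_scraper"   # from the README
TABLE_NAME = "articles"   # hypothetical table name

# T-SQL: create the database only when it is missing.
CREATE_DB_SQL = f"IF DB_ID(N'{DB_NAME}') IS NULL CREATE DATABASE [{DB_NAME}];"

# T-SQL: create the table only when it is missing (columns are illustrative).
CREATE_TABLE_SQL = f"""
IF OBJECT_ID(N'dbo.{TABLE_NAME}', N'U') IS NULL
CREATE TABLE dbo.{TABLE_NAME} (
    id INT IDENTITY(1,1) PRIMARY KEY,
    title NVARCHAR(500) NOT NULL,
    url NVARCHAR(1000) NOT NULL
);
"""

VERIFY_SQL = (
    "SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = ?;"
)


def setup_database(cursor):
    """Test the connection, create database and table, then verify.

    `cursor` is any DB-API cursor on a SQL Server connection. The
    connection should be in autocommit mode, because SQL Server does
    not allow CREATE DATABASE inside a multi-statement transaction.
    """
    cursor.execute("SELECT 1;")          # 1. connection test
    cursor.fetchone()
    cursor.execute(CREATE_DB_SQL)        # 2. create database if missing
    cursor.execute(f"USE [{DB_NAME}];")
    cursor.execute(CREATE_TABLE_SQL)     # 3. create table if missing
    cursor.execute(VERIFY_SQL, (TABLE_NAME,))  # 4. verify
    row = cursor.fetchone()
    return bool(row and row[0] == 1)
```

Because every statement is guarded by an existence check, running the sketch a second time should leave an already initialized database unchanged.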

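🕷️ `hbr_crawler/` - Scrapy crawler project

This is the core code of the Scrapy crawler. It contains:

spiders/hbr.py - the spider itself
pipelines.py - item pipelines (CSV export, database storage)
items.py - item (data structure) definitions
settings.py - crawler settings
database.py - database connection module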
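
To illustrate the CSV export role that `pipelines.py` plays, here is a minimal sketch of a Scrapy-style item pipeline. Scrapy pipelines are plain classes with `open_spider`, `process_item`, and `close_spider` methods, so the sketch needs no Scrapy import. The class name, the field list, and the output file name are assumptions, not the project's actual code.

```python
import csv

# Hypothetical field list; the real one would follow items.py.
FIELDS = ["title", "url", "published"]


class CsvExportPipeline:
    """Write each scraped item as one row of a CSV file."""

    def __init__(self, path="hbr_articles.csv"):  # assumed file name
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # utf-8-sig adds a BOM so spreadsheet tools detect the encoding,
        # which matters for Chinese article titles.
        self.file = open(self.path, "w", newline="", encoding="utf-8-sig")
        self.writer = csv.DictWriter(self.file, fieldnames=FIELDS)
        self.writer.writeheader()

    def process_item(self, item, spider):
        # Keep only the known fields; missing ones become empty cells.
        self.writer.writerow({key: item.get(key, "") for key in FIELDS})
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        self.file.close()
```

A pipeline like this would be enabled through the `ITEM_PIPELINES` setting in `settings.py`, where a lower number means the pipeline runs earlier; the dotted path to the class depends on where it actually lives in the project.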

Running Scrapy directly:

cd hbr_crawler
scrapy crawl hbr

Note: prefer run_crawler.py over running the Scrapy command directly, because it also performs the CSV check and the email step.

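Quick Start

1. First-time setup

# Install dependencies
pip install -r requirements.txt

# Test the database connection (creates the database and tables)
python test_db_connection.py

2. Run the crawler

# Option 1: use the main launch script (recommended)
python run_crawler.py

# Option 2: run the Scrapy command directly
cd hbr_crawler
scrapy crawl hbr

3. Scheduling (crontab)

# Run every day at 08:00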
0 8 * * * cd /path/to/project && /usr/bin/python3 run_crawler.py >> logs/cron.log 2>&1

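File Reference Table

File	Function	Runs standalone?	Purpose
`run_crawler.py`	Integrates all features	✅ Yes	Main launch script
`send_mail.py`	Sends email	✅ Yes	Email delivery
`test_db_connection.py`	Tests the database	✅ Yes	Database setup
`hbr_crawler/spiders/hbr.py`	Crawler core	❌ Must run through Scrapy	Crawling logic
`hbr_crawler/pipelines.py`	Data processing	❌ Must run through Scrapy	Data processing
`hbr_crawler/database.py`	Database module	❌ Imported by other modules	Database connection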

Recommended Workflow

First-time setup: run test_db_connection.py
Daily use: run run_crawler.py
Email only: run send_mail.py
Scheduled jobs: add run_crawler.py to crontab
