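Initial commit: HBR article crawler project

- Scrapy crawler framework that crawls HBR Traditional Chinese articles
- Flask web application providing an article search interface
- SQL Server database integration
- Automated scheduling and email notifications

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>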
15
crawler_config.json
Normal file
@@ -0,0 +1,15 @@
{
  "urls": [
    "https://www.hbrtaiwan.com/"
  ],
  "downloadDelay": 2,
  "maxDepth": 3,
  "concurrentRequests": 10,
  "skipPaywalled": false,
  "followPagination": true,
  "obeyRobotsTxt": true,
  "articleListSelector": "article, .article-item, .post-item, .content-item",
  "titleSelector": "h1, .article-title, .post-title",
  "authorSelector": ".author, .byline, .writer",
  "contentSelector": ".article-content, .post-content, .content"
}
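This commit does not show how the crawler reads this file, so the following is only a minimal sketch of one way the keys could be consumed. It assumes the throttling keys correspond to the standard Scrapy settings `DOWNLOAD_DELAY`, `DEPTH_LIMIT`, `CONCURRENT_REQUESTS`, and `ROBOTSTXT_OBEY`; the mapping table, the `load_crawler_config` helper, and the `SAMPLE_CONFIG` string are hypothetical names introduced for illustration and are not part of the project.

```python
import json

# Assumed mapping from this file's keys to Scrapy setting names.
# The Scrapy setting names are real; the pairing is a guess at the intent.
KEY_TO_SCRAPY_SETTING = {
    "downloadDelay": "DOWNLOAD_DELAY",
    "maxDepth": "DEPTH_LIMIT",
    "concurrentRequests": "CONCURRENT_REQUESTS",
    "obeyRobotsTxt": "ROBOTSTXT_OBEY",
}

# A trimmed copy of crawler_config.json so the sketch runs on its own.
SAMPLE_CONFIG = """
{
  "urls": ["https://www.hbrtaiwan.com/"],
  "downloadDelay": 2,
  "maxDepth": 3,
  "concurrentRequests": 10,
  "obeyRobotsTxt": true,
  "titleSelector": "h1, .article-title, .post-title",
  "contentSelector": ".article-content, .post-content, .content"
}
"""


def load_crawler_config(text):
    """Split the JSON config into Scrapy settings, start URLs, and CSS selectors."""
    config = json.loads(text)
    # Keep only the keys that have an assumed Scrapy equivalent.
    settings = {
        KEY_TO_SCRAPY_SETTING[key]: value
        for key, value in config.items()
        if key in KEY_TO_SCRAPY_SETTING
    }
    # Every key ending in "Selector" holds a comma-separated CSS selector group.
    selectors = {
        key: value for key, value in config.items() if key.endswith("Selector")
    }
    return settings, config.get("urls", []), selectors


settings, start_urls, selectors = load_crawler_config(SAMPLE_CONFIG)
```

In a Scrapy spider, a dictionary like `settings` could be assigned to the `custom_settings` class attribute and `start_urls` used as the spider's start URLs. Keys such as `skipPaywalled` and `followPagination` have no built-in Scrapy counterpart, so the spider's own code would have to interpret them.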