Commit acf2cb9

Refactoring: Add a controller
1 parent 2fc2afb commit acf2cb9

12 files changed (+232, -209 lines)

README.md

Lines changed: 4 additions & 2 deletions
@@ -8,10 +8,11 @@ x-crawl is a flexible nodejs crawler library. It can crawl pages in batches, net
 
 ## Features
 
-- **🔥 Asynchronous/Synchronous** - Support asynchronous/synchronous mode batch crawling.
+- **🔥 Async/Sync** - Just change the mode property to toggle async/sync crawling mode.
 - **⚙️ Multiple functions** - Batch crawling of pages, batch network requests, batch download of file resources, polling crawling, etc.
 - **🖋️ Flexible writing style** - Multiple crawling configurations and ways to get crawling results.
 - **⏱️ Interval crawling** - no interval/fixed interval/random interval, you can use/avoid high concurrent crawling.
+- **🚀 Crawl Repost** - Under development.
 - **☁️ Crawl SPA** - Batch crawl SPA (Single Page Application) to generate pre-rendered content (ie "SSR" (Server Side Rendering)).
 - **⚒️ Controlling Pages** - Headless browsers can submit forms, keystrokes, event actions, generate screenshots of pages, etc.
 - **🧾 Capture Record** - Capture and record the crawled results, and highlight the reminders.
@@ -130,6 +131,7 @@ running result:
 <div align="center">
 <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler-result.png" />
 </div>
+
 **Note:** Do not crawl at will, you can check the **robots.txt** protocol before crawling. This is just to demonstrate how to use x-crawl.
 
 ## Core concepts
@@ -857,4 +859,4 @@ interface FileInfo {
 
 ## More
 
-If you have any **questions** or **needs** , please submit **Issues in** https://github.com/coder-hxl/x-crawl/issues .
+If you have **problems, needs, good suggestions** please raise **Issues** in https://github.com/coder-hxl/x-crawl/issues.
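
The commit message says a controller was added, and the new README wording says that changing the `mode` property toggles async/sync crawling, alongside the no/fixed/random interval options. The controller's source is not in the excerpt above, so the TypeScript below is only a hypothetical sketch of what such a dispatcher might do, assuming each crawl job can be modeled as a function that returns a promise. `runBatch`, `pickInterval`, `Mode`, and `IntervalTime` are illustrative names, not x-crawl's API.

```typescript
// Hypothetical sketch, not x-crawl's source: a controller that runs a batch
// of crawl tasks in 'sync' (sequential) or 'async' (overlapping) mode.

type Mode = 'async' | 'sync'

// A number is a fixed interval in ms; { min, max } is a random interval.
type IntervalTime = number | { min: number; max: number }

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Returns 0 (no interval), the fixed value, or a random integer in [min, max].
function pickInterval(interval?: IntervalTime): number {
  if (interval === undefined) return 0
  if (typeof interval === 'number') return interval
  const { min, max } = interval
  return Math.floor(min + Math.random() * (max - min + 1))
}

async function runBatch<T>(
  tasks: Array<() => Promise<T>>,
  mode: Mode,
  interval?: IntervalTime
): Promise<T[]> {
  const results: T[] = []
  const pending: Promise<T>[] = []

  for (let i = 0; i < tasks.length; i++) {
    // Wait between task starts (never before the first task).
    const wait = pickInterval(interval)
    if (i > 0 && wait > 0) await sleep(wait)

    if (mode === 'sync') {
      // Sequential: the next task starts only after this one settles.
      results.push(await tasks[i]())
    } else {
      // Overlapping: start the task and move on without awaiting it.
      const p = tasks[i]()
      // Mark a rejection as handled so Node does not report it as
      // unhandled while we sleep; Promise.all below still rejects with it.
      p.catch(() => undefined)
      pending.push(p)
    }
  }

  // Promise.all keeps results in the same order as the input tasks.
  return mode === 'sync' ? results : Promise.all(pending)
}
```

In `'async'` mode the tasks overlap, yet `Promise.all` still returns results in input order; in `'sync'` mode each task waits for the previous one, so one slow page delays every later one. A hypothetical call such as `runBatch(jobs, 'async', { min: 2000, max: 3000 })` would start one job every 2 to 3 seconds.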

docs/cn.md

Lines changed: 4 additions & 2 deletions
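@@ -8,10 +8,12 @@ x-crawl is a flexible nodejs crawler library. It can crawl pages in batches, net
 
 ## Features
 
-- **🔥 Async/Sync** - Supports batch crawling in async/sync mode
+- **🔥 Async/Sync** - Just change the mode property to switch between async/sync crawling modes
 - **⚙️ Multiple functions** - Can batch crawl pages, make batch network requests, batch download file resources, poll crawl, etc.
 - **🖋️ Flexible writing style** - Multiple ways to write crawl configurations and to obtain crawl results.
 - **⏱️ Interval crawling** - No interval/fixed interval/random interval; you can use/avoid high-concurrency crawling.
+- **🚀 Crawl resend** - Under development.
+
 - **☁️ Crawl SPA** - Batch crawl SPAs (single page applications) to generate pre-rendered content (i.e. "SSR" (server-side rendering)).
 - **⚒️ Controlling pages** - The headless browser can submit forms, enter keystrokes, perform event actions, generate screenshots of pages, etc.
 - **🧾 Capture record** - Capture and record crawl results, with highlighted reminders.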
@@ -850,4 +852,4 @@ interface FileInfo {
 
 ## More
 
-If you have **questions** or **needs**, please raise **Issues** at https://github.com/coder-hxl/x-crawl/issues.
+If you have **questions, needs, or good suggestions**, please raise **Issues** at https://github.com/coder-hxl/x-crawl/issues.

package.json

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@
   "dependencies": {
     "chalk": "4.1.2",
     "https-proxy-agent": "^5.0.1",
-    "puppeteer": "^19.7.2",
+    "puppeteer": "19.8.0",
     "x-crawl": "link:"
   },
   "devDependencies": {
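
In package.json, this commit replaces the caret range `^19.7.2` for puppeteer with the exact version `19.8.0`. Under npm's semver rules, a caret range with a nonzero major accepts any version from the given one up to, but not including, the next major, while a bare version matches only itself. The sketch below illustrates just that rule; `parse`, `compare`, and `satisfies` are hypothetical helpers, and real npm resolution goes through the `semver` package, which also covers prerelease tags, 0.x majors, and other range forms.

```typescript
// Simplified sketch of npm caret-range matching for majors >= 1.
// Real npm delegates to the `semver` package, which also handles
// prerelease tags, 0.x majors, and other range syntax.

type Version = [number, number, number]

function parse(v: string): Version {
  if (!/^\d+\.\d+\.\d+$/.test(v)) throw new Error(`Unsupported version: ${v}`)
  const [major, minor, patch] = v.split('.').map(Number)
  return [major, minor, patch]
}

// Negative if a < b, 0 if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return 0
}

function satisfies(version: string, range: string): boolean {
  const v = parse(version)
  if (range.charAt(0) === '^') {
    const base = parse(range.slice(1))
    if (base[0] === 0) throw new Error('0.x caret ranges are not handled in this sketch')
    // ^X.Y.Z for X >= 1 means >= X.Y.Z and < (X+1).0.0
    const upper: Version = [base[0] + 1, 0, 0]
    return compare(v, base) >= 0 && compare(v, upper) < 0
  }
  // An exact pin such as "19.8.0" matches only that version.
  return compare(v, parse(range)) === 0
}
```

For example, `satisfies('19.8.0', '^19.7.2')` is true and so is `satisfies('19.9.1', '^19.7.2')`, but `satisfies('19.9.1', '19.8.0')` is false, so the pin keeps resolution at exactly 19.8.0.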
