Commit fbc1b12

feat: version update
1 parent 1c97855 commit fbc1b12

File tree: 4 files changed (+65 additions, −7 deletions)


CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -1,3 +1,17 @@

# [v8.3.0](https://github.com/coder-hxl/x-crawl/compare/v8.2.0...v8.3.0) (2023-11-09)

### 🚀 Features

- Added a log option to control the information printed in the terminal.
- Terminal output has been upgraded to make it easier to distinguish the source of the information.

---

### 🚀 Features (translated from the Chinese section)

- Added a log option to control the information printed in the terminal.
- Terminal output has been upgraded to make it easier to distinguish the source of the information.

# [v8.2.0](https://github.com/coder-hxl/x-crawl/compare/v8.1.1...v8.2.0) (2023-09-07)

### 🚀 Features

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@

  {
    "private": true,
    "name": "x-crawl",
-   "version": "8.2.0",
+   "version": "8.3.0",
    "author": "coderHXL",
    "description": "x-crawl is a flexible Node.js multifunctional crawler library.",
    "license": "MIT",

publish/README.md

Lines changed: 49 additions & 5 deletions
@@ -57,6 +57,7 @@ x-crawl is an open source project under the MIT license, completely free to use.

- [Rotate Proxy](#rotate-proxy)
- [Custom Device Fingerprint](#custom-device-fingerprint)
- [Priority Queue](#priority-queue)
+ - [Print information](#print-information)
- [About Results](#about-results)
- [TypeScript](#typescript)
- [API](#api)
@@ -134,6 +135,7 @@ x-crawl is an open source project under the MIT license, completely free to use.

- [Community](#community)
- [Issues](#issues)
- [Sponsor](#sponsor-1)
+ - [Special Instructions](#special-instructions)

## Install

@@ -197,7 +199,7 @@ myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {

running result:

<div align="center">
-  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/run-example-gif.gif" />
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/run-example.gif" />
</div>

**Note:** Please do not crawl sites indiscriminately; check the site's **robots.txt** before crawling. The website's class names may change; this example only demonstrates how to use x-crawl.
@@ -787,6 +789,34 @@ myXCrawl

The larger the value of the priority attribute, the higher the priority in the current crawling queue.
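The ordering rule above can be sketched standalone (the target objects and URLs here are hypothetical, and this is not x-crawl's internal queue implementation):

```javascript
// Higher priority value = crawled earlier (hypothetical targets)
const targets = [
  { url: 'https://www.example.com/a', priority: 1 },
  { url: 'https://www.example.com/b', priority: 10 },
  { url: 'https://www.example.com/c', priority: 5 }
]

// Sort a copy in descending priority order
const ordered = [...targets].sort((a, b) => b.priority - a.priority)
console.log(ordered.map((t) => t.url))
// → [ 'https://www.example.com/b', 'https://www.example.com/c', 'https://www.example.com/a' ]
```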
### Print information

The crawl output consists of start information (showing the crawl mode and the total number of targets), process information (showing the current target number and how long to wait), and result information (showing success and failure details). Each message is prefixed with an identifier like **1-page-2**: the leading 1 identifies the first crawler instance, page is the API type, and the trailing 2 means the second crawl target of that instance. This makes it easier to distinguish which API a message comes from.
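The **1-page-2** format can be illustrated with a tiny standalone sketch (logPrefix is a hypothetical helper, not part of x-crawl's API; x-crawl generates these prefixes internally):

```javascript
// Build a "<instance>-<API type>-<target>" prefix as described above
// (hypothetical helper, for illustration only)
function logPrefix(instanceId, apiType, targetId) {
  return `${instanceId}-${apiType}-${targetId}`
}

console.log(logPrefix(1, 'page', 2)) // → 1-page-2
```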

When you do not want to display the crawled information in the terminal, you can control what is shown or hidden through the log option.

```js
import xCrawl from 'x-crawl'

// Hide only the process output; start and result are still shown
const myXCrawl1 = xCrawl({ log: { process: false } })

// Hide all output
const myXCrawl2 = xCrawl({ log: false })
```

The log option accepts an object or a boolean:

- boolean
  - true: show all information
  - false: hide all information
- object
  - start: controls the start information
  - process: controls the process information
  - result: controls the result information

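How a boolean-or-object option like this collapses into three flags can be sketched standalone (normalizeLog is a hypothetical function, not x-crawl's actual code; the defaults follow the documented { start: true, process: true, result: true }):

```javascript
// Normalize the log option (boolean or partial object) into three flags.
// Hypothetical sketch; x-crawl's real implementation may differ.
function normalizeLog(log = true) {
  if (typeof log === 'boolean') {
    return { start: log, process: log, result: log }
  }
  // Partial objects fall back to the documented defaults (all true)
  return { start: true, process: true, result: true, ...log }
}

console.log(normalizeLog(false)) // → { start: false, process: false, result: false }
console.log(normalizeLog({ process: false })) // → { start: true, process: false, result: true }
```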
### About Results
Each crawl target will generate a detail object, which will contain the following properties:
@@ -938,7 +968,7 @@ const myXCrawl = xCrawl()

myXCrawl
  .crawlPage({
    url: 'https://www.example.com',
-   proxy: 'xxx',
+   proxy: { urls: ['xxx'] },
    maxRetry: 1
  })
  .then((res) => {})
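The proxy option now takes an object with a urls array instead of a single string, which allows switching between several proxies. The rotation idea behind a urls list can be sketched standalone (round-robin is only an assumed strategy here, and the proxy hosts are hypothetical; this is not x-crawl internals):

```javascript
// Round-robin over a list of proxy URLs (standalone illustration)
function makeProxyRotator(urls) {
  let i = 0
  return () => urls[i++ % urls.length]
}

const nextProxy = makeProxyRotator(['http://proxy-a:8080', 'http://proxy-b:8080'])
console.log(nextProxy()) // → http://proxy-a:8080
console.log(nextProxy()) // → http://proxy-b:8080
console.log(nextProxy()) // → http://proxy-a:8080
```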
@@ -1089,7 +1119,7 @@ const myXCrawl = xCrawl()

myXCrawl
  .crawlHTML({
    url: 'https://www.example.com',
-   proxy: 'xxx',
+   proxy: { urls: ['xxx'] },
    maxRetry: 1
  })
  .then((res) => {})
@@ -1246,7 +1276,7 @@ const myXCrawl = xCrawl()

myXCrawl
  .crawlData({
    url: 'https://www.example.com/api',
-   proxy: 'xxx',
+   proxy: { urls: ['xxx'] },
    maxRetry: 1
  })
  .then((res) => {})
@@ -1385,7 +1415,7 @@ const myXCrawl = xCrawl()

myXCrawl
  .crawlFile({
    url: 'https://www.example.com/file',
-   proxy: 'xxx',
+   proxy: { urls: ['xxx'] },
    maxRetry: 1,
    storeDir: './upload',
    fileName: 'xxx'
@@ -1490,6 +1520,13 @@ export interface XCrawlConfig extends CrawlCommonConfig {

  enableRandomFingerprint?: boolean
  baseUrl?: string
  intervalTime?: IntervalTime
+ log?:
+   | {
+       start?: boolean
+       process?: boolean
+       result?: boolean
+     }
+   | boolean
  crawlPage?: {
    puppeteerLaunch?: PuppeteerLaunchOptions // puppeteer
  }
@@ -1502,6 +1539,7 @@ export interface XCrawlConfig extends CrawlCommonConfig {

- enableRandomFingerprint: true
- baseUrl: undefined
- intervalTime: undefined
+ - log: { start: true, process: true, result: true }
- crawlPage: undefined

#### Detail target config
@@ -2042,10 +2080,16 @@ The crawlPage API has built-in [puppeteer](https://github.com/puppeteer/puppetee

- **GitHub Discussions:** Use [GitHub Discussions](https://github.com/coder-hxl/x-crawl/discussions) for message board-style questions and discussions.

Questions and discussions related to illegal activities may not be submitted. x-crawl is for legal use only. It is prohibited to use this tool to conduct any illegal activities, including but not limited to unauthorized data collection, cyber attacks, privacy invasion, etc.

### Issues

If you have questions, needs, or good suggestions, you can raise them at [GitHub Issues](https://github.com/coder-hxl/x-crawl/issues).

### Sponsor

x-crawl is an open source project under the MIT license, completely free to use. If you benefit from the projects I develop and maintain at work, please consider supporting my work through the [Afdian](https://afdian.net/a/coderhxl) platform.

### Special Instructions

x-crawl is for legal use only. It is prohibited to use this tool to conduct any illegal activities, including but not limited to unauthorized data collection, cyber attacks, privacy invasion, etc.

publish/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@

  {
    "name": "x-crawl",
-   "version": "8.2.0",
+   "version": "8.3.0",
    "author": "coderHXL",
    "description": "x-crawl is a flexible Node.js multifunctional crawler library.",
    "license": "MIT",
