English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)
x-crawl is a flexible Node.js multipurpose crawler library. Its usage is flexible, and it offers many built-in functions for crawling pages, interfaces, files, and more.
> If you like x-crawl, you can give the [x-crawl repository](https://github.com/coder-hxl/x-crawl) a star to support it. Thank you for your support!
## Features
- **🔥 Asynchronous/Synchronous** - Just change the `mode` property to toggle between asynchronous and synchronous crawling mode.
- **⚙️ Multiple purposes** - It can crawl pages, interfaces, and files, and poll crawls, meeting the needs of various scenarios.
- **🖋️ Flexible writing style** - The same crawling API adapts to multiple configurations, and each configuration method has its own strengths.
- **👀 Device Fingerprinting** - Zero or custom configuration to avoid being identified and tracked through fingerprinting from different locations.
- **⏱️ Interval Crawling** - No interval, fixed interval, or random interval, to generate or avoid high-concurrency crawling.
- **🔄 Failed Retry** - Avoid crawl failures caused by transient problems; unlimited retries are supported.
- **🚀 Priority Queue** - Give a single crawl target a priority to have it crawled ahead of other targets.
- **☁️ Crawl SPA** - Crawl an SPA (Single Page Application) to generate pre-rendered content (aka "SSR", Server-Side Rendering).
- **⚒️ Control Page** - You can submit forms, perform keyboard input and event operations, generate screenshots of the page, etc.
- **🧾 Capture Record** - Capture and record the crawled information, and highlight it on the console.
- **🦾 TypeScript** - Ships with its own types and implements complete typing through generics.
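The priority-queue behavior can be illustrated with a minimal sketch in plain JavaScript. This is not x-crawl's internal scheduler; the target shape (`url`/`priority`) simply mirrors the configuration style used in this README, and the sorting helper is an assumption for illustration:

```javascript
// Illustrative only: dispatch crawl targets so that a higher `priority`
// value runs first; targets with equal (or no) priority keep their
// configured order (stable sort via the original index).
function byPriority(targets) {
  return targets
    .map((t, i) => ({ t, i }))
    .sort((a, b) => (b.t.priority ?? 0) - (a.t.priority ?? 0) || a.i - b.i)
    .map(({ t }) => t.url);
}

const order = byPriority([
  { url: 'https://www.example.com/a' },
  { url: 'https://www.example.com/b', priority: 10 },
  { url: 'https://www.example.com/c', priority: 5 }
]);
// → ['https://www.example.com/b', 'https://www.example.com/c', 'https://www.example.com/a']
```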
## Relationship with Puppeteer
A crawl target with a custom device fingerprint (fragment):

```js
{
  url: 'https://www.example.com/page-2',
  fingerprint: {
    maxWidth: 1980,
    minWidth: 1200,
    maxHeight: 1080,
    minHeight: 800,
    platform: 'Android'
  }
}
```
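The `maxWidth`/`minWidth` and `maxHeight`/`minHeight` bounds suggest that the viewport is randomized per crawl target. A minimal sketch of that idea in plain JavaScript (the helper name and the exact randomization are assumptions for illustration, not x-crawl's implementation):

```javascript
// Hypothetical helper: pick an integer in [min, max] for each dimension,
// so every crawl target gets a slightly different device fingerprint.
function randomViewport({ maxWidth, minWidth, maxHeight, minHeight }) {
  const pick = (min, max) => min + Math.floor(Math.random() * (max - min + 1));
  return { width: pick(minWidth, maxWidth), height: pick(minHeight, maxHeight) };
}

const { width, height } = randomViewport({
  maxWidth: 1980, minWidth: 1200, maxHeight: 1080, minHeight: 800
});
// width falls in [1200, 1980], height in [800, 1080]
```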
The larger the value of the priority attribute, the higher the priority in the crawl.

### About Results
Each crawl target generates a detail object containing the following properties:

- id: generated according to the order of the crawl targets; if a priority is used, it is generated according to that priority
- isSuccess: whether the crawl succeeded
- maxRetry: the maximum number of retries for this crawl target
- retryCount: the number of times this crawl target has been retried
- crawlErrorQueue: the errors collected for this crawl target
- data: the data crawled from this crawl target

Whether the detail objects are wrapped in an array is determined automatically by the configuration method you choose, and the corresponding array or single detail object is returned. The types fit perfectly in TypeScript.
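Handling such detail objects might look like the following sketch in plain JavaScript (the `summarize` helper and the sample objects are fabricated for illustration; only the property names come from the description above):

```javascript
// Summarize crawl detail objects using the properties described above:
// id, isSuccess, maxRetry, retryCount, crawlErrorQueue, data.
function summarize(details) {
  // Accept either a single detail object or an array of them.
  const list = Array.isArray(details) ? details : [details];
  return list.map((d) =>
    d.isSuccess
      ? `#${d.id} ok after ${d.retryCount}/${d.maxRetry} retries`
      : `#${d.id} failed: ${d.crawlErrorQueue.length} error(s)`
  );
}

const summary = summarize([
  { id: 1, isSuccess: true, maxRetry: 3, retryCount: 0, crawlErrorQueue: [], data: '<html>…</html>' },
  { id: 2, isSuccess: false, maxRetry: 3, retryCount: 3, crawlErrorQueue: [new Error('timeout')], data: null }
]);
// → ['#1 ok after 0/3 retries', '#2 failed: 1 error(s)']
```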
Details about configuration methods and results are as follows: [crawlPage config](#config), [crawlData config](#config-1), [crawlFile config](#config-2).