Commit 3636ae0

Merge branch 'main' of https://github.com/coder-hxl/x-crawl
2 parents 3cf4adc + d895e27 commit 3636ae0

2 files changed: 44 additions, 45 deletions

README.md

Lines changed: 22 additions & 23 deletions
@@ -62,25 +62,25 @@ The crawlPage API has [puppeteer](https://github.com/puppeteer/puppeteer) built
  - [Type](#Type-1)
  - [Example](#Example-2)
  - [Config](#Config)
- - [1.Simple target config - string](#1.Simple-target-config---string)
- - [2.Detailed target config - CrawlPageDetailTargetConfig](#2.Detailed-target-config---CrawlPageDetailTargetConfig)
- - [3.Mixed target array config - (string | CrawlPageDetailTargetConfig)[]](<#3.Mixed-target-array-config---(string-|-CrawlPageDetailTargetConfig)[]>)
- - [4.Advanced config - CrawlPageAdvancedConfig](#4.Advanced-config---CrawlPageAdvancedConfig)
+ - [Simple target config - string](#Simple-target-config---string)
+ - [Detailed target config - CrawlPageDetailTargetConfig](#Detailed-target-config---CrawlPageDetailTargetConfig)
+ - [Mixed target array config - (string | CrawlPageDetailTargetConfig)[]](#Mixed-target-array-config---string--CrawlPageDetailTargetConfig)
+ - [Advanced config - CrawlPageAdvancedConfig](#Advanced-config---CrawlPageAdvancedConfig)
  - [crawlData](#crawlData)
  - [Type](#Type-2)
  - [Example](#Example-3)
  - [Config](#Config-1)
- - [1.Simple target config - string](#1.Simple-target-config---string-1)
- - [2.Detailed target config - CrawlDataDetailTargetConfig](#2.Detailed-target-config---CrawlDataDetailTargetConfig)
- - [3.Mixed target array config - (string | CrawlDataDetailTargetConfig)[]](<#3.Mixed-target-array-config---(string-|-CrawlDataDetailTargetConfig)[]>)
- - [4.Advanced config - CrawlDataAdvancedConfig](#4.Advanced-config---CrawlDataAdvancedConfig)
+ - [Simple target config - string](#Simple-target-config---string-1)
+ - [Detailed target config - CrawlDataDetailTargetConfig](#Detailed-target-config---CrawlDataDetailTargetConfig)
+ - [Mixed target array config - (string | CrawlDataDetailTargetConfig)[]](#Mixed-target-array-config---string--CrawlDataDetailTargetConfig)
+ - [Advanced config - CrawlDataAdvancedConfig](#Advanced-config---CrawlDataAdvancedConfig)
  - [crawlFile](#crawlFile)
  - [Type](#Type-3)
  - [Example](#Example-4)
  - [Config](#Config-2)
- - [1.Detailed target config - CrawlFileDetailTargetConfig](#1.Detailed-target-config---CrawlFileDetailTargetConfig)
- - [2.Detailed target array config - CrawlFileDetailTargetConfig[]](#2.Detailed-target-array-config---CrawlFileDetailTargetConfig[])
- - [3.Advanced config - CrawlFileAdvancedConfig](#3.Advanced-config---CrawlFileAdvancedConfig)
+ - [Detailed target config - CrawlFileDetailTargetConfig](#Detailed-target-config---CrawlFileDetailTargetConfig)
+ - [Detailed target array config - CrawlFileDetailTargetConfig[]](#Detailed-target-array-config---CrawlFileDetailTargetConfig)
+ - [Advanced config - CrawlFileAdvancedConfig](#Advanced-config-CrawlFileAdvancedConfig)
  - [crawlPolling](#crawlPolling)
  - [Type](#Type-4)
  - [Example](#Example-5)
@@ -706,7 +706,7 @@ There are 4 types:
  - Mixed target array config - (string | CrawlPageDetailTargetConfig)[]
  - Advanced config - CrawlPageAdvancedConfig

- ##### 1.Simple target config - string
+ ##### Simple target config - string

  This is a simple target configuration. if you just want to simply crawl this page, you can try this way of writing:

@@ -720,7 +720,7 @@ myXCrawl.crawlPage('https://www.example.com').then((res) => {})

  The res you get will be an object.

- ##### 2.Detailed target config - CrawlPageDetailTargetConfig
+ ##### Detailed target config - CrawlPageDetailTargetConfig

  This is the detailed target configuration. if you want to crawl this page and need to retry on failure, you can try this way of writing:

@@ -742,7 +742,7 @@ The res you get will be an object.

  More configuration options can view [CrawlPageDetailTargetConfig](#CrawlPageDetailTargetConfig).

- ##### 3.Mixed target array config - (string | CrawlPageDetailTargetConfig)[]
+ ##### Mixed target array config - (string | CrawlPageDetailTargetConfig)[]

  This is a mixed target array configuration. if you want to crawl multiple pages, and some pages need to fail and retry, you can try this way of writing:

@@ -763,7 +763,7 @@ The res you get will be an array of objects.

  More configuration options can view [CrawlPageDetailTargetConfig](#CrawlPageDetailTargetConfig).

- ##### 4.Advanced config - CrawlPageAdvancedConfig
+ ##### Advanced config - CrawlPageAdvancedConfig

  This is an advanced configuration, targets is a mixed target array configuration. if you want to crawl multiple pages and request configurations (proxy, cookies, retries, etc.) that you don't want to write repeatedly, but also need interval time, device fingerprint, lifecycle, etc., try this:

@@ -863,7 +863,7 @@ There are 4 types:
  - Mixed target array config - (string | CrawlDataDetailTargetConfig)[]
  - Advanced config - CrawlDataAdvancedConfig

- ##### 1.Simple target config - string
+ ##### Simple target config - string

  This is a simple target configuration. if you just want to simply crawl the data, and the interface is GET, you can try this way of writing:

@@ -877,7 +877,7 @@ myXCrawl.crawlData('https://www.example.com/api').then((res) => {})

  The res you get will be an object.

- ##### 2.Detailed target config - CrawlDataDetailTargetConfig
+ ##### Detailed target config - CrawlDataDetailTargetConfig

  This is the detailed target configuration. if you want to crawl this data and need to retry on failure, you can try this way of writing:

@@ -899,7 +899,7 @@ The res you get will be an object.

  More configuration options can view [CrawlDataDetailTargetConfig](#CrawlDataDetailTargetConfig).

- ##### 3.Mixed target array config - (string | CrawlDataDetailTargetConfig)[]
+ ##### Mixed target array config - (string | CrawlDataDetailTargetConfig)[]

  This is a mixed target array configuration. if you want to crawl multiple data, and some data needs to fail and retry, you can try this way of writing:

@@ -920,7 +920,7 @@ The res you get will be an array of objects.

  More configuration options can view [CrawlDataDetailTargetConfig](#CrawlDataDetailTargetConfig).

- ##### 4.Advanced config - CrawlDataAdvancedConfig
+ ##### Advanced config - CrawlDataAdvancedConfig

  This is an advanced configuration, targets is a mixed target array configuration. if you want to crawl more than one piece of data and request configurations (proxy, cookies, retries, etc.) don't want to write twice, but also need interval time, device fingerprint, lifecycle, etc., try this:

@@ -1013,11 +1013,10 @@ myXCrawl
  There are 3 types:

  - Detailed target config - CrawlFileDetailTargetConfig
-
  - Detailed target array config - CrawlFileDetailTargetConfig[]
  - Advanced config CrawlFileAdvancedConfig

- ##### 1.Detailed target config - CrawlFileDetailTargetConfig
+ ##### Detailed target config - CrawlFileDetailTargetConfig

  This is the detailed target configuration. if you want to crawl this file and need to retry on failure, you can try this way of writing:

@@ -1041,7 +1040,7 @@ The res you get will be an object.

  More configuration options can view [CrawlFileDetailTargetConfig](#CrawlFileDetailTargetConfig).

- ##### 2.Detailed target array config - CrawlFileDetailTargetConfig[]
+ ##### Detailed target array config - CrawlFileDetailTargetConfig[]

  This is the detailed target array configuration. if you want to crawl multiple files, and some data needs to be retried after failure, you can try this way of writing:

@@ -1062,7 +1061,7 @@ The res you get will be an array of objects.

  More configuration options can view [CrawlFileDetailTargetConfig](#CrawlFileDetailTargetConfig).

- ##### 3.Advanced config CrawlFileAdvancedConfig
+ ##### Advanced config CrawlFileAdvancedConfig

  This is an advanced configuration, targets is a mixed target array configuration. if you want to crawl more than one piece of data and request configurations (proxy, storeDir, retry, etc.) don't want to write twice, but also need interval time, device fingerprint, life cycle, etc., try this:

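
The table-of-contents changes in README.md line up with how GitHub appears to derive heading anchors: punctuation such as `.`, `(`, `|`, `)`, `[` and `]` is dropped, each space becomes a hyphen, and repeated headings get a numeric suffix. The sketch below approximates that rule to show why `#1.Simple-target-config---string` would not resolve while `#Simple-target-config---string` does. The function name `headingAnchor` is hypothetical, and this is not GitHub's actual implementation; in particular, GitHub's rendered ids are, as far as I know, lowercased, whereas case is preserved here to match the links in this commit.

```typescript
// Approximation of GitHub-style heading anchors (an assumption based on
// the anchors in this diff, not GitHub's real code).
function headingAnchor(heading: string, seen: Record<string, number>): string {
  const base = heading
    .trim()
    // Drop ASCII punctuation except "-" and "_", so ". ( | ) [ ]" vanish.
    .replace(/[!-,.\/:-@\[-^`{-~]/g, '')
    // Each remaining space becomes one hyphen, so " - " turns into "---".
    .replace(/ /g, '-')
  // Count earlier occurrences of the same anchor within one document.
  const count = Object.prototype.hasOwnProperty.call(seen, base) ? seen[base] : 0
  seen[base] = count + 1
  return count === 0 ? base : `${base}-${count}`
}

// One `seen` record per document, so duplicates receive "-1", "-2", ...
const readmeAnchors: Record<string, number> = {}
console.log(headingAnchor('Simple target config - string', readmeAnchors))
console.log(headingAnchor('Simple target config - string', readmeAnchors))
console.log(
  headingAnchor('Mixed target array config - (string | CrawlPageDetailTargetConfig)[]', readmeAnchors)
)
```

Under this rule the old numbered heading `1.Simple target config - string` would yield `1Simple-target-config---string`, which is consistent with the commit dropping the numeric prefixes rather than trying to link to them.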

docs/cn.md

Lines changed: 22 additions & 22 deletions
@@ -62,25 +62,25 @@ crawlPage API 内置了 [puppeteer](https://github.com/puppeteer/puppeteer) ,
  - [类型](#类型-1)
  - [示例](#示例-2)
  - [配置](#配置)
- - [1.简单目标配置 - string](#1.简单目标配置---string)
- - [2.详细目标配置 - CrawlPageDetailTargetConfig](#2.详细目标配置---CrawlPageDetailTargetConfig)
- - [3.混合目标数组配置 - (string | CrawlPageDetailTargetConfig)[]](<#3.混合目标数组配置---(string-|-CrawlPageDetailTargetConfig)[]>)
- - [4.进阶配置 - CrawlPageAdvancedConfig](#4.进阶配置---CrawlPageAdvancedConfig)
+ - [简单目标配置 - string](#简单目标配置---string)
+ - [详细目标配置 - CrawlPageDetailTargetConfig](#详细目标配置---CrawlPageDetailTargetConfig)
+ - [混合目标数组配置 - (string | CrawlPageDetailTargetConfig)[]](#混合目标数组配置---string--CrawlPageDetailTargetConfig)
+ - [进阶配置 - CrawlPageAdvancedConfig](#进阶配置---CrawlPageAdvancedConfig)
  - [crawlData](#crawlData)
  - [类型](#类型-2)
  - [示例](#示例-3)
  - [配置](#配置-1)
- - [1.简单目标配置 - string](#1.简单目标配置---string-1)
- - [2.详细目标配置 - CrawlDataDetailTargetConfig](#2.详细目标配置---CrawlDataDetailTargetConfig)
- - [3.混合目标数组配置 - (string | CrawlDataDetailTargetConfig)[]](<#3.混合目标数组配置---(string-|-CrawlDataDetailTargetConfig)[]>)
- - [4.进阶配置 - CrawlDataAdvancedConfig](#4.进阶配置---CrawlDataAdvancedConfig)
+ - [简单目标配置 - string](#简单目标配置---string-1)
+ - [详细目标配置 - CrawlDataDetailTargetConfig](#详细目标配置---CrawlDataDetailTargetConfig)
+ - [混合目标数组配置 - (string | CrawlDataDetailTargetConfig)[]](#混合目标数组配置---string--CrawlDataDetailTargetConfig)
+ - [进阶配置 - CrawlDataAdvancedConfig](#进阶配置---CrawlDataAdvancedConfig)
  - [crawlFile](#crawlFile)
  - [类型](#类型-3)
  - [示例](#示例-4)
  - [配置](#配置-2)
- - [1.详细目标配置 - CrawlFileDetailTargetConfig](#1.详细目标配置---CrawlFileDetailTargetConfig)
- - [2.详细目标数组配置 - CrawlFileDetailTargetConfig[]](2.详细目标数组配置---CrawlFileDetailTargetConfig[])
- - [3.进阶配置 - CrawlFileAdvancedConfig](#3.进阶配置---CrawlFileAdvancedConfig)
+ - [详细目标配置 - CrawlFileDetailTargetConfig](#详细目标配置---CrawlFileDetailTargetConfig)
+ - [详细目标数组配置 - CrawlFileDetailTargetConfig[]](#详细目标数组配置---CrawlFileDetailTargetConfig)
+ - [进阶配置 - CrawlFileAdvancedConfig](#进阶配置---CrawlFileAdvancedConfig)
  - [startPolling](#startPolling)
  - [类型](#类型-4)
  - [示例](#示例-5)
@@ -694,7 +694,7 @@ myXCrawl.crawlPage('https://www.example.com').then((res) => {
  - 混合目标数组配置 - (string | CrawlPageDetailTargetConfig)[]
  - 进阶配置 - CrawlPageAdvancedConfig

- ##### 1.简单目标配置 - string
+ ##### 简单目标配置 - string

  这是简单目标配置。如果你只想单纯爬一下这个页面,可以试试这种写法:

@@ -708,7 +708,7 @@ myXCrawl.crawlPage('https://www.example.com').then((res) => {})

  拿到的 res 将是一个对象。

- ##### 2.详细目标配置 - CrawlPageDetailTargetConfig
+ ##### 详细目标配置 - CrawlPageDetailTargetConfig

  这是详细目标配置。如果你想爬一下这个页面,并且需要失败重试之类的,可以试试这种写法:

@@ -730,7 +730,7 @@ myXCrawl

  更多配置选项可以查看 [CrawlPageDetailTargetConfig](#CrawlPageDetailTargetConfig) 。

- ##### 3.混合目标数组配置 - (string | CrawlPageDetailTargetConfig)[]
+ ##### 混合目标数组配置 - (string | CrawlPageDetailTargetConfig)[]

  这是混合目标数组配置。如果你想爬取多个页面,并且有些页面需要失败重试之类的,可以试试这种写法:

@@ -751,7 +751,7 @@ myXCrawl

  更多配置选项可以查看 [CrawlPageDetailTargetConfig](#CrawlPageDetailTargetConfig) 。

- ##### 4.进阶配置 - CrawlPageAdvancedConfig
+ ##### 进阶配置 - CrawlPageAdvancedConfig

  这是进阶配置,targets 是混合目标数组配置。如果你想爬取多个页面,并且请求配置(proxy、cookies、重试等等)不想重复写,还需要间隔时间、设备指纹以及生命周期等等,可以试试这种写法:

@@ -852,7 +852,7 @@ myXCrawl
  - 混合目标数组配置 - (string | CrawlDataDetailTargetConfig)[]
  - 进阶配置 - CrawlDataAdvancedConfig

- ##### 1.简单目标配置 - string
+ ##### 简单目标配置 - string

  这是简单目标配置。如果你只想单纯爬一下这个数据,并且该接口是 GET 方式的,可以试试这种写法:

@@ -866,7 +866,7 @@ myXCrawl.crawlData('https://www.example.com/api').then((res) => {})

  拿到的 res 将是一个对象。

- ##### 2.详细目标配置 - CrawlDataDetailTargetConfig
+ ##### 详细目标配置 - CrawlDataDetailTargetConfig

  这是详细目标配置。如果你想爬一下这个数据,并且需要失败重试之类的,可以试试这种写法:

@@ -888,7 +888,7 @@ myXCrawl

  更多配置选项可以查看 [CrawlDataDetailTargetConfig](#CrawlDataDetailTargetConfig) 。

- ##### 3.混合目标数组配置 - (string | CrawlDataDetailTargetConfig)[]
+ ##### 混合目标数组配置 - (string | CrawlDataDetailTargetConfig)[]

  这是混合目标数组配置。如果你想爬取多个数据,并且有些数据需要失败重试之类的,可以试试这种写法:

@@ -909,7 +909,7 @@ myXCrawl

  更多配置选项可以查看 [CrawlDataDetailTargetConfig](#CrawlDataDetailTargetConfig) 。

- ##### 4.进阶配置 - CrawlDataAdvancedConfig
+ ##### 进阶配置 - CrawlDataAdvancedConfig

  这是进阶配置,targets 是混合目标数组配置。如果你想爬取多个数据,并且请求配置(proxy、cookies、重试等等)不想重复写,还需要间隔时间、设备指纹以及生命周期等等,可以试试这种写法:

@@ -1005,7 +1005,7 @@ myXCrawl
  - 详细目标数组配置 - CrawlFileDetailTargetConfig[]
  - 进阶配置 - CrawlFileAdvancedConfig

- ##### 1.详细目标配置 - CrawlFileDetailTargetConfig
+ ##### 详细目标配置 - CrawlFileDetailTargetConfig

  这是详细目标配置。如果你想爬一下这个文件,并且需要失败重试之类的,可以试试这种写法:

@@ -1029,7 +1029,7 @@ myXCrawl

  更多配置选项可以查看 [CrawlFileDetailTargetConfig](#CrawlFileDetailTargetConfig) 。

- ##### 2.详细目标数组配置 - CrawlFileDetailTargetConfig[]
+ ##### 详细目标数组配置 - CrawlFileDetailTargetConfig[]

  这是详细目标数组配置。如果你想爬取多个文件,并且有些数据需要失败重试之类的,可以试试这种写法:

@@ -1050,7 +1050,7 @@ myXCrawl

  更多配置选项可以查看 [CrawlFileDetailTargetConfig](#CrawlFileDetailTargetConfig) 。

- ##### 3.进阶配置 - CrawlFileAdvancedConfig
+ ##### 进阶配置 - CrawlFileAdvancedConfig

  这是进阶配置,targets 是混合目标数组配置。如果你想爬取多个数据,并且请求配置(proxy、storeDir、重试等等)不想重复写,还需要间隔时间、设备指纹以及生命周期等等,可以试试这种写法:

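
Both files describe the same four config shapes for crawlPage and crawlData: a plain string, a detailed target object, a mixed array of the two, and an advanced object whose `targets` field holds such a mixed array. As a rough illustration of how those shapes differ, the sketch below uses hypothetical stand-in types rather than x-crawl's real `CrawlPageDetailTargetConfig` and `CrawlPageAdvancedConfig`. Only `targets` is named in the text; the `url` and `maxRetry` fields and the `configKind` helper are assumptions made for the example.

```typescript
// Hypothetical stand-ins for the documented config types (not x-crawl's own).
// `url` and `maxRetry` are assumed field names; `targets` comes from the docs.
type DetailTarget = { url: string; maxRetry?: number }
type MixedTargets = (string | DetailTarget)[]
type AdvancedConfig = { targets: MixedTargets; maxRetry?: number }
type CrawlConfig = string | DetailTarget | MixedTargets | AdvancedConfig

type ConfigKind = 'simple' | 'detailed' | 'mixed-array' | 'advanced'

// Tell the four documented shapes apart by their runtime structure.
function configKind(config: CrawlConfig): ConfigKind {
  if (typeof config === 'string') return 'simple'
  if (Array.isArray(config)) return 'mixed-array'
  if ('targets' in config) return 'advanced'
  return 'detailed'
}

// A mixed array may combine plain URLs with detailed targets.
const mixedExample: MixedTargets = [
  'https://www.example.com/page-1',
  { url: 'https://www.example.com/page-2', maxRetry: 2 }
]
console.log(configKind(mixedExample))
```

Checking for a string first, then an array, then the `targets` key keeps each test unambiguous, since a detailed target is simply whatever object remains.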
