@@ -18,7 +18,7 @@ If it helps you, please give the [repository](https://github.com/coder-hxl/x-cra

## Relationship with puppeteer

- The fetchPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to crawl pages.
+ The crawlPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to crawl pages.

The following can be done:
@@ -45,17 +45,17 @@ The following can be done:
    + [Example](#Example-1)
    + [Mode](#Mode)
    + [IntervalTime](#IntervalTime)
-   * [fetchPage](#fetchPage)
+   * [crawlPage](#crawlPage)
    + [Type](#Type-2)
    + [Example](#Example-2)
    + [About page](#About-page)
-   * [fetchData](#fetchData)
+   * [crawlData](#crawlData)
    + [Type](#Type-3)
    + [Example](#Example-3)
-   * [fetchFile](#fetchFile)
+   * [crawlFile](#crawlFile)
    + [Type](#Type-4)
    + [Example](#Example-4)
-   * [fetchPolling](#fetchPolling)
+   * [crawlPolling](#crawlPolling)
    + [Type](#Type-5)
    + [Example](#Example-5)
- [Types](#Types)
@@ -65,15 +65,15 @@ The following can be done:
  * [RequestConfig](#RequestConfig)
  * [IntervalTime](#IntervalTime)
  * [XCrawlBaseConfig](#XCrawlBaseConfig)
-   * [FetchBaseConfigV1](#FetchBaseConfigV1)
-   * [FetchPageConfig](#FetchPageConfig)
-   * [FetchDataConfig](#FetchDataConfig)
-   * [FetchFileConfig](#FetchFileConfig)
+   * [CrawlBaseConfigV1](#CrawlBaseConfigV1)
+   * [CrawlPageConfig](#CrawlPageConfig)
+   * [CrawlDataConfig](#CrawlDataConfig)
+   * [CrawlFileConfig](#CrawlFileConfig)
  * [StartPollingConfig](#StartPollingConfig)
-   * [FetchResCommonV1](#FetchResCommonV1)
-   * [FetchResCommonArrV1](#FetchResCommonArrV1)
+   * [CrawlResCommonV1](#CrawlResCommonV1)
+   * [CrawlResCommonArrV1](#CrawlResCommonArrV1)
  * [FileInfo](#FileInfo)
-   * [FetchPage](#FetchPage)
+   * [CrawlPage](#CrawlPage)
- [More](#More)

## Install
@@ -101,8 +101,8 @@ const myXCrawl = xCrawl({
// 3.Set the crawling task
// Call the startPolling API to start the polling function, and the callback function will be called every other day
myXCrawl.startPolling({ d: 1 }, () => {
-   // Call fetchPage API to crawl Page
-   myXCrawl.fetchPage('https://www.youtube.com/').then((res) => {
+   // Call crawlPage API to crawl Page
+   myXCrawl.crawlPage('https://www.youtube.com/').then((res) => {
    const { jsdom } = res.data // By default, the JSDOM library is used to parse Page

    // Get the cover image element of the Promoted Video
@@ -118,8 +118,8 @@ myXCrawl.startPolling({ d: 1 }, () => {
      }
    })

-     // Call the fetchFile API to crawl pictures
-     myXCrawl.fetchFile({ requestConfig, fileConfig: { storeDir: './upload' } })
+     // Call the crawlFile API to crawl pictures
+     myXCrawl.crawlFile({ requestConfig, fileConfig: { storeDir: './upload' } })
  })
})
@@ -209,17 +209,17 @@ const myXCrawl2 = xCrawl({
### Crawl page

- Fetch a page via [fetchPage()](#fetchPage)
+ Crawl a page via [crawlPage()](#crawlPage)

```js
- myXCrawl.fetchPage('https://xxx.com').then(res => {
+ myXCrawl.crawlPage('https://xxx.com').then(res => {
  const { jsdom, page } = res.data
})
```

### Crawl interface

- Crawl interface data through [fetchData()](#fetchData)
+ Crawl interface data through [crawlData()](#crawlData)

```js
const requestConfig = [
@@ -228,14 +228,14 @@ const requestConfig = [
  { url: 'https://xxx.com/xxxx' }
]

- myXCrawl.fetchData({ requestConfig }).then(res => {
+ myXCrawl.crawlData({ requestConfig }).then(res => {
  // deal with
})
```

### Crawl files

- Fetch file data via [fetchFile()](#fetchFile)
+ Crawl file data via [crawlFile()](#crawlFile)

```js
import path from 'node:path'
@@ -246,7 +246,7 @@ const requestConfig = [
  { url: 'https://xxx.com/xxxx' }
]

- myXCrawl.fetchFile({
+ myXCrawl.crawlFile({
  requestConfig,
  fileConfig: {
    storeDir: path.resolve(__dirname, './upload') // storage folder
@@ -284,9 +284,9 @@ const myXCrawl = xCrawl({
})
```

- Passing **baseConfig** is for **fetchPage/fetchData/fetchFile** to use these values by default.
+ Passing **baseConfig** is for **crawlPage/crawlData/crawlFile** to use these values by default.

- **Note:** To avoid repeated creation of instances in subsequent examples, **myXCrawl** here will be the crawler instance in the **fetchPage/fetchData/fetchFile** example.
+ **Note:** To avoid repeated creation of instances in subsequent examples, **myXCrawl** here will be the crawler instance in the **crawlPage/crawlData/crawlFile** example.

#### Mode
@@ -306,26 +306,26 @@ The intervalTime option defaults to undefined. If there is a setting value, it
The first request is not to trigger the interval.

- ### fetchPage
+ ### crawlPage

- fetchPage is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl page.
+ crawlPage is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl page.

#### Type

- - Look at the [FetchPageConfig](#FetchPageConfig) type
- - Look at the [FetchPage](#FetchPage-2) type
+ - Look at the [CrawlPageConfig](#CrawlPageConfig) type
+ - Look at the [CrawlPage](#CrawlPage-2) type

```ts
- function fetchPage: (
-   config: FetchPageConfig,
-   callback?: (res: FetchPage) => void
- ) => Promise<FetchPage>
+ function crawlPage: (
+   config: CrawlPageConfig,
+   callback?: (res: CrawlPage) => void
+ ) => Promise<CrawlPage>
```

#### Example

```js
- myXCrawl.fetchPage('/xxx').then((res) => {
+ myXCrawl.crawlPage('/xxx').then((res) => {
  const { jsdom } = res.data
  console.log(jsdom.window.document.querySelector('title')?.textContent)
})
@@ -335,21 +335,21 @@ myXCrawl.fetchPage('/xxx').then((res) => {
Get the page instance from res.data.page, which can do interactive operations such as events. For specific usage, refer to [page](https://pptr.dev/api/puppeteer.page).

- ### fetchData
+ ### crawlData

- fetchData is the method of the above [myXCrawl](#Example-1) instance, which is usually used to crawl APIs to obtain JSON data and so on.
+ crawlData is the method of the above [myXCrawl](#Example-1) instance, which is usually used to crawl APIs to obtain JSON data and so on.

#### Type

- - Look at the [FetchDataConfig](#FetchDataConfig) type
- - Look at the [FetchResCommonV1](#FetchResCommonV1) type
- - Look at the [FetchResCommonArrV1](#FetchResCommonArrV1) type
+ - Look at the [CrawlDataConfig](#CrawlDataConfig) type
+ - Look at the [CrawlResCommonV1](#CrawlResCommonV1) type
+ - Look at the [CrawlResCommonArrV1](#CrawlResCommonArrV1) type

```ts
- function fetchData: <T = any>(
-   config: FetchDataConfig,
-   callback?: (res: FetchResCommonV1<T>) => void
- ) => Promise<FetchResCommonArrV1<T>>
+ function crawlData: <T = any>(
+   config: CrawlDataConfig,
+   callback?: (res: CrawlResCommonV1<T>) => void
+ ) => Promise<CrawlResCommonArrV1<T>>
```

#### Example
@@ -361,27 +361,27 @@ const requestConfig = [
  { url: '/xxxx' }
]

- myXCrawl.fetchData({ requestConfig }).then(res => {
+ myXCrawl.crawlData({ requestConfig }).then(res => {
  console.log(res)
})
```

- ### fetchFile
+ ### crawlFile

- fetchFile is the method of the above [myXCrawl](#Example-1) instance, which is usually used to crawl files, such as pictures, pdf files, etc.
+ crawlFile is the method of the above [myXCrawl](#Example-1) instance, which is usually used to crawl files, such as pictures, pdf files, etc.

#### Type

- - Look at the [FetchFileConfig](#FetchFileConfig) type
- - Look at the [FetchResCommonV1](#FetchResCommonV1) type
- - Look at the [FetchResCommonArrV1](#FetchResCommonArrV1) type
+ - Look at the [CrawlFileConfig](#CrawlFileConfig) type
+ - Look at the [CrawlResCommonV1](#CrawlResCommonV1) type
+ - Look at the [CrawlResCommonArrV1](#CrawlResCommonArrV1) type
- Look at the [FileInfo](#FileInfo) type

```ts
- function fetchFile: (
-   config: FetchFileConfig,
-   callback?: (res: FetchResCommonV1<FileInfo>) => void
- ) => Promise<FetchResCommonArrV1<FileInfo>>
+ function crawlFile: (
+   config: CrawlFileConfig,
+   callback?: (res: CrawlResCommonV1<FileInfo>) => void
+ ) => Promise<CrawlResCommonArrV1<FileInfo>>
```

#### Example
@@ -393,7 +393,7 @@ const requestConfig = [
  { url: '/xxxx' }
]

- myXCrawl.fetchFile({
+ myXCrawl.crawlFile({
  requestConfig,
  fileConfig: {
    storeDir: path.resolve(__dirname, './upload') // storage folder
@@ -405,7 +405,7 @@ myXCrawl.fetchFile({
### startPolling

- fetchPolling is a method of the [myXCrawl](#Example-1) instance, typically used to perform polling operations, such as getting news every once in a while.
+ crawlPolling is a method of the [myXCrawl](#Example-1) instance, typically used to perform polling operations, such as getting news every once in a while.

#### Type
@@ -423,7 +423,7 @@ function startPolling(
```js
myXCrawl.startPolling({ h: 1, m: 30 }, () => {
  // will be executed every one and a half hours
-   // fetchPage/fetchData/fetchFile
+   // crawlPage/crawlData/crawlFile
})
```
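The `{ h: 1, m: 30 }` config above fires the callback every one and a half hours, and the earlier `{ d: 1 }` fires it once a day. As a minimal sketch of that day/hour/minute arithmetic (the helper below is hypothetical and not an x-crawl export):

```javascript
// Hypothetical helper (not part of x-crawl): combine the d/h/m fields
// of a polling config into a single millisecond interval.
function toIntervalMs({ d = 0, h = 0, m = 0 }) {
  return ((d * 24 + h) * 60 + m) * 60 * 1000
}

console.log(toIntervalMs({ h: 1, m: 30 })) // 5400000 ms, i.e. one and a half hours
console.log(toIntervalMs({ d: 1 })) // 86400000 ms, i.e. one day
```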
@@ -485,32 +485,32 @@ interface XCrawlBaseConfig {
}
```

- ### FetchBaseConfigV1
+ ### CrawlBaseConfigV1

```ts
- interface FetchBaseConfigV1 {
+ interface CrawlBaseConfigV1 {
  requestConfig: RequestConfig | RequestConfig[]
  intervalTime?: IntervalTime
}
```

- ### FetchPageConfig
+ ### CrawlPageConfig

```ts
- type FetchPageConfig = string | RequestBaseConfig
+ type CrawlPageConfig = string | RequestBaseConfig
```

- ### FetchDataConfig
+ ### CrawlDataConfig

```ts
- interface FetchDataConfig extends FetchBaseConfigV1 {
+ interface CrawlDataConfig extends CrawlBaseConfigV1 {
}
```

- ### FetchFileConfig
+ ### CrawlFileConfig

```ts
- interface FetchFileConfig extends FetchBaseConfigV1 {
+ interface CrawlFileConfig extends CrawlBaseConfigV1 {
  fileConfig: {
    storeDir: string // Store folder
    extension?: string // Filename extension
@@ -528,21 +528,21 @@ interface StartPollingConfig {
}
```

- ### FetchResCommonV1
+ ### CrawlResCommonV1

```ts
- interface FetchCommon<T> {
+ interface CrawlCommon<T> {
  id: number
  statusCode: number | undefined
  headers: IncomingHttpHeaders // nodejs: http type
  data: T
}
```

- ### FetchResCommonArrV1
+ ### CrawlResCommonArrV1

```ts
- type FetchResCommonArrV1<T> = FetchResCommonV1<T>[]
+ type CrawlResCommonArrV1<T> = CrawlResCommonV1<T>[]
```
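For reference, a single result with the shape the renamed CrawlResCommonV1 interface describes might look like the plain object below, with CrawlResCommonArrV1 simply being an array of such results (all values here are made up for illustration):

```javascript
// Illustrative values only; just the shape follows the interfaces above.
const res = {
  id: 1,
  statusCode: 200, // number | undefined
  headers: { 'content-type': 'text/html' }, // IncomingHttpHeaders shape from node:http
  data: '<html>...</html>'
}

// CrawlResCommonArrV1<T> is just CrawlResCommonV1<T>[]
const resArr = [res]

console.log(resArr.length, resArr[0].id)
```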
### FileInfo
@@ -556,10 +556,10 @@ interface FileInfo {
}
```

- ### FetchPage
+ ### CrawlPage

```ts
- interface FetchPage {
+ interface CrawlPage {
  httpResponse: HTTPResponse | null // The type of HTTPResponse in the puppeteer library
  data: {
    page: Page // The type of Page in the puppeteer library
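Taken together, the hunks above are a mechanical rename of the fetch-prefixed instance methods to crawl-prefixed ones. A sketch of the mapping (the object below is illustrative only, not something x-crawl exports):

```javascript
// Illustrative rename map summarizing this diff; not part of x-crawl.
const renamedMethods = {
  fetchPage: 'crawlPage',
  fetchData: 'crawlData',
  fetchFile: 'crawlFile'
}

for (const [oldName, newName] of Object.entries(renamedMethods)) {
  console.log(`${oldName} -> ${newName}`)
}
```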