@@ -70,26 +70,57 @@ npm install x-crawl

## Example

- Get the title of https://docs.github.com/zh/get-started as an example:
+ Example of fetching the cover images of the featured videos on the YouTube homepage once a day:

```js
- // Import module ES/CJS
+ // 1. Import module ES/CJS
import xCrawl from 'x-crawl'

- // Create a crawler instance
- const docsXCrawl = xCrawl({
-   baseUrl: 'https://docs.github.com',
-   timeout: 10000,
-   intervalTime: { max: 2000, min: 1000 }
+ // 2. Create a crawler instance
+ const myXCrawl = xCrawl({
+   timeout: 10000, // request timeout in ms
+   intervalTime: { max: 3000, min: 2000 } // control request frequency
})

- // Call fetchHTML API to crawl
- docsXCrawl.fetchHTML('/zh/get-started').then((res) => {
-   const { jsdom } = res.data
-   console.log(jsdom.window.document.querySelector('title')?.textContent)
+ // 3. Set the crawling task
+ // Call the startPolling API to start polling; the callback function is called once a day
+ myXCrawl.startPolling({ d: 1 }, () => {
+   // Call the fetchHTML API to crawl HTML
+   myXCrawl.fetchHTML('https://www.youtube.com/').then((res) => {
+     const { jsdom } = res.data // By default, the JSDOM library is used to parse HTML
+
+     // Get the cover image elements of the featured videos
+     const imgEls = jsdom.window.document.querySelectorAll(
+       '.yt-core-image--fill-parent-width'
+     )
+
+     // Build the request configuration
+     const requestConfig = []
+     imgEls.forEach((item) => {
+       if (item.src) {
+         requestConfig.push({ url: item.src })
+       }
+     })
+
+     // Call the fetchFile API to download the images
+     myXCrawl.fetchFile({ requestConfig, fileConfig: { storeDir: './upload' } })
+   })
})
+
```
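
The step that builds `requestConfig` in the example above can be tried on its own, without a network request. The following is a minimal sketch: `buildRequestConfig` is a hypothetical helper (not part of x-crawl), and plain objects stand in for the `<img>` elements that `querySelectorAll` would return.

```javascript
// Hypothetical helper, not part of x-crawl: turns image-like objects into
// the { url } entries that the example above passes to fetchFile.
function buildRequestConfig(imgEls) {
  return Array.from(imgEls)
    .filter((el) => Boolean(el.src)) // skip images without a src
    .map((el) => ({ url: el.src }))
}

// Plain objects stand in for real <img> elements (assumed sample data)
const sampleImgEls = [
  { src: 'https://example.com/a.jpg' },
  { src: '' },
  { src: 'https://example.com/b.jpg' }
]

const sampleRequestConfig = buildRequestConfig(sampleImgEls)
console.log(sampleRequestConfig)
```

Filtering first keeps images with an empty `src` out of the request list, which is what the `if (item.src)` check in the example does.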

+ Running result:
+
+ <div align="center">
+   <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler.png" />
+ </div>
+
+ <div align="center">
+   <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler-result.png" />
+ </div>
+
+ **Note:** Do not crawl sites indiscriminately. This example only demonstrates how to use XCrawl, and it keeps the interval between requests between 2000ms and 3000ms.
+
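
To illustrate what an interval of 2000ms to 3000ms between requests means in practice, here is a rough sketch. It assumes `intervalTime` describes a random delay between `min` and `max`; `pickInterval`, `sleep`, and `visitPolitely` are hypothetical names for illustration and are not x-crawl's internal implementation.

```javascript
// Hypothetical helper, not x-crawl's internal code: picks a wait time in
// milliseconds between min and max (both inclusive).
function pickInterval({ min, max }) {
  return Math.floor(Math.random() * (max - min + 1)) + min
}

// Resolves after the given number of milliseconds
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Visits each URL in turn and pauses for a random interval between visits.
// `visit` is a stand-in for whatever function performs the request.
async function visitPolitely(urls, intervalTime, visit) {
  for (let i = 0; i < urls.length; i++) {
    await visit(urls[i])
    if (i < urls.length - 1) {
      await sleep(pickInterval(intervalTime))
    }
  }
}
```

A call such as `visitPolitely(urls, { min: 2000, max: 3000 }, visit)` would then wait two to three seconds between consecutive requests.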
## Core concepts

### x-crawl