
Commit 7933cf7

docs: update
1 parent ac35805 commit 7933cf7

1 file changed

publish/README.md

Lines changed: 42 additions & 11 deletions
@@ -70,26 +70,57 @@ npm install x-crawl
 
 ## Example
 
-Get the title of https://docs.github.com/zh/get-started as an example:
+Example of fetching the featured video cover images from the YouTube homepage every day:
 
 ```js
-// Import module ES/CJS
+// 1. Import the module (ES/CJS)
 import xCrawl from 'x-crawl'
 
-// Create a crawler instance
-const docsXCrawl = xCrawl({
-  baseUrl: 'https://docs.github.com',
-  timeout: 10000,
-  intervalTime: { max: 2000, min: 1000 }
+// 2. Create a crawler instance
+const myXCrawl = xCrawl({
+  timeout: 10000, // request timeout in milliseconds
+  intervalTime: { max: 3000, min: 2000 } // control the request frequency
 })
 
-// Call fetchHTML API to crawl
-docsXCrawl.fetchHTML('/zh/get-started').then((res) => {
-  const { jsdom } = res.data
-  console.log(jsdom.window.document.querySelector('title')?.textContent)
+// 3. Set the crawling task
+// Call the startPolling API to start polling; the callback runs once a day
+myXCrawl.startPolling({ d: 1 }, () => {
+  // Call the fetchHTML API to crawl the HTML
+  myXCrawl.fetchHTML('https://www.youtube.com/').then((res) => {
+    const { jsdom } = res.data // by default, the JSDOM library is used to parse the HTML
+
+    // Get the cover image elements of the promoted videos
+    const imgEls = jsdom.window.document.querySelectorAll(
+      '.yt-core-image--fill-parent-width'
+    )
+
+    // Set the request configuration
+    const requestConfig = []
+    imgEls.forEach((item) => {
+      if (item.src) {
+        requestConfig.push({ url: item.src })
+      }
+    })
+
+    // Call the fetchFile API to crawl the pictures
+    myXCrawl.fetchFile({ requestConfig, fileConfig: { storeDir: './upload' } })
+  })
 })
+
 ```
 
+Running result:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler.png" />
+</div>
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler-result.png" />
+</div>
+
+**Note:** Do not crawl sites indiscriminately; this example only demonstrates how to use x-crawl, and it keeps the request frequency between 2000 ms and 3000 ms.
+
 ## Core concepts
 
 ### x-crawl
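
The new example paces every request through the instance-level `intervalTime` option and schedules the task with `startPolling({ d: 1 })`, a one-day interval. Below is a minimal sketch of alternative pacing and scheduling settings; it assumes that `intervalTime` also accepts a single fixed number of milliseconds and that `startPolling` also understands hour (`h`) and minute (`m`) fields, neither of which appears in this diff.

```js
import xCrawl from 'x-crawl'

// Assumption: intervalTime may also be a fixed number of milliseconds
// instead of the { max, min } range used in the README example.
const pacedCrawl = xCrawl({
  timeout: 10000,
  intervalTime: 2500 // wait roughly 2500 ms between requests
})

// Assumption: startPolling also accepts hour (h) and minute (m) fields
// alongside the day (d) field shown above.
pacedCrawl.startPolling({ h: 12 }, () => {
  // run the crawling task twice a day
})
```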

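For a one-off run, `fetchHTML` can also be called directly without `startPolling`, much as the example removed by this commit did. A minimal sketch that reuses only the calls and the selector shown in the diff above:

```js
import xCrawl from 'x-crawl'

// Same instance options as the new README example
const myXCrawl = xCrawl({
  timeout: 10000,
  intervalTime: { max: 3000, min: 2000 }
})

// Crawl the page once and report how many cover images were found
myXCrawl.fetchHTML('https://www.youtube.com/').then((res) => {
  const { jsdom } = res.data // JSDOM-parsed document
  const imgEls = jsdom.window.document.querySelectorAll(
    '.yt-core-image--fill-parent-width'
  )
  console.log(`Found ${imgEls.length} cover image elements`)
})
```

Because `fetchFile` is never called here, nothing is written to disk; a dry run like this is a cheap way to check the selector before scheduling the polling task.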