
Commit 5013ee0 (1 parent: fd07c55)

docs(example): replace image with gif

7 files changed: +40 −48 lines

README.md

Lines changed: 23 additions & 27 deletions

````diff
@@ -135,63 +135,59 @@ npm install x-crawl
 Take the automatic acquisition of some photos of experiences and homes around the world every day as an example:
 
 ```js
-// 1.Import module ES/CJS
+// 1. Import module ES/CJS
 import xCrawl from 'x-crawl'
 
-// 2.Create a crawler instance
-const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 3000, min: 2000 } })
+// 2. Create a crawler instance
+const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 2000, min: 1000 } })
 
-// 3.Set the crawling task
+// 3. Set the crawling task
 /*
   Call the startPolling API to start the polling function,
   and the callback function will be called every other day
 */
 myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
-  // Call crawlPage API to crawl Page
-  const res = await myXCrawl.crawlPage({
+  // Call the crawlPage API to crawl the page
+  const pageResults = await myXCrawl.crawlPage({
     targets: [
-      'https://www.airbnb.cn/s/experiences',
+      'https://www.airbnb.cn/s/*/experiences',
       'https://www.airbnb.cn/s/plus_homes'
     ],
     viewport: { width: 1920, height: 1080 }
   })
 
-  // Store the image URL to targets
-  const targets = []
-  const elSelectorMap = ['._fig15y', '._aov0j6']
-  for (const item of res) {
+  // Obtain the image URLs by traversing the crawled page results
+  const imgUrls = []
+  for (const item of pageResults) {
     const { id } = item
     const { page } = item.data
+    const elSelector = id === 1 ? '.i9cqrtb' : '.c4mnd7m'
 
-    // Wait for the page to load
-    await new Promise((r) => setTimeout(r, 300))
+    // Wait for the page element to appear
+    await page.waitForSelector(elSelector)
 
-    // Gets the URL of the page image
-    const urls = await page.$$eval(`${elSelectorMap[id - 1]} img`, (imgEls) => {
-      return imgEls.map((item) => item.src)
-    })
-    targets.push(...urls)
+    // Get the URLs of the page images
+    const urls = await page.$$eval(`${elSelector} picture img`, (imgEls) =>
+      imgEls.map((item) => item.src)
+    )
+    imgUrls.push(...urls.slice(0, 8))
 
-    // Close page
+    // Close the page
     page.close()
   }
 
-  // Call the crawlFile API to crawl pictures
-  myXCrawl.crawlFile({ targets, storeDirs: './upload' })
+  // Call the crawlFile API to crawl the pictures
+  await myXCrawl.crawlFile({ targets: imgUrls, storeDirs: './upload' })
 })
```
 
 Running result:
 
 <div align="center">
-  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler.png" />
-</div>
-
-<div align="center">
-  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler-result.png" />
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/example.gif" />
 </div>
 
-**Note:** Do not crawl at will, you can check the **robots.txt** protocol before crawling. This is just to demonstrate how to use x-crawl.
+**Note:** Please do not crawl sites indiscriminately; check the **robots.txt** protocol before crawling. The website's class names may change; this is only a demonstration of how to use x-crawl.
 
 ## Core Concepts
 
````
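Beyond renaming `targets` to `imgUrls`, the hunk above changes behavior: each crawled page now contributes at most eight image URLs (`urls.slice(0, 8)`) to the download targets. A minimal sketch of that capping logic in plain Node — the `pages` array is hypothetical stand-in data for the per-page `page.$$eval(...)` results, not part of x-crawl:

```javascript
// Mirror of the new collection logic in the diff: gather image URLs across
// pages, keeping at most 8 per page before handing them to crawlFile.
const imgUrls = []

// Hypothetical stand-in for the per-page results of page.$$eval(...):
// the first "page" yields 12 URLs, the second yields 5.
const pages = [
  Array.from({ length: 12 }, (_, i) => `https://example.com/a/${i}.jpg`),
  Array.from({ length: 5 }, (_, i) => `https://example.com/b/${i}.jpg`)
]

for (const urls of pages) {
  imgUrls.push(...urls.slice(0, 8)) // cap at 8 per page, as in the commit
}

console.log(imgUrls.length) // 8 + 5 = 13
```

Capping per page keeps the example's download volume (and the resulting gif) small even when a listing page exposes dozens of images.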

Binary files (not shown):

- assets/cn/crawler-result.png: deleted (−389 KB)
- assets/cn/crawler.png: deleted (−21.1 KB)
- assets/en/crawler-result.png: deleted (−389 KB)
- assets/en/crawler.png: deleted (−21.1 KB)
- assets/example.gif: added (+867 KB)

docs/cn.md

Lines changed: 17 additions & 21 deletions

````diff
@@ -117,7 +117,7 @@ x-crawl 是采用 MIT 许可的开源项目,使用完全免费。如果你在
 - [API Other](#API-Other)
   - [AnyObject](#AnyObject)
 - [常见问题](#常见问题)
-  - [crawlPage 跟 puppeteer 的关系](#crawlPage-跟-puppeteer-的关系)
+  - [crawlPage API 跟 puppeteer 的关系](#crawlPage-API-跟-puppeteer-的关系)
 - [更多](#更多)
   - [社区](#社区)
   - [Issues](#Issues)
@@ -140,56 +140,52 @@ npm install x-crawl
 import xCrawl from 'x-crawl'
 
 // 2.创建一个爬虫实例
-const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 3000, min: 2000 } })
+const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 2000, min: 1000 } })
 
 // 3.设置爬取任务
 // 调用 startPolling API 开始轮询功能,每隔一天会调用回调函数
 myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
   // 调用 crawlPage API 来爬取页面
-  const res = await myXCrawl.crawlPage({
+  const pageResults = await myXCrawl.crawlPage({
     targets: [
-      'https://www.airbnb.cn/s/experiences',
+      'https://www.airbnb.cn/s/*/experiences',
       'https://www.airbnb.cn/s/plus_homes'
     ],
     viewport: { width: 1920, height: 1080 }
   })
 
-  // 存放图片 URL 到 targets
-  const targets = []
-  const elSelectorMap = ['._fig15y', '._aov0j6']
-  for (const item of res) {
+  // 通过遍历爬取页面结果获取图片 URL
+  const imgUrls = []
+  for (const item of pageResults) {
     const { id } = item
     const { page } = item.data
+    const elSelector = id === 1 ? '.i9cqrtb' : '.c4mnd7m'
 
-    // 等待页面加载完成
-    await new Promise((r) => setTimeout(r, 300))
+    // 等待页面元素出现
+    await page.waitForSelector(elSelector)
 
     // 获取页面图片的 URL
-    const urls = await page.$$eval(`${elSelectorMap[id - 1]} img`, (imgEls) => {
-      return imgEls.map((item) => item.src)
-    })
-    targets.push(...urls)
+    const urls = await page.$$eval(`${elSelector} picture img`, (imgEls) =>
+      imgEls.map((item) => item.src)
+    )
+    imgUrls.push(...urls.slice(0, 8))
 
     // 关闭页面
     page.close()
   }
 
   // 调用 crawlFile API 爬取图片
-  await myXCrawl.crawlFile({ targets, storeDirs: './upload' })
+  await myXCrawl.crawlFile({ targets: imgUrls, storeDirs: './upload' })
 })
```
 
 运行效果:
 
 <div align="center">
-  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/cn/crawler.png" />
-</div>
-
-<div align="center">
-  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/cn/crawler-result.png" />
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/example.gif" />
 </div>
 
-**注意:** 请勿随意爬取,爬取前可查看 **robots.txt** 协议。这里只是为了演示如何使用 x-crawl 。
+**注意:** 请勿随意爬取,爬取前可查看 **robots.txt** 协议。网站的类名可能会有变更,这里只是为了演示如何使用 x-crawl 。
 
 ## 核心概念
 
````
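Both file diffs above also replace a fixed 300 ms sleep with `page.waitForSelector(elSelector)`, which resolves as soon as the target element exists instead of always paying (and sometimes under-paying) a fixed delay. The general pattern behind such an element wait — poll a predicate until it passes or a deadline expires — can be sketched in plain Node; this `waitFor` helper is illustrative only, not Puppeteer's actual implementation:

```javascript
// Generic "wait until a condition holds" helper: poll a predicate until it
// returns true or the timeout elapses. page.waitForSelector() follows the
// same pattern, with "the selector matches an element" as the predicate.
async function waitFor(predicate, { timeout = 2000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout
  while (Date.now() < deadline) {
    if (await predicate()) return true
    await new Promise((resolve) => setTimeout(resolve, interval))
  }
  throw new Error('waitFor: timed out')
}

// Usage sketch: resolves shortly after the condition flips, instead of
// always sleeping a fixed 300 ms the way the removed code did.
let loaded = false
setTimeout(() => { loaded = true }, 120)
waitFor(() => loaded).then(() => console.log('element ready'))
```

Condition-based waits like this are both faster (no fixed penalty once the element appears) and more robust (slow pages get up to the full timeout rather than a hard 300 ms).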
