Commit 62aa358 (1 parent: 48ab006)

docs: parameter name update

3 files changed: +29 −33 lines

README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -331,7 +331,7 @@ import xCrawl from 'x-crawl'
 const myXCrawl = xCrawl({
   maxRetry: 3,
   // Cancel running the browser in headless mode
-  crawlPage: { launchBrowser: { headless: false } }
+  crawlPage: { puppeteerLaunch: { headless: false } }
 })
 
 myXCrawl.crawlPage('https://www.example.com').then((res) => {})
@@ -1298,7 +1298,7 @@ export interface XCrawlConfig extends CrawlCommonConfig {
   baseUrl?: string
   intervalTime?: IntervalTime
   crawlPage?: {
-    launchBrowser?: PuppeteerLaunchOptions // puppeteer
+    puppeteerLaunch?: PuppeteerLaunchOptions // puppeteer
   }
 }
 ```
````
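The renamed option keeps the same shape: the value under `crawlPage.puppeteerLaunch` is still typed as `PuppeteerLaunchOptions` and, per the inline `// puppeteer` comment, is handed to Puppeteer when the crawler launches its browser. A minimal before/after sketch of the rename; the `args` entry is an illustrative Puppeteer launch flag, not something this commit adds:

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl({
  maxRetry: 3,
  crawlPage: {
    // Previously: launchBrowser: { headless: false }
    // The same PuppeteerLaunchOptions object now lives under the new key
    puppeteerLaunch: { headless: false, args: ['--window-size=1920,1080'] }
  }
})

myXCrawl.crawlPage('https://www.example.com').then((res) => {})
```

The new name makes it clearer that these options belong to `puppeteer.launch()` rather than to a crawler-specific launcher.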

docs/cn.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -329,7 +329,7 @@ import xCrawl from 'x-crawl'
 const myXCrawl = xCrawl({
   maxRetry: 3,
   // Cancel running the browser in headless mode
-  crawlPage: { launchBrowser: { headless: false } }
+  crawlPage: { puppeteerLaunch: { headless: false } }
 })
 
 myXCrawl.crawlPage('https://www.example.com').then((res) => {})
@@ -1292,7 +1292,7 @@ export interface XCrawlConfig extends CrawlCommonConfig {
   baseUrl?: string
   intervalTime?: IntervalTime
   crawlPage?: {
-    launchBrowser?: PuppeteerLaunchOptions // puppeteer
+    puppeteerLaunch?: PuppeteerLaunchOptions // puppeteer
   }
 }
 ```
````

publish/README.md

Lines changed: 25 additions & 29 deletions
````diff
@@ -135,63 +135,59 @@ npm install x-crawl
 Take the automatic acquisition of some photos of experiences and homes around the world every day as an example:
 
 ```js
-// 1.Import module ES/CJS
+// 1. Import module ES/CJS
 import xCrawl from 'x-crawl'
 
-// 2.Create a crawler instance
-const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 3000, min: 2000 } })
+// 2. Create a crawler instance
+const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 2000, min: 1000 } })
 
-// 3.Set the crawling task
+// 3. Set the crawling task
 /*
   Call the startPolling API to start the polling function,
   and the callback function will be called every other day
 */
 myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
-  // Call crawlPage API to crawl Page
-  const res = await myXCrawl.crawlPage({
+  // Call the crawlPage API to crawl the page
+  const pageResults = await myXCrawl.crawlPage({
     targets: [
-      'https://www.airbnb.cn/s/experiences',
+      'https://www.airbnb.cn/s/*/experiences',
       'https://www.airbnb.cn/s/plus_homes'
     ],
     viewport: { width: 1920, height: 1080 }
  })
 
-  // Store the image URL to targets
-  const targets = []
-  const elSelectorMap = ['._fig15y', '._aov0j6']
-  for (const item of res) {
+  // Obtain the image URLs by traversing the crawled page results
+  const imgUrls = []
+  for (const item of pageResults) {
     const { id } = item
     const { page } = item.data
+    const elSelector = id === 1 ? '.i9cqrtb' : '.c4mnd7m'
 
-    // Wait for the page to load
-    await new Promise((r) => setTimeout(r, 300))
+    // Wait for the page element to appear
+    await page.waitForSelector(elSelector)
 
-    // Gets the URL of the page image
-    const urls = await page.$$eval(`${elSelectorMap[id - 1]} img`, (imgEls) => {
-      return imgEls.map((item) => item.src)
-    })
-    targets.push(...urls)
+    // Get the URLs of the page images
+    const urls = await page.$$eval(`${elSelector} picture img`, (imgEls) =>
+      imgEls.map((item) => item.src)
+    )
+    imgUrls.push(...urls.slice(0, 8))
 
-    // Close page
+    // Close the page
     page.close()
   }
 
   // Call the crawlFile API to crawl pictures
-  myXCrawl.crawlFile({ targets, storeDirs: './upload' })
+  await myXCrawl.crawlFile({ targets: imgUrls, storeDirs: './upload' })
 })
 ```
 
 running result:
 
 <div align="center">
-  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler.png" />
-</div>
-
-<div align="center">
-  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler-result.png" />
+  <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/example.gif" />
 </div>
 
-**Note:** Do not crawl at will, you can check the **robots.txt** protocol before crawling. This is just to demonstrate how to use x-crawl.
+**Note:** Please do not crawl at will; you can check the **robots.txt** protocol before crawling. The site's class names may change; this is just to demonstrate how to use x-crawl.
 
 ## Core Concepts
````
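Two mechanical changes in this hunk deserve a closer look. The fixed 300 ms `setTimeout` is replaced by `page.waitForSelector(...)`, which resolves as soon as the target element is attached instead of racing a slow page load, and the `$$eval` extraction now targets `picture img` under a per-page selector and keeps only the first eight URLs. A minimal sketch of the new wait-and-extract pattern, assuming a Puppeteer `page` like the one exposed on `item.data` and running inside the async polling callback:

```js
// Per-result selector, as chosen in the example above
const elSelector = '.i9cqrtb'

// Resolves once the element is attached to the DOM;
// rejects after Puppeteer's default 30-second timeout
await page.waitForSelector(elSelector)

// Runs in the browser context: collect the src of every <img>
// inside <picture> elements under the matched container
const urls = await page.$$eval(`${elSelector} picture img`, (imgEls) =>
  imgEls.map((el) => el.src)
)

// Bound the download batch to the first 8 images
const imgUrls = urls.slice(0, 8)
```

The remaining hunks in this file repeat the `launchBrowser` → `puppeteerLaunch` rename already shown for README.md and docs/cn.md: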
````diff
@@ -335,7 +331,7 @@ import xCrawl from 'x-crawl'
 const myXCrawl = xCrawl({
   maxRetry: 3,
   // Cancel running the browser in headless mode
-  crawlPage: { launchBrowser: { headless: false } }
+  crawlPage: { puppeteerLaunch: { headless: false } }
 })
 
 myXCrawl.crawlPage('https://www.example.com').then((res) => {})
@@ -1302,7 +1298,7 @@ export interface XCrawlConfig extends CrawlCommonConfig {
   baseUrl?: string
   intervalTime?: IntervalTime
   crawlPage?: {
-    launchBrowser?: PuppeteerLaunchOptions // puppeteer
+    puppeteerLaunch?: PuppeteerLaunchOptions // puppeteer
   }
 }
 ```
````
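One last note on the example's scheduling: `startPolling({ d: 1 }, cb)` invokes the callback on a fixed interval (one day here) and passes it the current run count plus a `stopPolling` handle. The interval unit and the callback signature are taken from the diff above; the stop condition below is illustrative:

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl({ maxRetry: 3 })

// Run the crawl once a day and stop after the third run
myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
  console.log(`polling run #${count}`)

  if (count >= 3) {
    stopPolling()
  }
})
```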
