Commit b815fe3 (1 parent: dba86c7)

Docs: How to open a browser

2 files changed: +40 −6 lines

README.md

Lines changed: 20 additions & 3 deletions
````diff
@@ -40,6 +40,7 @@ The crawlPage API has built-in [puppeteer](https://github.com/puppeteer/puppeteer)
 - [Page Instance](#Page-Instance)
 - [life Cycle](#life-Cycle)
 - [onCrawlItemComplete](#onCrawlItemComplete)
+- [Open Browser](#Open-Browser)
 - [Crawl Interface](#Crawl-Interface)
 - [life Cycle](#life-Cycle-1)
 - [onCrawlItemComplete](#onCrawlItemComplete-1)
````
````diff
@@ -163,7 +164,7 @@ myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
     await new Promise((r) => setTimeout(r, 300))

     // Gets the URL of the page image
-    const urls = await page!.$$eval(
+    const urls = await page.$$eval(
       `${elSelectorMap[id - 1]} img`,
       (imgEls) => {
         return imgEls.map((item) => item.src)
````
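The hunk above drops the TypeScript non-null assertion (`page!`), which is invalid syntax in a plain JavaScript example. The callback handed to `page.$$eval` is serialized and runs inside the browser page, so it must be self-contained. As a minimal sketch, the same extraction logic can be exercised without puppeteer by using plain objects in place of `<img>` elements:

```javascript
// The pure extraction callback from the example: map <img>-like elements
// to their src URLs. Identical logic to the arrow function passed to $$eval.
const collectImageSrcs = (imgEls) => imgEls.map((item) => item.src)

// Stand-ins for DOM <img> elements, so the logic runs without a browser.
const fakeImgEls = [
  { src: 'https://example.com/a.png' },
  { src: 'https://example.com/b.png' }
]

console.log(collectImageSrcs(fakeImgEls))
// → [ 'https://example.com/a.png', 'https://example.com/b.png' ]
```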
````diff
@@ -282,13 +283,13 @@ myXCrawl.crawlPage('https://www.example.com').then((res) => {

 #### Browser Instance

-When you call the crawlPage API to crawl pages in the same crawler instance, the browser instance used is the same, because the crawlPage API shares the browser instance within the same crawler instance. It is a headless browser with no UI shell; what it does is bring **all modern web platform features** provided by the browser rendering engine to the code. For specific usage, please refer to [Browser](https://pptr.dev/api/puppeteer.browser).
+When you call the crawlPage API to crawl pages in the same crawler instance, the browser instance used is the same, because the crawlPage API shares the browser instance within the same crawler instance. For specific usage, please refer to [Browser](https://pptr.dev/api/puppeteer.browser).

 **Note:** The browser will keep running, so the process will not terminate. To stop it, execute browser.close(). Do not call browser.close() if you still need to use [crawlPage](#crawlPage) or [page](#page) later, because the crawlPage API shares the browser instance within the same crawler instance.

 #### Page Instance

-When you call the crawlPage API to crawl pages in the same crawler instance, a new page instance is generated from the browser instance each time. It can be used for interactive operations. For specific usage, please refer to [Page](https://pptr.dev/api/puppeteer.page).
+When you call the crawlPage API to crawl pages in the same crawler instance, a new page instance is generated from the browser instance each time. For specific usage, please refer to [Page](https://pptr.dev/api/puppeteer.page).

 The browser instance retains a reference to the page instance. If a page instance is no longer needed, it must be closed explicitly, otherwise it will cause a memory leak.
````
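The note in this hunk says the shared browser keeps the Node.js process alive until `browser.close()` is called, and that nothing may be crawled afterwards. A minimal runnable sketch of that ordering, using a stub in place of the real puppeteer browser (the stub and helper names are illustrative, not x-crawl API):

```javascript
// Stub standing in for the shared puppeteer Browser, so the shutdown
// ordering can be shown without launching Chromium.
function makeStubBrowser() {
  return {
    closed: false,
    async close() { this.closed = true }
  }
}

// Do all crawlPage/page work first, then close the shared browser.
// After close() the browser must not be used again.
async function crawlThenShutdown(browser, doCrawls) {
  await doCrawls()      // all crawling happens before shutdown
  await browser.close() // lets the process terminate
  return browser.closed
}

crawlThenShutdown(makeStubBrowser(), async () => {
  // ...crawlPage calls would go here...
}).then((closed) => console.log('browser closed:', closed))
// → browser closed: true
```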

````diff
@@ -323,6 +324,22 @@ In the onCrawlItemComplete function, you can get the results of each crawled goal

 **Note:** If you need to crawl many pages at one time, you need to use this life cycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instances, the program will crash due to too many open pages.

+#### Open Browser
+
+Disable running the browser in headless mode.
+
+```js
+import xCrawl from 'x-crawl'
+
+const myXCrawl = xCrawl({
+  maxRetry: 3,
+  // Cancel running the browser in headless mode
+  crawlPage: { launchBrowser: { headless: false } }
+})
+
+myXCrawl.crawlPage('https://www.example.com').then((res) => {})
+```
+
 ### Crawl Interface

 Crawl interface data through [crawlData()](#crawlData).
````
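The **Note** above says each page instance must be closed inside the life cycle callback when crawling many targets at once. A sketch of that pattern follows; the per-item result shape (`item.data.page`) is an assumption based on these docs, so verify it against your installed x-crawl version:

```javascript
// Hypothetical helper: safely pull the puppeteer page out of one crawl
// result item; returns null when the item carries no page.
function pageOf(item) {
  return (item && item.data && item.data.page) || null
}

// Usage sketch (requires x-crawl installed; not run here):
// import xCrawl from 'x-crawl'
// const myXCrawl = xCrawl({ maxRetry: 3 })
// myXCrawl.crawlPage({
//   targets: ['https://www.example.com/1', 'https://www.example.com/2'],
//   onCrawlItemComplete(item) {
//     const page = pageOf(item)
//     if (page) page.close() // release each tab so many targets don't crash the process
//   }
// })

console.log(pageOf({ data: { page: 'fake-page' } })) // → fake-page
console.log(pageOf({}))                              // → null
```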

docs/cn.md

Lines changed: 20 additions & 3 deletions
````diff
@@ -40,6 +40,7 @@ The crawlPage API has built-in [puppeteer](https://github.com/puppeteer/puppeteer),
 - [page instance](#page-实例)
 - [life cycle](#生命周期)
 - [onCrawlItemComplete](#onCrawlItemComplete)
+- [Open Browser](#打开浏览器)
 - [Crawl Interface](#爬取接口)
 - [life cycle](#生命周期-1)
 - [onCrawlItemComplete](#onCrawlItemComplete-1)
````
````diff
@@ -161,7 +162,7 @@ myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
     await new Promise((r) => setTimeout(r, 300))

     // Gets the URL of the page image
-    const urls = await page!.$$eval(
+    const urls = await page.$$eval(
       `${elSelectorMap[id - 1]} img`,
       (imgEls) => {
         return imgEls.map((item) => item.src)
````
````diff
@@ -281,13 +282,13 @@ myXCrawl.crawlPage('https://www.example.com').then((res) => {

 #### browser instance

-When you call the crawlPage API to crawl pages in the same crawler instance, the browser instance used is the same, because the crawlPage API shares the browser instance within the same crawler instance. It is a headless browser with no UI shell; what it does is bring **all modern web platform features** provided by the browser rendering engine to the code. For specific usage, see [Browser](https://pptr.dev/api/puppeteer.browser)
+When you call the crawlPage API to crawl pages in the same crawler instance, the browser instance used is the same, because the crawlPage API shares the browser instance within the same crawler instance. For specific usage, see [Browser](https://pptr.dev/api/puppeteer.browser)

 **Note:** The browser will keep running, so the process will not terminate. To stop it, execute browser.close(). Do not call browser.close() if you still need to use [crawlPage](#crawlPage) or [page](#page) later, because the crawlPage API shares the browser instance within the same crawler instance.

 #### page instance

-When you call the crawlPage API to crawl pages in the same crawler instance, a new page instance is generated from the browser instance each time. It can be used for interactive operations. For specific usage, see [Page](https://pptr.dev/api/puppeteer.page)
+When you call the crawlPage API to crawl pages in the same crawler instance, a new page instance is generated from the browser instance each time. For specific usage, see [Page](https://pptr.dev/api/puppeteer.page)

 The browser instance internally retains a reference to the page instance. If a page instance is no longer needed, close it yourself, otherwise it will cause a memory leak.
````

````diff
@@ -322,6 +323,22 @@ Life cycle functions of the crawlPage API:

 **Note:** If you need to crawl many pages at one time, you need to use this life cycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instances, the program will crash due to too many open pages.

+#### Open Browser
+
+Disable running the browser in headless mode.
+
+```js
+import xCrawl from 'x-crawl'
+
+const myXCrawl = xCrawl({
+  maxRetry: 3,
+  // Cancel running the browser in headless mode
+  crawlPage: { launchBrowser: { headless: false } }
+})
+
+myXCrawl.crawlPage('https://www.example.com').then((res) => {})
+```
+
 ### Crawl Interface

 Crawl interface data through [crawlData()](#crawlData).
````
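The `headless: false` option added in this commit lives under `crawlPage.launchBrowser`, whose options are forwarded to puppeteer's launch call. A small sketch showing the configuration shape as plain data; the `buildCrawlConfig` helper is hypothetical, not x-crawl API:

```javascript
// Hypothetical helper building the xCrawl() options object from the docs:
// headed === true disables headless mode, matching the example above.
function buildCrawlConfig(headed) {
  return {
    maxRetry: 3,
    crawlPage: { launchBrowser: { headless: !headed } }
  }
}

console.log(buildCrawlConfig(true))
// → { maxRetry: 3, crawlPage: { launchBrowser: { headless: false } } }
```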
