[Suspected BUG] pretend.py in version 0.1.2 has a problem that causes scraping to fail #29

@DeSireFire

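I ran into a problem while deploying to a new server. By comparing versions, I traced the cause to this file:
GerapyPyppeteer/gerapy_pyppeteer/pretend.py
With version 0.0.13 everything works. The relevant code there is:
SET_WEBDRIVER = '''() => {Object.defineProperty(navigator, 'webdriver', {get: () => undefined})}'''
With version 0.1.2 it fails.
The SET_WEBDRIVER variable on line 73 of that file is the problem. When requesting a site protected by a certain anti-bot product (某数), the browser is detected and the server returns 400.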

Test code:

import json
import os
import asyncio
import time

from pyppeteer import launch, connection
from pyppeteer import chromium_downloader
from gerapy_pyppeteer.pretend import SCRIPTS as PRETEND_SCRIPTS
from pyppeteer.network_manager import Response



async def main():
    # Launch a visible (non-headless) browser.
    browser = await launch({'headless': False, 'timeout': 10000, 'args': ['--no-sandbox']})
    page = await browser.newPage()
    # Register every pretend script so it runs before the page's own scripts.
    for script in PRETEND_SCRIPTS:
        await page.evaluateOnNewDocument(script)

    print(len(await browser.pages()))
    # Placeholder URL; remember to change it to the real target.
    await page.goto('http://www.某个网址.com.cn/old_house/old_house.html')

    await page.waitForNavigation()

    await page.waitFor(10 * 1000)

    print(await page.evaluate("document.cookie"))
    print('Finished waiting for URL')

    # await page.waitFor(10 * 1000)
    print(await page.content())

    await browser.close()


asyncio.get_event_loop().run_until_complete(main())

This returns a blank page.
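
As a possible workaround until this is resolved, here is a minimal sketch that swaps the 0.0.13 script back in before the scripts are registered. The helper name `swap_webdriver_script` is hypothetical, and it assumes the 0.1.2 `SET_WEBDRIVER` entry in `SCRIPTS` can be recognised by the substring `webdriver`; I have not verified this against the target site.

```python
# SET_WEBDRIVER exactly as quoted above from version 0.0.13.
SET_WEBDRIVER_0_0_13 = '''() => {Object.defineProperty(navigator, 'webdriver', {get: () => undefined})}'''


def swap_webdriver_script(scripts, replacement=SET_WEBDRIVER_0_0_13):
    """Return a new list with every script mentioning `webdriver` replaced.

    Assumption: the problematic entry contains the substring 'webdriver'.
    Adjust the match if the 0.1.2 script is written differently.
    """
    return [replacement if 'webdriver' in script else script
            for script in scripts]
```

In the test code above, the loop would then iterate over `swap_webdriver_script(PRETEND_SCRIPTS)` instead of `PRETEND_SCRIPTS`, leaving all other pretend scripts untouched.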
