AioScrapy

AioScrapy is a powerful asynchronous web crawling framework built on Python's asyncio library. Its design is inspired by Scrapy, but it is implemented entirely on asynchronous IO, aiming for higher performance and more flexible configuration options.

Features

Fully Asynchronous: Built on Python's asyncio for efficient concurrent crawling
Multiple Download Handlers: Supports several HTTP clients, including aiohttp, httpx, requests, pyhttpx, curl_cffi, DrissionPage, and playwright
Flexible Middleware System: Easily add custom functionality and processing logic
Powerful Data Processing Pipelines: Support for a range of database storage options
Built-in Signal System: A convenient event handling mechanism
Rich Configuration Options: Highly customizable crawler behavior
Distributed Crawling: Support for distributed crawling using Redis and RabbitMQ
Database Integration: Built-in support for Redis, MySQL, MongoDB, PostgreSQL, and RabbitMQ
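
To give a sense of what asyncio-based concurrent crawling means in practice, here is a minimal sketch that uses only the standard library. It is not AioScrapy's internal code: `fake_fetch`, `crawl`, `run_demo`, and the `limit` parameter are hypothetical names chosen for illustration, and `asyncio.sleep` stands in for network latency.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for an HTTP download; sleeps instead of doing network IO."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def crawl(urls: list[str], limit: int = 5) -> list[str]:
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:
            return await fake_fetch(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


def run_demo() -> tuple[list[str], float]:
    """Crawl ten placeholder URLs and report the pages plus elapsed seconds."""
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    started = time.perf_counter()
    pages = asyncio.run(crawl(urls))
    return pages, time.perf_counter() - started
```

Because the simulated downloads overlap, ten requests of 0.1 s each finish in roughly two batches rather than taking a full second one after another. The semaphore is one common way to cap the number of simultaneous requests.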

Installation

Requirements

Python 3.9+

Install with pip

pip install aio-scrapy

# Or install the latest version directly from GitHub:
# pip install git+https://github.com/ConlinH/aio-scrapy

Getting Started

from aioscrapy import Spider, logger


class MyspiderSpider(Spider):
    name = 'myspider'
    custom_settings = {
        "CLOSE_SPIDER_ON_IDLE": True
    }
    start_urls = ["https://quotes.toscrape.com"]

    @staticmethod
    async def process_request(request, spider):
        """ request middleware """
        pass

    @staticmethod
    async def process_response(request, response, spider):
        """ response middleware """
        return response

    @staticmethod
    async def process_exception(request, exception, spider):
        """ exception middleware """
        pass

    async def parse(self, response):
        for quote in response.css('div.quote'):
            item = {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }
            yield item

    async def process_item(self, item):
        logger.info(item)


if __name__ == '__main__':
    MyspiderSpider.start()
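
The spider above combines middleware hooks (`process_request`, `process_response`, `process_exception`) with a `parse` callback and a `process_item` pipeline step. The order in which hooks of this kind are usually called can be sketched in plain Python. This is an assumption modeled on the general Scrapy-style pattern, not AioScrapy's actual engine code: `DemoSpider`, `download`, `handle_request`, and `run_flow` are hypothetical, requests and responses are plain dicts, and the hooks are simplified to instance methods.

```python
import asyncio


class DemoSpider:
    """Hypothetical spider that reuses the hook names from the example above."""

    def __init__(self):
        self.items = []

    async def process_request(self, request):
        # Request hook: adjust the request before it is downloaded
        request["headers"] = {"User-Agent": "demo"}

    async def process_response(self, request, response):
        # Response hook: inspect or replace the response
        return response

    async def process_exception(self, request, exception):
        # Exception hook: record the failure instead of crashing
        self.items.append({"error": str(exception)})

    async def parse(self, response):
        # Async generator: yields one item per word in the body
        for word in response["body"].split():
            yield {"word": word}

    async def process_item(self, item):
        self.items.append(item)


async def download(request):
    """Hypothetical downloader; fails for URLs that contain 'fail'."""
    if "fail" in request["url"]:
        raise ConnectionError("download failed")
    await asyncio.sleep(0)  # yield control, as real network IO would
    return {"url": request["url"], "body": "hello world"}


async def handle_request(spider, request):
    """Run one request through hooks, download, parse, and item processing."""
    await spider.process_request(request)
    try:
        response = await download(request)
    except Exception as exc:
        await spider.process_exception(request, exc)
        return
    response = await spider.process_response(request, response)
    async for item in spider.parse(response):
        await spider.process_item(item)


def run_flow(url):
    spider = DemoSpider()
    asyncio.run(handle_request(spider, {"url": url}))
    return spider.items
```

Calling `run_flow("https://example.com/ok")` collects the two parsed items, while a URL containing `fail` is routed to the exception hook and no parsing takes place.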

Documentation

Documentation Contents

License

This project is licensed under the MIT License; see the LICENSE file for details.

Contact

QQ: 995018884
WeChat: h995018884
