    Repositories list

    • Software stack with latest Scrapy and updated deps
      Dockerfile
      BSD 3-Clause "New" or "Revised" License
      206322 Updated Jun 20, 2025
    • Python
      BSD 3-Clause "New" or "Revised" License
      151522 Updated Jun 13, 2025
    • More flexible and featured Frontera scheduler for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      53721 Updated Jun 6, 2025
    • Crawl Frontier HCF backend
      Python
      BSD 3-Clause "New" or "Revised" License
      5821 Updated Jun 6, 2025
    • frontera
      A scalable frontier for web crawlers
      Python
      BSD 3-Clause "New" or "Revised" License
      2171.3k7817 Updated Jun 6, 2025
    • web-poet
      Web scraping Page Objects core library
      Python
      BSD 3-Clause "New" or "Revised" License
      151011513 Updated Jun 6, 2025
    • Scrapy entrypoint for Scrapinghub job runner
      Python
      BSD 3-Clause "New" or "Revised" License
      162671 Updated Jun 5, 2025
    • Python parser for human-readable dates
      Python
      BSD 3-Clause "New" or "Revised" License
      4802.7k29853 Updated May 29, 2025
    • Page Object pattern for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      28123114 Updated May 26, 2025
    • andi
      Library for annotation-based dependency injection
      Python
      BSD 3-Clause "New" or "Revised" License
      62241 Updated May 14, 2025
    • shub
      Scrapinghub Command Line Client
      Python
      BSD 3-Clause "New" or "Revised" License
      811334714 Updated Apr 21, 2025
    • spidermon
      Scrapy Extension for monitoring spiders execution.
      Python
      BSD 3-Clause "New" or "Revised" License
      101542423 Updated Apr 11, 2025
    • extruct
      Extract embedded metadata from HTML markup
      Python
      BSD 3-Clause "New" or "Revised" License
      1189193914 Updated Mar 24, 2025
    • A client interface for Scrapinghub's API
      Python
      BSD 3-Clause "New" or "Revised" License
      61208254 Updated Feb 21, 2025
    • Extract price amount and currency symbol from a raw text string
      Python
      BSD 3-Clause "New" or "Revised" License
      50330179 Updated Feb 13, 2025
    • Python Social Auth - Application - Django
      Python
      BSD 3-Clause "New" or "Revised" License
      385201 Updated Nov 18, 2024
    • Formasaurus tells you the type of an HTML form and its fields using machine learning
      HTML
      48710 Updated Nov 7, 2024
    • Parse numbers written in natural language
      Python
      BSD 3-Clause "New" or "Revised" License
      25117136 Updated Oct 23, 2024
    • A Python binding for CRFsuite
      Python
      MIT License
      222774453 Updated Oct 1, 2024
    • streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.
      Python
      Apache License 2.0
      218201 Updated Sep 20, 2024
    • splash
      Lightweight, scriptable browser as a service with an HTTP API
      Python
      BSD 3-Clause "New" or "Revised" License
      5164.2k37426 Updated Aug 2, 2024
    • A Postgres-backed ContentsManager implementation for IPython
      Python
      Apache License 2.0
      86201 Updated Jul 18, 2024
    • shublang
      Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
      Python
      BSD 3-Clause "New" or "Revised" License
      816236 Updated Jul 9, 2024
    • An opinionated fork of the Drone CI system
      Go
      Other
      430005 Updated Jul 7, 2024
    • varanus
      A command line spider monitoring tool
      Python
      7822 Updated Jul 6, 2024
    • scrapyrt
      HTTP API for Scrapy spiders
      Python
      BSD 3-Clause "New" or "Revised" License
      161860246 Updated Jun 28, 2024
    • portia
      Visual scraping for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      1.4k9.4k11119 Updated Jun 26, 2024
    • scikit-learn inspired API for CRFsuite
      Python
      214200 Updated Jun 18, 2024
    • Python
      MIT License
      2403 Updated Jun 17, 2024
    • autologin
      A project to attempt to automatically login to a website given a single seed
      Python
      Apache License 2.0
      431102 Updated Jun 17, 2024