gatuno1/scrape-master

ScrapeMaster

ScrapeMaster is a Streamlit-based web scraping application that simplifies extracting data from web pages. Users specify URLs and data fields interactively, then extract and manipulate the resulting data.

Features

  • Easy-to-use web interface.
  • Custom field specification for data extraction.
  • Pagination support.
  • Dynamic data processing with Python and Streamlit.
  • Direct download capabilities for extracted data in various formats.
  • Attended mode.
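
As an illustration of the download feature listed above, here is a minimal sketch of how scraped rows could be serialized to CSV and JSON with the standard library. It is not taken from ScrapeMaster's code: the function names `rows_to_csv` and `rows_to_json`, and the assumption that results are a list of dictionaries keyed by field name, are hypothetical.

```python
import csv
import io
import json


def rows_to_csv(rows, fields):
    """Serialize a list of dicts to CSV text, one column per requested field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Fields missing from a row become empty cells; extra keys are dropped.
        writer.writerow({field: row.get(field, "") for field in fields})
    return buffer.getvalue()


def rows_to_json(rows, fields):
    """Serialize the same rows to a JSON array; missing fields become null."""
    return json.dumps([{field: row.get(field) for field in fields} for row in rows], indent=2)
```

Either string could then be offered to the user as a file download; how the application actually does this is not specified here.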

Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.6 or higher
  • pip, for managing Python packages
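
If you are unsure which interpreter is active, a short check like the following can confirm that it meets the minimum version stated above. This is only a sketch; the helper name `python_is_supported` is made up for illustration.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the prerequisites


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("ScrapeMaster needs Python %d.%d or higher" % MIN_PYTHON)
    print("Python version OK")
```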

Installation

Follow these steps to get your development environment running:

# Clone the repository
git clone https://github.com/reda-marzouk608/scrape-master
cd scrape-master

# It's recommended to create a virtual environment
python -m venv venv
# Activate the virtual environment
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate

# Install the required packages
pip install -r requirements.txt

Streamlit opt-out configuration (optional)

To use the repository's Streamlit config as your global config file, copy it into your home directory:

  • Linux/macOS:

    # Create the directory if it doesn't exist
    mkdir -p ~/.streamlit/
    # Copy the config file to the global location
    cp ./.streamlit/config.toml ~/.streamlit/config.toml
  • Windows (PowerShell):

    # Create the directory if it doesn't exist
    $targetDir = "$env:USERPROFILE\.streamlit\"
    if (-not (Test-Path -Path $targetDir)) {
       New-Item -ItemType Directory -Path $targetDir
    }
    # Copy the config file to the global location
    Copy-Item -Path ".\.streamlit\config.toml" -Destination $targetDir
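
The two platform-specific snippets above do the same thing, so the step can also be sketched once in Python and run on any platform. The function name `install_global_config` and its parameters are hypothetical and not part of the repository. Like the shell commands, it overwrites any existing global config file.

```python
import shutil
from pathlib import Path


def install_global_config(project_dir=".", home=None):
    """Copy <project_dir>/.streamlit/config.toml to <home>/.streamlit/config.toml."""
    home = Path(home) if home is not None else Path.home()
    source = Path(project_dir) / ".streamlit" / "config.toml"
    target_dir = home / ".streamlit"
    # Create the directory if it doesn't exist (same effect as `mkdir -p`)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Copy the file, replacing any existing global config
    return Path(shutil.copy(source, target_dir / "config.toml"))
```

Calling `install_global_config()` from the project root copies the file into the current user's home directory and returns the destination path.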

Launching the Application

To run ScrapeMaster, navigate to the project directory and run the following command:

streamlit run streamlit_app.py

Usage

After launching the application, open your web browser to the indicated address (typically http://localhost:8501). Use the sidebar to input the URL and fields you wish to scrape, then click the "Scrape" button to see results.
