This project is a Python-based web scraper that collects detailed football match event data from sites such as WhoScored. It automates the extraction of key match statistics and stores the results in a PostgreSQL database hosted on Supabase for further analysis and visualization.
Features:
- Automated Scraping: Scrapes football match events along with xG, possession, and player and team statistics.
- Data Handling: Uses BeautifulSoup to parse HTML and Selenium to load and navigate JavaScript-rendered pages.
- Supabase Integration: Stores data automatically in a PostgreSQL database hosted on Supabase, where it is readily available for further analysis.
- Structured Data Storage: Uses Pandas to structure match events and Pydantic models to validate them before they are stored.
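The parsing side of the Data Handling step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the match page embeds its event data as a JSON object assigned to a JavaScript variable inside a `<script>` tag. The helper `extract_embedded_json` is hypothetical, and the variable name `matchCentreData` used in the tests is an assumption to check against the live page source.

```python
import json
import re
from typing import Any, Optional

from bs4 import BeautifulSoup


def extract_embedded_json(html: str, var_name: str) -> Optional[Any]:
    """Return the first JSON object assigned to `var_name` in a <script> tag.

    Handles both `var_name = {...}` and `var_name: {...}` forms and returns
    None if no script on the page contains the variable.
    """
    soup = BeautifulSoup(html, "html.parser")
    # var_name, optional spaces, ':' or '=', then a lookahead for '{'
    pattern = re.compile(re.escape(var_name) + r"\s*[:=]\s*(?=\{)")
    decoder = json.JSONDecoder()
    for script in soup.find_all("script"):
        if script.string is None:  # empty or multi-node script tag
            continue
        text = str(script.string)
        match = pattern.search(text)
        if match is None:
            continue
        # raw_decode parses exactly one JSON value starting at match.end()
        # and ignores whatever follows it (e.g. a trailing ';' or ',').
        obj, _end = decoder.raw_decode(text, match.end())
        return obj
    return None
```

With Selenium, the rendered HTML is available as `driver.page_source` once `driver.get(match_url)` has loaded the page, so it can be passed straight to this function. `json.JSONDecoder.raw_decode` is used instead of a regex capture group because it stops at the end of the first complete JSON value, which handles nested braces correctly; it will still fail if the embedded object is JavaScript rather than strict JSON (for example, unquoted keys).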
Tech stack:
- Python: Core language for the scraping scripts.
- BeautifulSoup: Parses HTML content from the target sites.
- Selenium: Automates a browser to interact with dynamic pages and render JavaScript content.
- Pandas: Structures and manipulates the scraped data.
- Pydantic: Validates records against data models before storage.
- Supabase: Cloud-hosted PostgreSQL database for data storage and retrieval.
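Validation with Pydantic followed by tabulation with Pandas might look like the sketch below. The `MatchEvent` schema and its field names are hypothetical placeholders rather than the project's actual models, and the code assumes Pydantic v2 (`model_dump`, `model_fields`).

```python
from typing import Any, Optional

import pandas as pd
from pydantic import BaseModel, Field, ValidationError


class MatchEvent(BaseModel):
    """Hypothetical schema for one match event (field names are illustrative)."""

    match_id: int
    event_id: int
    minute: int = Field(ge=0)  # reject negative minutes
    team_id: int
    player_id: Optional[int] = None
    event_type: str


def events_to_frame(raw_events: list[dict[str, Any]]) -> tuple[pd.DataFrame, list[str]]:
    """Validate raw event dicts; return a DataFrame of valid rows plus error messages."""
    rows: list[dict[str, Any]] = []
    errors: list[str] = []
    for raw in raw_events:
        try:
            # Pydantic coerces compatible values (e.g. "12" -> 12) or raises
            rows.append(MatchEvent(**raw).model_dump())
        except ValidationError as exc:
            errors.append(str(exc))
    # Keep the model's field order as the column order, even if rows is empty
    return pd.DataFrame(rows, columns=list(MatchEvent.model_fields)), errors
```

Collecting validation errors instead of raising on the first bad record lets a single malformed event be logged and skipped without discarding the rest of the match.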
Getting started:
- Clone the repository:
  ```bash
  git clone <repository-url>
  ```
- Install the required Python libraries:
  ```bash
  pip install -r requirements.txt
  ```
- Make sure you have a valid `.env` file containing your Supabase credentials:
  ```
  project_url=<your-supabase-url>
  project_api=<your-supabase-api-key>
  supabase_password=<your-supabase-password>
  ```
- Run the script to start scraping match events:
  ```bash
  python scraper.py
  ```
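Once `.env` is in place, a script could read the credentials and push rows to Supabase roughly as below. This is a hedged sketch: the key names come from the `.env` example in the setup steps, while `parse_env`, `require_settings`, `load_supabase_settings`, `store_events`, the `match_events` table name, and the batch size of 500 are all hypothetical. Many projects use the `python-dotenv` package (`load_dotenv`) instead of a hand-written parser; a standard-library version is shown here to avoid extra dependencies. `store_events` accepts any client that exposes supabase-py's `table(...).insert(...).execute()` chain.

```python
from pathlib import Path
from typing import Any, Iterator

REQUIRED_KEYS = ("project_url", "project_api", "supabase_password")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. project_api="abc"
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def require_settings(env: dict[str, str]) -> dict[str, str]:
    """Fail fast if any expected credential is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing keys in .env: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}


def load_supabase_settings(path: str = ".env") -> dict[str, str]:
    return require_settings(parse_env(Path(path).read_text(encoding="utf-8")))


def chunked(rows: list[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def store_events(client: Any, rows: list[dict[str, Any]],
                 table: str = "match_events", batch_size: int = 500) -> int:
    """Insert rows in batches through a supabase-py style client; return rows sent."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    sent = 0
    for batch in chunked(rows, batch_size):
        client.table(table).insert(batch).execute()
        sent += len(batch)
    return sent
```

With supabase-py installed, the client would come from `create_client(settings["project_url"], settings["project_api"])`. When rows come from a DataFrame, `df.to_dict(orient="records")` produces the list of dicts; replacing NaN with None first (for example `df.astype(object).where(df.notna(), None)`) avoids sending NaN, which is not valid JSON. Batching keeps each request small for long matches with thousands of events.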
Contributions are welcome: feel free to open an issue or submit a pull request with improvements.