This project demonstrates the use of Selenium and other Python libraries for web scraping and data analysis. The primary focus is on fetching financial data from various online sources like CBOE, WSJ, CNBC, and others, using tools like Selenium for browser automation and BeautifulSoup for HTML parsing.
The project dependencies are listed in requirements.txt. Key libraries include:
- Selenium for browser automation
- BeautifulSoup4 for parsing HTML content
- Pandas for data manipulation
- Requests for HTTP requests
- yfinance and alpha_vantage for financial data APIs
- Clone this repository or download the files.
- Ensure you have Python installed (version 3.x recommended).
- Install the required packages using pip:

  ```bash
  pip install -r requirements.txt
  ```

- For Selenium, you will need a compatible web driver (e.g., ChromeDriver for Google Chrome). Download it from the ChromeDriver downloads page and make sure it is on your system PATH, or specify its path in the script.
The main code lives in the Jupyter notebook `selenium example.ipynb`. To run it:
- Open the notebook in Jupyter Notebook or JupyterLab.
- Ensure all dependencies listed in requirements.txt are installed.
- Run the cells sequentially to see the data scraping and processing in action. Some cells interact with web pages using Selenium to fetch dynamic content.
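Once Selenium has rendered a page, its HTML can be handed to BeautifulSoup for parsing. A minimal sketch of that step; the HTML snippet and the CSS classes are made-up stand-ins for `driver.page_source`, not markup from any of the scraped sites:

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source after Selenium renders dynamic content.
html = """
<div class="quote">
  <span class="symbol">.SPX</span>
  <span class="last">5,000.00</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
symbol = soup.select_one(".quote .symbol").get_text(strip=True)
last = soup.select_one(".quote .last").get_text(strip=True)
print(symbol, last)  # → .SPX 5,000.00
```

If a site changes its markup, it is these selectors (or the notebook's XPaths) that need updating.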
This project scrapes data from:
- CBOE for VIX data
- WSJ for market data
- CNBC for index quotes
- Other financial data sources via APIs like yfinance and alpha_vantage
- Some sources require API keys (e.g., the Alpha Vantage key is hardcoded in the notebook for demonstration only). Replace it with your own key, and avoid committing real keys to version control.
- Web scraping scripts are sensitive to changes in website structure. If errors occur, the XPaths or selectors might need updating.
- Ensure compliance with the terms of service of the websites being scraped.
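Rather than hardcoding the Alpha Vantage key, you can read it from an environment variable; `ALPHAVANTAGE_API_KEY` below is an illustrative variable name, not one this repo defines.

```python
import os

# Read the key from the environment instead of hardcoding it in the notebook.
# "ALPHAVANTAGE_API_KEY" is an illustrative name; set it in your shell first,
# e.g.  export ALPHAVANTAGE_API_KEY=yourkey
api_key = os.environ.get("ALPHAVANTAGE_API_KEY", "demo")
print(f"Using Alpha Vantage key: {api_key[:4]}***")
```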
This project is for educational purposes. Ensure you have the right to scrape and use data from the mentioned sources.