The BankMarketCapETL project is designed to create a robust ETL (Extract, Transform, Load) pipeline focusing on the market capitalization of the world's largest banks. This pipeline extracts data from a specified Wikipedia page, transforms the market capitalization values according to exchange rates, and loads the data into both a SQLite database and a CSV file for further analysis or visualization. This project aims to provide financial analysts and enthusiasts with up-to-date information on bank valuations in various currencies, facilitating global financial comparisons and analyses.
- Data Source URL: List of largest banks on Wikipedia
- Exchange Rate CSV Path: Path to a CSV file containing the latest exchange rates.
- SQLite Database: Banks.dbfor storing transformed data.
- Output CSV File: Largest_banks_data.csvfor easy access and sharing of the transformed data.
- Objective: Extract the list of the world's largest banks and their market capitalization in USD from the specified Wikipedia page.
- Method: Use BeautifulSoupto parse the HTML content of the page and extract relevant data into a pandas DataFrame.
- Objective: Convert the market capitalization values from USD to GBP, EUR, and INR using the exchange rates provided in the exchange_rate.csvfile.
- Method: Read the exchange rates from the CSV file into a dictionary and apply these rates to the MC_USD_Billioncolumn in the DataFrame. The transformed data will include new columns:MC_GBP_Billion,MC_EUR_Billion, andMC_INR_Billion, with values rounded to the nearest billion.
- Objective: Save the transformed DataFrame to a CSV file for easy access and distribution.
- Method: Use the to_csvmethod of pandas DataFrame to write the data toLargest_banks_data.csv, ensuring the data is accessible outside the Python environment.
- Objective: Load the transformed data into a SQLite database for persistent storage and query capabilities.
- Method: Utilize sqlite3and pandas'to_sqlfunction to insert the DataFrame into theLargest_bankstable in theBanks.dbdatabase.
- Objective: Demonstrate the ability to run queries against the loaded data in the SQLite database.
- Method: Implement a run_queriesfunction to execute SQL queries, printing both the query and its results. Sample queries include selecting the entire table, calculating the average market capitalization, and listing the top 5 banks by market capitalization.
The BankMarketCapETL project streamlines the process of gathering, converting, and storing critical financial data regarding the world's largest banks. By automating the extraction of up-to-date market capitalization data and accommodating currency conversions, this pipeline serves as a valuable tool for financial analysis and reporting.
