Automated workflows for end-to-end financial data analysis using Python.
This repository provides a collection of automated processes designed to streamline financial data analysis.
It is structured into modular phases, each contained in its own folder.
The goal is to create a standardized, repeatable, and efficient pipeline that transforms raw financial data into actionable insights — covering everything from data cleaning to visualization.
Financial-Data-Analysis-Automation-with-Python/
│
├── 1) Data Cleaning/ → Collecting (from Yahoo Finance), cleaning, and preparing raw data
├── 2) Exploratory Data Analysis (EDA)/ → Automated data exploration and profiling
├── 3) Feature Engineering/ → Feature creation and transformation
├── 4) Data Visualization/ → Graphs, plots, and visual reports
└── README.md → Main repository documentation
- Removes or imputes missing values.
- Converts data types and formats (e.g., dates, currencies).
- Detects and removes duplicates and outliers.
- Produces a clean dataset ready for EDA and modeling.
Output: `cleaned_data.csv` (or equivalent)
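A minimal sketch of what this phase does, assuming a raw CSV with `Date` and `Close` columns (the file paths and column names are illustrative, not the repository's exact code):

```python
import pandas as pd

# Load the raw dataset (path and column names are assumed for illustration).
df = pd.read_csv("data_raw/raw_data.csv", parse_dates=["Date"])

# Drop exact duplicates and forward-fill missing prices.
df = df.drop_duplicates()
df["Close"] = df["Close"].ffill()

# Remove outliers beyond 3 standard deviations of the closing price.
mean, std = df["Close"].mean(), df["Close"].std()
df = df[(df["Close"] - mean).abs() <= 3 * std]

df.to_csv("cleaned_data.csv", index=False)
```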
- Automatically generates summary statistics (mean, median, variance, etc.).
- Creates visual profiles — histograms, boxplots, correlations, and heatmaps.
- Identifies patterns, anomalies, and data relationships.
Output: Automated HTML or image reports summarizing dataset characteristics.
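As an illustration, an automated EDA pass over the cleaned file might look like this (the file name and columns are assumptions carried over from the cleaning sketch above):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("cleaned_data.csv", parse_dates=["Date"])

# Summary statistics (mean, median, variance, etc.) for numeric columns.
print(df.describe())

# Histogram of closing prices.
df["Close"].plot.hist(bins=50, title="Close price distribution")
plt.savefig("eda_close_hist.png")
plt.close()

# Correlation heatmap across numeric columns.
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm")
plt.title("Correlation heatmap")
plt.savefig("eda_correlations.png")
plt.close()
```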
- Builds new derived features (e.g., moving averages, ratios, lags).
- Performs feature selection and transformations (normalization, encoding).
- Creates datasets ready for machine learning or statistical modeling.
Output: `features_dataset.csv` or a model-ready DataFrame.
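A hedged sketch of typical derivations, again assuming `Date` and `Close` columns:

```python
import pandas as pd

df = pd.read_csv("cleaned_data.csv", parse_dates=["Date"]).sort_values("Date")

# Derived features: daily returns, a moving average, and a lagged price.
df["return"] = df["Close"].pct_change()
df["ma_20"] = df["Close"].rolling(window=20).mean()
df["close_lag_1"] = df["Close"].shift(1)

# Min-max normalization of the closing price (a simple illustration;
# scikit-learn's MinMaxScaler would work equally well).
df["close_norm"] = (df["Close"] - df["Close"].min()) / (
    df["Close"].max() - df["Close"].min()
)

df.dropna().to_csv("features_dataset.csv", index=False)
```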
- Produces static plots and dashboards to visualize trends and results.
- Focuses on financial insights: time series, volatility, returns, risk indicators.
- Enables data-driven storytelling through visuals.
Output: `.png`, `.html`, or `.ipynb` visual summaries.
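For instance, a plotting script along these lines would produce the `.png` summaries (column names follow the feature sketch above and are not guaranteed to match the repository's scripts):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("features_dataset.csv", parse_dates=["Date"])

# Time series of the closing price with its 20-day moving average.
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df["Date"], df["Close"], label="Close")
ax.plot(df["Date"], df["ma_20"], label="20-day MA")
ax.set_title("Price and moving average")
ax.legend()
fig.savefig("price_trend.png")

# Daily returns as a quick view of volatility over time.
fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(df["Date"], df["return"])
ax.set_title("Daily returns")
fig.savefig("daily_returns.png")
```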
- The main dataset is automatically downloaded into the `src` folder using Yahoo Finance (see the download sketch after this list).
- You can also place any custom dataset you want to analyze inside the `data_raw` folder; the cleaning scripts will handle it automatically.
- Run the scripts in each folder sequentially:
  1. `1_Data_Cleaning` → clean and standardize data.
  2. `2_Exploratory_Data_Analysis_(EDA)` → explore and summarize.
  3. `3_Feature_Engineering` → create new features.
  4. `4_Data_Visualization` → visualize results.
- Review generated files and outputs.
- Customize scripts for your dataset or add new modules.
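The download step can be reproduced with `yfinance` roughly as follows (the ticker and date range are placeholders, not the repository's defaults):

```python
import yfinance as yf

# Download daily OHLCV data for a ticker and save it as raw input.
data = yf.download("AAPL", start="2020-01-01", end="2024-12-31")
data.to_csv("data_raw/AAPL.csv")
```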
- Python 3.x
- Main libraries: `pandas`, `numpy`, `matplotlib`, `seaborn`, `scikit-learn`, `yfinance`
- (Optional) Jupyter Notebook or Spyder for interactive execution.
- Dependencies can be listed in `requirements.txt` (a sample is sketched below).
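A minimal `requirements.txt` covering the libraries above could look like this:

```
pandas
numpy
matplotlib
seaborn
scikit-learn
yfinance
```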
| Benefit | Description |
|---|---|
| Repeatability | Consistent results across multiple datasets |
| Efficiency | Saves time on routine tasks |
| Scalability | Easy to extend with new modules |
| Transparency | Each step is clearly separated and documented |
You can easily extend the project by:
- Adding new cleaning or preprocessing rules.
- Including more EDA visualizations or statistical tests.
- Developing advanced feature engineering functions (e.g., volatility metrics, rolling stats).
- Integrating visualization dashboards using Plotly Dash, Streamlit, or Power BI.
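For example, a rolling-volatility helper could slot into the feature engineering phase; the function name, default window, and annualization factor (252 trading days) are illustrative:

```python
import pandas as pd

def add_rolling_volatility(df: pd.DataFrame, window: int = 21) -> pd.DataFrame:
    """Add annualized rolling volatility of daily returns (hypothetical helper)."""
    returns = df["Close"].pct_change()
    # Standard deviation of daily returns, scaled by sqrt(252) to annualize.
    df[f"vol_{window}d"] = returns.rolling(window).std() * 252 ** 0.5
    return df
```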
GitHub: @SasySpanish