This project aims to analyze the influence of global stock market indices on the Egyptian EGX 30 index using PySpark for data processing and advanced econometric models such as GARCH and Granger causality tests for analysis.
- Historical data for global indices (e.g., S&P 500, FTSE 100) and EGX 30 was sourced from platforms such as Yahoo Finance and Investing.com.
- Exchange rates were integrated to convert EGX 30 data (EGP) into USD for consistency.
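The currency conversion can be sketched as below. This is a minimal illustration using pandas, not the project's own script: the column names (`close_egp`, `egp_per_usd`) and the sample values are hypothetical, and it assumes the exchange rate is quoted as EGP per 1 USD.

```python
import pandas as pd

# Hypothetical EGX 30 closes quoted in EGP.
egx = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "close_egp": [25000.0, 25500.0, 26010.0],
})

# Hypothetical exchange rates (EGP per 1 USD); 2024-01-03 is missing on purpose.
fx = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-04"]),
    "egp_per_usd": [31.25, 30.60],
})

# Left-join so every index date is kept, then carry the last known rate forward.
merged = egx.merge(fx, on="date", how="left").sort_values("date")
merged["egp_per_usd"] = merged["egp_per_usd"].ffill()

# Dividing an EGP price by the EGP-per-USD rate yields the USD price.
merged["close_usd"] = merged["close_egp"] / merged["egp_per_usd"]
print(merged)
```

If the rate were quoted the other way round (USD per 1 EGP), the division would become a multiplication.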
- Merging and aligning data by date across all indices.
- Handling missing values using forward-fill or dropping missing rows.
- Calculating daily returns for each index using the formula:
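The three preprocessing steps above could look roughly like the following. The return formula itself is not reproduced in this README, so the sketch assumes the simple daily return `(P_t - P_{t-1}) / P_{t-1}`; log returns are another common choice. It uses pandas for brevity rather than the project's PySpark pipeline, and the prices are made up.

```python
import pandas as pd

# Made-up closing prices; EGX_30 has no quote on 2024-01-03.
sp500 = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
    "SP500": [100.0, 110.0, 99.0, 108.9],
})
egx30 = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-04", "2024-01-05"]),
    "EGX_30": [200.0, 210.0, 189.0],
})

# Outer-join on date so no trading day from either market is lost.
prices = (
    sp500.merge(egx30, on="date", how="outer")
    .sort_values("date")
    .set_index("date")
)

# Forward-fill carries the last known close across a gap;
# dropna removes any leading rows that still have no value.
prices = prices.ffill().dropna()

# Assumed formula: simple daily return (P_t - P_{t-1}) / P_{t-1}.
returns = prices.pct_change().dropna()
print(returns)
```

Note that forward-filling a price produces a zero return on the filled day, which is one reason dropping missing rows is sometimes preferred.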
- Summary statistics were computed for daily returns to provide insights into the distribution and variability of each index.
- Visualization tools such as Matplotlib were used to plot time series and return distributions.
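As a rough illustration of the summary statistics step, the sketch below runs on synthetic returns (randomly generated, not project data) and adds skewness and kurtosis to the default pandas summary. The choice of these two extra statistics is an assumption; the README does not list which statistics were computed.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns standing in for the real series.
rng = np.random.default_rng(0)
returns = pd.DataFrame({
    "SP500": rng.normal(0.0005, 0.01, 250),
    "EGX_30": rng.normal(0.0003, 0.015, 250),
})

# describe() gives count, mean, std, min, quartiles and max per column;
# transposing puts one index per row.
summary = returns.describe().T

# Skewness and excess kurtosis describe asymmetry and tail weight.
summary["skew"] = returns.skew()
summary["kurtosis"] = returns.kurt()
print(summary)
```

Calling `returns.hist(bins=50)` on the same DataFrame would draw one Matplotlib histogram per index.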
- A correlation matrix was computed to measure the linear relationship between the indices.
- Heatmaps were used for visualizing correlations.
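The correlation step can be sketched as follows, again on synthetic returns. The shared shock term is only there to make two of the made-up series visibly correlated; it says nothing about the real indices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
common = rng.normal(0, 0.01, 300)  # shared shock driving two of the series

# Synthetic returns: SP500 and FTSE100 share a component, EGX_30 does not.
returns = pd.DataFrame({
    "SP500": common + rng.normal(0, 0.002, 300),
    "FTSE100": common + rng.normal(0, 0.004, 300),
    "EGX_30": rng.normal(0, 0.015, 300),
})

# Pearson correlation between every pair of columns.
corr = returns.corr()
print(corr.round(3))
```

The resulting matrix can be passed to Matplotlib's `plt.imshow(corr, vmin=-1, vmax=1)` to render it as a heatmap.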
- Granger causality tests were conducted to determine whether past values of one index can predict another index.
- P-values for different lags were plotted to interpret significant causal relationships.
- We trained a Gradient Boosting Regressor for each stock index using its respective data and evaluated its performance on the test split of the same dataset.
- Each trained model was then evaluated on EGX_30 data, and its performance there was compared with the results obtained on its own test dataset.
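The train-then-transfer evaluation described above might be sketched like this. Several things here are assumptions: the README does not state the features, the target, or the metric, so the example uses the three previous daily returns as features, the next return as the target, and mean squared error as the score. It also relies on scikit-learn, which is not in the library list above, and on synthetic data. `make_lagged` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error


def make_lagged(series, n_lags=3):
    """Build (X, y) where each row of X holds the n_lags values preceding y."""
    X = np.column_stack(
        [series[i : len(series) - n_lags + i] for i in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y


# Synthetic daily returns standing in for the real series.
rng = np.random.default_rng(7)
sp500 = rng.normal(0, 0.01, 400)
egx30 = rng.normal(0, 0.015, 400)

# Train on the first 80% of one index, in time order (no shuffling).
X, y = make_lagged(sp500)
split = int(len(X) * 0.8)
model = GradientBoostingRegressor(n_estimators=50, random_state=0)
model.fit(X[:split], y[:split])

# Score on the index's own held-out tail, then on EGX_30.
own_mse = mean_squared_error(y[split:], model.predict(X[split:]))
X_egx, y_egx = make_lagged(egx30)
egx_mse = mean_squared_error(y_egx, model.predict(X_egx))
print(f"own test MSE: {own_mse:.6f}, EGX_30 MSE: {egx_mse:.6f}")
```

Comparing the two scores gives a rough sense of how well patterns learned from one index carry over to EGX_30.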
- Python 3.9+
- Apache Spark with PySpark
- JDK 8 or later (for Spark compatibility)
- Required Python libraries:
  - pandas
  - numpy
  - matplotlib
  - statsmodels
  - arch
- Clone this repository.
git clone https://github.com/Ahmedh12/Global_Trends_Spillover_Effect_Analysis.git
- Set up a Conda environment.
conda create -n my_pyspark_env python=3.9
conda activate my_pyspark_env
- Install required libraries.
pip install pandas numpy matplotlib statsmodels arch pyspark
- Ensure Spark is configured correctly with Hadoop and JDK.
- Run download_data.py to download the indices data into the data/raw directory.
- Run the PySpark script to preprocess data and calculate daily returns.
- Use Jupyter Notebook or the provided scripts to conduct analyses (correlation, Granger causality).
- Visualize and interpret the results.
- Descriptive Statistics: Summary tables and histograms for returns.
- Correlation Analysis: Heatmaps showing relationships between indices.
- Granger Causality: Lag-specific causal relationships.
- GARCH Model: Insights into volatility dynamics.
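For reference, the volatility model can be written out as below. The README does not state which GARCH specification was fitted, so this shows the widely used GARCH(1,1) form as an assumption, with $r_t$ the daily return and $\sigma_t^2$ its conditional variance.

```latex
\begin{align}
r_t &= \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad z_t \sim \mathcal{N}(0, 1) \\
\sigma_t^2 &= \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2
\end{align}
```

Here $\omega > 0$ and $\alpha, \beta \ge 0$, with $\alpha + \beta < 1$ required for a finite unconditional variance. A value of $\alpha + \beta$ close to 1 indicates that volatility shocks die out slowly.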
- Investing.com. (n.d.). Historical data of global indices.
- Yahoo Finance. (n.d.). Historical data for S&P 500. Retrieved from https://finance.yahoo.com/quote/%5EGSPC/history/
- PySpark documentation. Apache Spark: Unified Analytics Engine for Big Data. Retrieved from https://spark.apache.org/docs/latest/api/python/
This project is licensed under the MIT License.