This project integrates stock market data analysis, sentiment analysis of tweets, and stock price forecasting using a traditional statistical model (ARIMA) and a machine learning model, GRU (Gated Recurrent Unit). The system sources data from MySQL (historical stock prices), MongoDB (stored tweets related to stock tickers), and Twitter (real-time sentiment analysis). The overall goal is to predict future stock prices while analyzing public sentiment toward individual stocks based on Twitter data.
To run the project, you'll need the following installed:
- Python 3.x
- MySQL (for stock price data)
- MongoDB (for tweet data storage)
- Kafka (for real-time streaming)
- Jupyter Notebook (for data analysis)
- Required Python libraries (listed in requirements.txt)
- Data Integration:
  - Stock market data fetched from MySQL for historical prices.
  - Tweet data fetched and stored in MongoDB for analysis.
- Exploratory Data Analysis (EDA):
  - Missing Dates Identification: Weekends and market closures with no stock data entries were identified using pandas date functions. This shows the periods of inactivity in the stock market.
  - Volatility Analysis: A key analysis examined stock price volatility during the COVID-19 pandemic. A 5-day rolling window was used to calculate volatility and visualize price fluctuations, especially during March 2020.
  - Insights: Significant price volatility was observed in March 2020, driven by the pandemic's market impact.
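
The two EDA steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the function names and the sample prices are hypothetical, and volatility is assumed here to be the rolling standard deviation of daily percentage returns, which the project may define differently.

```python
import pandas as pd


def missing_calendar_dates(dates: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return calendar days between the first and last date that have no entry."""
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    return full_range.difference(dates)


def rolling_volatility(close: pd.Series, window: int = 5) -> pd.Series:
    """Rolling standard deviation of daily percentage returns (assumed definition)."""
    returns = close.pct_change()
    return returns.rolling(window).std()


# Hypothetical sample: one trading week plus the following Monday.
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 103.0, 108.0],
    index=pd.to_datetime(
        ["2020-03-02", "2020-03-03", "2020-03-04",
         "2020-03-05", "2020-03-06", "2020-03-09"]
    ),
)

gaps = missing_calendar_dates(prices.index)
weekend_gaps = gaps[gaps.dayofweek >= 5]  # Saturday=5, Sunday=6
weekday_gaps = gaps[gaps.dayofweek < 5]   # candidate market closures or holidays

volatility = rolling_volatility(prices, window=5)
```

Splitting the gaps by day of week separates ordinary weekends from weekday closures, which are the ones worth checking against a market holiday calendar.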





- Sentiment Analysis:
  - VADER and HuggingFace are used for sentiment analysis of tweets.
- Stock Price Forecasting:
  - ARIMA and GRU models are applied for forecasting future stock prices based on historical data.
- Performance Benchmarking:
  - YCSB is used to benchmark and compare the performance of MySQL and MongoDB in terms of throughput and latency.
- Real-Time Streaming with Kafka:
  - Stream tweet data to analyze real-time sentiment and correlate it with stock price trends.
- Interactive Dashboard:
  - Dash is used for creating an interactive dashboard where users can view stock price forecasts and sentiment trends.
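
For the sentiment analysis feature listed above, a minimal sketch of labeling a tweet with VADER might look like this. The helper names are hypothetical, and the ±0.05 cutoff on the compound score is the threshold suggested in the VADER documentation; this project may use different cutoffs.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_tweet(text: str) -> str:
    """Score one tweet with VADER and return its label."""
    # Imported lazily so label_from_compound works without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    compound = analyzer.polarity_scores(text)["compound"]
    return label_from_compound(compound)
```

Keeping the thresholding separate from the scoring call makes it easy to reuse the same labels for scores produced by a different model.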



Follow these steps to get the project up and running:
- Clone the Repository:
git clone https://github.com/tashi-2004/Stock-Price-Forecasting-using-Financial-and-Twitter-Data.git
cd Stock-Price-Forecasting-using-Financial-and-Twitter-Data
Install the required Python libraries using pip:
pip install -r requirements.txt
- Create a database stockdb in MySQL.
- Import the historical stock prices into the stockdb database.

- Import tweet data into MongoDB's stockdb database.

- Configure Kafka for real-time streaming.
- Ensure that your Kafka server, producer, and consumer are running properly.


The ARIMA and GRU models are used for stock price forecasting. Both produce predictions at 1-day, 3-day, and 7-day horizons.
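
One common way to prepare data for a recurrent model such as a GRU at several horizons is to slice the price series into fixed-length input windows, each paired with the price a given number of days ahead. The sketch below assumes that approach; `make_windows`, the lookback length, and the sample series are hypothetical and not taken from the project code.

```python
import numpy as np


def make_windows(series, lookback: int, horizon: int):
    """Build (X, y) pairs: X holds `lookback` past prices, y the price `horizon` days ahead.

    X has shape (samples, lookback, 1), the (samples, timesteps, features)
    layout that Keras recurrent layers expect.
    """
    values = np.asarray(series, dtype=float)
    n_samples = len(values) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this lookback and horizon")
    X = np.stack([values[i:i + lookback] for i in range(n_samples)])
    first_target = lookback + horizon - 1
    y = values[first_target:first_target + n_samples]
    return X[..., np.newaxis], y


# Hypothetical closing prices; one dataset per forecast horizon.
closes = np.arange(10.0)
datasets = {h: make_windows(closes, lookback=3, horizon=h) for h in (1, 3, 7)}
```

Each horizon gets its own training set, so a separate model (or output head) can be fitted for the 1-day, 3-day, and 7-day targets.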
An interactive Dash dashboard allows users to visualize stock price trends, sentiment trends, and model performance.
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- ARIMA performs better for stable stock price predictions.
- GRU performs better for volatile stock price predictions.
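
The two error metrics listed above, RMSE and MAE, can be computed as in the following sketch. The sample values are made up for illustration and are not project results.

```python
import math


def rmse(actual, predicted) -> float:
    """Root Mean Squared Error: penalizes large errors more heavily."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted) -> float:
    """Mean Absolute Error: average size of the error in price units."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and forecast closing prices.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 101.0, 106.0, 106.0]

print(f"RMSE: {rmse(actual, predicted):.4f}")
print(f"MAE:  {mae(actual, predicted):.4f}")
```

RMSE is never smaller than MAE on the same data; a large gap between the two suggests a few large misses rather than uniformly small errors.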
- Use the provided ycsb_benchmark.sh file to run benchmarking on MySQL and MongoDB.
./ycsb.sh



Kafka streams tweet data in real time so that sentiment can be analyzed and correlated with stock price movements.
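
As a rough sketch of the correlation step, tweets that have already been scored can be averaged per day and compared with daily returns. Everything here is an assumption for illustration: the column names `timestamp` and `compound`, the choice of same-day Pearson correlation, and the sample data. In the streaming setup, the tweets frame would be filled from a Kafka consumer rather than built by hand.

```python
import pandas as pd


def sentiment_return_correlation(tweets: pd.DataFrame, prices: pd.Series) -> float:
    """Pearson correlation between daily mean sentiment and same-day returns."""
    # Average the sentiment of all tweets that fall on the same calendar day.
    day = tweets["timestamp"].dt.normalize()
    daily_sentiment = tweets.groupby(day)["compound"].mean()
    daily_returns = prices.pct_change()
    joined = pd.concat(
        [daily_sentiment.rename("sentiment"), daily_returns.rename("return")],
        axis=1,
        join="inner",
    ).dropna()
    return joined["sentiment"].corr(joined["return"])


# Hypothetical scored tweets and closing prices for one week.
tweets = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-03-02 09:30", "2020-03-03 10:00", "2020-03-03 15:45",
             "2020-03-04 11:20", "2020-03-05 13:05", "2020-03-06 14:10"]
        ),
        "compound": [0.0, 0.4, 0.6, -0.5, 0.5, -0.5],
    }
)
prices = pd.Series(
    [100.0, 110.0, 99.0, 108.9, 98.01],
    index=pd.to_datetime(
        ["2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05", "2020-03-06"]
    ),
)

correlation = sentiment_return_correlation(tweets, prices)
```

The first day drops out because it has no prior close to compute a return from. A correlation on real data says nothing about direction of causality, so it is best read alongside the forecasts rather than as a signal on its own.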
We welcome contributions to this repository. Please feel free to fork the repository and submit issues or pull requests. Any improvements or optimizations are greatly appreciated.
For queries or contributions, please contact:
Tashfeen Abbasi
Email: abbasitashfeen7@gmail.com