Welcome to the Predictive Analytics for Cocoa Market Trends project! This repository contains the code and resources for forecasting cocoa prices with Long Short-Term Memory (LSTM) neural networks, with the aim of giving stakeholders in the cocoa industry useful insight into price movements.
This project aims to develop a machine-learning model to forecast cocoa prices using historical data. The primary goal is to provide stakeholders in the cocoa industry with accurate and timely price predictions to aid decision-making and strategic planning. The project involves data collection, model development, validation, testing, and creating a user-friendly interface for stakeholders.
We use a publicly available dataset containing historical cocoa prices. The dataset includes features such as the London futures (£ sterling/tonne), New York futures (US$/tonne), ICCO daily price (US$/tonne), and ICCO daily price (Euro/tonne). We preprocess the data by handling missing values, converting columns to appropriate data types, and calculating moving averages and volatility measures.
The data preprocessing involves:
- Cleaning the data by removing any missing or inconsistent values.
- Encoding categorical variables if necessary.
- Calculating moving averages (7-day, 30-day, 90-day) and volatility measures.
- Splitting the data into training, validation, and test sets.
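The moving-average and volatility steps above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column name `icco_usd`, the helper `add_rolling_features`, and the choice of volatility as the 30-day rolling standard deviation of daily percentage returns are all assumptions, and the prices are synthetic.

```python
import numpy as np
import pandas as pd

def add_rolling_features(df, price_col="icco_usd", windows=(7, 30, 90)):
    """Return a copy of df with moving-average and volatility columns added."""
    out = df.copy()
    for w in windows:
        # Simple moving average over the previous w observations.
        out[f"ma_{w}"] = out[price_col].rolling(window=w).mean()
    # Assumption: volatility = rolling std of daily percentage returns.
    returns = out[price_col].pct_change()
    out["volatility_30"] = returns.rolling(window=30).std()
    # Rolling windows leave NaNs at the start; drop those incomplete rows.
    return out.dropna()

# Synthetic prices for illustration only (not real ICCO data).
dates = pd.date_range("2020-01-01", periods=120, freq="D")
prices = pd.DataFrame({"icco_usd": np.linspace(2000.0, 2595.0, 120)}, index=dates)
features = add_rolling_features(prices)
```

Because the 90-day window needs 90 observations before it produces a value, the first 89 rows are dropped; with real data this shortens the usable history by roughly three months.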
Feature engineering steps include:
- Creating features from the raw data, such as moving averages and volatility.
- Scaling the features using techniques like MinMaxScaler.
- Ensuring that the data is evenly distributed among training, validation, and test sets.
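The scaling and splitting steps can be sketched as follows. This sketch assumes a chronological split (earliest rows for training, latest for testing) and fits the scaling parameters on the training portion only, so that no information from later prices leaks into training; the README does not state how the split was actually done. The functions below are a NumPy equivalent of the default min-max formula, `(x - min) / (max - min)`, rather than a call to scikit-learn's `MinMaxScaler`, and the 70/15/15 proportions are illustrative.

```python
import numpy as np

def chronological_split(values, train_frac=0.7, val_frac=0.15):
    """Split rows in time order into train, validation, and test parts."""
    n = len(values)
    train_end = int(round(n * train_frac))
    val_end = train_end + int(round(n * val_frac))
    return values[:train_end], values[train_end:val_end], values[val_end:]

def fit_min_max(train):
    """Compute per-column minimum and range from the training data only."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    # Guard against division by zero for constant columns.
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span

def apply_min_max(values, lo, span):
    """Scale values with parameters learned from the training data."""
    return (values - lo) / span

# Synthetic, steadily rising series for illustration only.
data = np.arange(100, dtype=float).reshape(-1, 1)
train, val, test = chronological_split(data)
lo, span = fit_min_max(train)
train_s, val_s, test_s = (apply_min_max(part, lo, span) for part in (train, val, test))
```

Note that scaled validation and test values can fall outside [0, 1] when later prices exceed the training range, which is expected with this approach.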
The LSTM model is built using TensorFlow and Keras. We trained the model on the preprocessed dataset, adjusting hyperparameters such as the number of hidden layers, the number of neurons per layer, and the learning rate. We used techniques like regularization and dropout to prevent overfitting. We implemented Grid Search CV to find the best hyperparameters.
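A model of this kind might look like the Keras sketch below. It follows the configuration reported later in this README (three LSTM layers of 50 units, dropout rate 0.2, Adam optimizer), but the input window of 30 time steps, the single input feature, the placement of dropout after each LSTM layer, the single-unit output layer, and the early-stopping patience of 10 are assumptions made for illustration.

```python
from tensorflow import keras

def build_lstm(window=30, n_features=1, units=50, dropout=0.2):
    """Stacked LSTM regressor: three LSTM layers, each followed by dropout."""
    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        # Intermediate LSTM layers return full sequences so the next
        # LSTM layer receives one vector per time step.
        keras.layers.LSTM(units, return_sequences=True),
        keras.layers.Dropout(dropout),
        keras.layers.LSTM(units, return_sequences=True),
        keras.layers.Dropout(dropout),
        keras.layers.LSTM(units),
        keras.layers.Dropout(dropout),
        # Single output: the next (scaled) price.
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

model = build_lstm()

# Stop training when validation loss stops improving (patience is assumed).
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True
)
```

Training would then be a call such as `model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=75, batch_size=32, callbacks=[early_stop])`, where `X_train`, `y_train`, `X_val`, and `y_val` are hypothetical arrays of windowed inputs and targets.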
Once the model was trained, we evaluated its performance on the test set. We computed mean squared error (MSE) and mean absolute error (MAE) to assess the model's accuracy, and we plotted the predicted cocoa prices alongside the actual prices to gain insight into the model's behavior. Parameter tuning and cross-validation were used to improve model performance.
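For reference, the two metrics are the mean of the squared errors (MSE) and the mean of the absolute errors (MAE). A minimal NumPy sketch is shown below; the arrays are made-up values for illustration and are not results from this project.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up scaled prices and predictions, for illustration only.
y_true = np.array([0.50, 0.52, 0.55, 0.53])
y_pred = np.array([0.48, 0.53, 0.56, 0.50])

example_mse = mse(y_true, y_pred)
example_mae = mae(y_true, y_pred)
```

Both metrics are expressed in the units of the values they are computed on, so errors computed on scaled prices are not directly comparable to errors in US$/tonne.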
The models are available at: https://github.com/nanadotam/AI-Final-Proj/tree/main/models
To run the project locally, follow these steps:
- Clone this repository:
git clone https://github.com/nanadotam/Group35_CocoaPricePrediction.git
- Navigate to the project directory:
cd Group35_CocoaPricePrediction
- Create a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install the necessary dependencies:
pip install -r requirements.txt
- Download the dataset from ICCO and place it in the `data/` directory.
- Run the `app.py` script to start the Streamlit application and make predictions on new data:
streamlit run app.py
- Ensure you have followed the installation steps.
- Run the Streamlit application locally:
streamlit run app.py
- Access the application in your web browser:
http://localhost:8501
In this section, we present the results of our cocoa price prediction experiments. We discuss the model's performance, its strengths, limitations, and potential areas of improvement. We also provide visualizations of the predicted cocoa prices and compare them with the actual prices.
- Mean Squared Error (MSE): 0.000481
- Mean Absolute Error (MAE): 0.0215
- Figure: Decomposition of ICCO Daily Price
- Figure: Rolling Mean & Standard Deviation
- Figure: ICCO Daily Price Over Time
- Figure: LSTM Training and Validation Loss
- Model Architecture: The LSTM model consists of three layers with 50 units each and a dropout rate of 0.2. The optimizer used is Adam.
- Hyperparameters: The best hyperparameters found using Grid Search CV were:
  - Batch size: 32
  - Dropout rate: 0.2
  - Optimizer: Adam
  - Units: 50
- Training Performance: The model was trained for 75 epochs with early stopping to prevent overfitting. The training loss consistently decreased, indicating the model's ability to learn from the data. The validation loss, however, fluctuated slightly, suggesting some challenges in generalizing to unseen data.
- Example Predictions: The model's predictions were compared to the actual prices, showing a reasonable alignment with the true values.
- Strengths: The LSTM model effectively captures long-term dependencies in the data, making it suitable for time series forecasting. The model achieved a low MSE and MAE, indicating good predictive performance.
- Limitations: The model's performance can vary with different hyperparameters and requires careful tuning. The validation loss fluctuations suggest potential overfitting or sensitivity to data variations.
- Areas for Improvement: Future work could explore alternative model architectures, such as GRU or hybrid models, and incorporate additional features like economic indicators or weather data to enhance predictive accuracy.
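The hyperparameter search mentioned above can be illustrated with a plain exhaustive grid enumeration. This sketch is not scikit-learn's `GridSearchCV` and performs no cross-validation folds; it only shows the idea of scoring every combination and keeping the best. The candidate values in `param_grid` are hypothetical, and `toy_score` is a made-up stand-in for "train the model and return its validation loss", deliberately constructed to be lowest at the values reported above. It is not a real measurement.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; lower scores are better."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate values; the actual grid is not documented here.
param_grid = {
    "batch_size": [16, 32, 64],
    "dropout": [0.1, 0.2, 0.3],
    "optimizer": ["adam", "rmsprop"],
    "units": [25, 50, 100],
}

def toy_score(params):
    # Stand-in for a real validation loss (NOT a real result).
    return (
        abs(params["batch_size"] - 32) / 32
        + abs(params["dropout"] - 0.2)
        + (0.0 if params["optimizer"] == "adam" else 0.1)
        + abs(params["units"] - 50) / 50
    )

best_params, best_score = grid_search(param_grid, toy_score)
```

With four parameters and these candidate lists, the search evaluates 3 × 3 × 2 × 3 = 54 combinations; in practice each evaluation means a full training run, so the grid size directly drives the total training time.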
Screenshots of the Streamlit app in action:
- Screenshot: Main Interface
- Screenshot: Price Prediction
- Screenshot: Historical Data View
A demonstration video showing how the application works is available on YouTube.