Historically, gold has been used as a form of currency in various parts of the world, including the USA. Today, central banks hold precious metals such as gold to guarantee repayment of foreign debts and to control inflation, so gold reserves reflect a country's financial strength. Recently, emerging economies such as China, Russia, and India have been big buyers of gold, whereas the USA, South Africa, and Australia are among the big sellers.
Forecasting the rise and fall of daily gold rates can help investors decide when to buy or sell the commodity. However, gold prices depend on many factors, such as the prices of other precious metals, crude oil prices, stock exchange performance, bond prices, and currency exchange rates.
The challenge of this project is to accurately predict the adjusted closing price of the Gold ETF over a given future period. This is a regression problem because the output, the adjusted closing price, is a continuous value.
The historical Gold ETF data fetched from Yahoo Finance has 7 columns: Date, Open, High, Low, Close, Adjusted Close, and Volume. The closing price is the price of the stock at the close of the trading day, whereas the adjusted closing price also accounts for factors such as dividends, stock splits, and new stock offerings. We use Adjusted Close as the outcome variable, which is the value we want to predict.
3. 'EG_open', 'EG_high', 'EG_low', 'EG_close', 'EG_Ajclose', 'EG_volume' of Eldorado Gold Corporation (EGO),
12. 'USDI_Price', 'USDI_Open', 'USDI_High', 'USDI_Low', 'USDI_Volume', 'USDI_Trend' of US dollar Index Price,
13. 'GDX_Open', 'GDX_High', 'GDX_Low', 'GDX_Close', 'GDX_Adj Close', 'GDX_Volume' of Gold Miners ETF,
Since the price of gold is influenced by many factors, this project uses correlation to better understand the data and the relationships between variables. By analyzing the correlation between columns, we can see how the features relate to one another and to the target, and make better use of this information in the model.
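As a rough illustration of the correlation step, the sketch below computes the Pearson coefficient between a target column and each candidate feature, then ranks the features by the strength of the relationship. It uses only the standard library. The values are made up for illustration, the column name `Adj Close` is an assumed label for the target, and the helper names `pearson` and `rank_by_correlation` are hypothetical rather than taken from the project's source code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Return (name, r) for every non-target column, strongest |r| first."""
    pairs = [
        (name, pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Made-up values for illustration only; not taken from the project's dataset.
columns = {
    "Adj Close": [120.0, 121.5, 119.8, 123.1, 124.0, 122.7],
    "GDX_Close": [22.0, 22.6, 21.9, 23.4, 23.8, 23.0],
    "USDI_Price": [97.2, 96.8, 97.5, 96.1, 95.9, 96.4],
}

ranking = rank_by_correlation(columns, "Adj Close")
for name, r in ranking:
    print(f"{name}: {r:+.3f}")
```

A coefficient near +1 or -1 marks a feature that moves closely with (or against) the target, while a value near 0 suggests little linear relationship. Correlation only captures linear association, so a low value does not by itself prove a feature is useless to a tree-based model.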
I was also able to use charts to analyze the relationships between the data and map out their upward or downward trends based on the daily and yearly fluctuations in gold prices.
In the model design and construction phase, I first applied normalization and other preprocessing steps to organize the dataset. Then I built a model based on the Random Forest algorithm. Random Forest is an ensemble learning method that improves predictive accuracy and robustness by combining many decision trees, each trained on a bootstrap sample of the data. Averaging the outputs of the individual trees reduces variance, which mitigates overfitting and improves generalization to unseen data. In addition, its built-in feature importance scores show how much each variable contributes to the predictions, making it a reliable choice for both regression and classification tasks, particularly on complex, high-dimensional datasets.
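The pipeline described above can be sketched roughly as follows, assuming scikit-learn and NumPy are installed. This is not the project's actual code: the data is synthetic, and the 80/20 chronological split, the choice of `MinMaxScaler`, and `n_estimators=200` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature table (assumption: 500 rows, 4 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Chronological split: the last 20% of rows are held out, with no shuffling.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Fit the scaler on the training rows only so no test information leaks in.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train_s, y_train)

test_r2 = model.score(X_test_s, y_test)   # R^2 on the held-out rows
importances = model.feature_importances_  # one weight per feature, summing to 1

print(f"test R^2: {test_r2:.3f}")
print("feature importances:", np.round(importances, 3))
```

Tree-based models are largely insensitive to monotonic rescaling of the inputs, so the normalization step mainly matters if the same preprocessed data is also fed to scale-sensitive models. Holding out the most recent rows, rather than a random sample, keeps the evaluation closer to the real forecasting task.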
For these reasons, this approach gave the best performance on this dataset. It uncovered meaningful insights and achieved strong R² and Mean Squared Error (MSE) scores.
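For reference, the two metrics are defined as MSE = (1/n) Σ (yᵢ − ŷᵢ)² and R² = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)². The sketch below implements both with the standard library. The price values are made up for illustration and are not results from this project.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up prices for illustration only.
actual = [120.0, 122.0, 121.0, 125.0]
predicted = [121.0, 121.0, 122.0, 124.0]

mse = mean_squared_error(actual, predicted)
r2 = r2_score(actual, predicted)
print(f"MSE: {mse:.3f}, R^2: {r2:.3f}")
```

MSE is expressed in squared price units, so lower is better and its scale depends on the data. R² is unitless: 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values mean worse than that baseline.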
One of the key challenges observed in this project was overfitting, where the model showed excessively high accuracy on the training data but did not generalize as well to unseen data. To address this issue, we applied regularization techniques and adjusted the hyperparameters. These modifications helped the model achieve a more balanced performance, enabling it to make predictions at a reasonable level of accuracy.
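One common way to spot and reduce this kind of overfitting is to compare training and held-out scores while constraining tree growth. The sketch below does this with scikit-learn on synthetic, deliberately noisy data. The specific settings (`max_depth=4`, `min_samples_leaf=10`) are assumptions chosen for illustration, not the hyperparameters actually used in this project, and `train_test_r2` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic, deliberately noisy data (assumption: 400 rows, 5 features).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=400)

# Chronological-style split: hold out the last 20% of rows.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]


def train_test_r2(**params):
    """Fit a forest with the given settings; return (train R^2, test R^2)."""
    model = RandomForestRegressor(n_estimators=100, random_state=0, **params)
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)


# Fully grown trees (the library defaults) versus constrained trees.
full_train, full_test = train_test_r2()
reg_train, reg_test = train_test_r2(max_depth=4, min_samples_leaf=10)

print(f"unconstrained: train R^2={full_train:.3f}, test R^2={full_test:.3f}")
print(f"constrained:   train R^2={reg_train:.3f}, test R^2={reg_test:.3f}")
```

A large gap between the training and test scores is the usual sign of overfitting. Limiting depth and requiring more samples per leaf lowers the training score, and the settings are worth keeping only if the test score holds up or improves. In practice these values would be chosen with a validation set or time-series cross-validation rather than fixed by hand.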
You can find more details in the source code. Good luck!