An algorithmic trading model for the GLD ETF built on a hybrid ARIMA-LSTM architecture with multi-source data integration (macroeconomic data, market price data, and financial news with FinBERT sentiment analysis).
The project implements an ensemble approach combining ARIMA and LSTM models:
- ARIMA Model: Captures linear trends and autocorrelation, using technical indicators and macroeconomic features as exogenous regressors
- LSTM Model: Learns the non-linear patterns left in the ARIMA residuals, which the linear model cannot capture
- Ensemble Prediction: The final prediction is the ARIMA forecast plus the LSTM's predicted residual, which acts as a correction term
- GLD ETF: Daily OHLCV data with technical indicators
- Technical Indicators: EMA (30, 60, 200), RSI (14), MACD, Bollinger Bands, Momentum (10)
- DTWEXBGS: Trade-weighted U.S. dollar index (broad, goods and services), a measure of USD strength
- CPIAUCSL: Consumer Price Index for All Urban Consumers (inflation)
- FEDFUNDS: Federal Funds Rate
- GDP: Gross Domestic Product
- OIL_PRICE: Crude Oil Prices
- FinBERT: Sentiment scores for financial news from a pre-trained FinBERT model
- News Sources: Daily financial news related to gold and market conditions
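The technical indicators listed above can be computed from daily GLD closes with pandas alone. The sketch below is one common formulation, not necessarily what the ingestion scripts do: the `Close` column name, MACD parameters (12, 26, 9), Bollinger settings (20-day window, 2 standard deviations), Wilder smoothing for RSI, and momentum as a 10-day price difference are all assumptions.

```python
import pandas as pd


def add_technical_indicators(df: pd.DataFrame, close_col: str = "Close") -> pd.DataFrame:
    """Return a copy of a daily OHLCV frame with indicator columns appended."""
    out = df.copy()
    close = out[close_col]

    # EMAs over 30, 60 and 200 trading days
    for span in (30, 60, 200):
        out[f"EMA_{span}"] = close.ewm(span=span, adjust=False).mean()

    # RSI(14) using Wilder's smoothing (alpha = 1/14); assumed formulation
    delta = close.diff()
    avg_gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    avg_loss = (-delta).clip(lower=0).ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    out["RSI_14"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # MACD line (EMA12 - EMA26) and its 9-day signal line; assumed parameters
    macd = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    out["MACD"] = macd
    out["MACD_signal"] = macd.ewm(span=9, adjust=False).mean()

    # Bollinger Bands: 20-day SMA +/- 2 rolling standard deviations; assumed settings
    sma20 = close.rolling(20).mean()
    std20 = close.rolling(20).std()
    out["BB_upper"] = sma20 + 2 * std20
    out["BB_lower"] = sma20 - 2 * std20

    # Momentum(10) as the price change over the last 10 trading days
    out["Momentum_10"] = close - close.shift(10)
    return out
```

Leading rows contain NaN until each rolling window fills (up to 20 rows here, and longer before the 200-day EMA has meaningful history), so they are typically dropped before training.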
GLD-Trader/
├── data/
│ ├── ingestion/ # Data collection scripts
│ ├── combined_data.csv # Final processed dataset
│ ├── finbert_gold_news.csv # Sentiment-analyzed news
│ └── macro_data.csv # Macroeconomic indicators
├── models/
│ └── training.ipynb # Model development and training
├── images/ # Performance visualizations
└── logs/ # Data ingestion logs
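A minimal sketch of how a file like combined_data.csv could be assembled, assuming daily prices indexed by trading date, macro series indexed by observation date, and a news frame with hypothetical `date` and `sentiment` columns. The macro series arrive at different frequencies (DTWEXBGS daily, CPIAUCSL and FEDFUNDS monthly, GDP quarterly), so each is forward-filled onto trading days; treating news-free days as neutral (0.0) is also an assumption.

```python
import pandas as pd


def build_combined_dataset(prices: pd.DataFrame, macro: pd.DataFrame, news: pd.DataFrame) -> pd.DataFrame:
    """Align daily prices, mixed-frequency macro series and news sentiment on trading days."""
    # Carry the latest macro observation forward to every trading day.
    # Caveat: FRED dates observations by reference period, not release date,
    # so a release lag may be needed to avoid look-ahead bias.
    macro_daily = macro.sort_index().reindex(prices.index, method="ffill")

    # Average per-article FinBERT scores into one value per calendar day
    daily_sentiment = news.groupby(news["date"].dt.normalize())["sentiment"].mean()
    # Days without news are assumed neutral (0.0)
    sentiment = daily_sentiment.reindex(prices.index).fillna(0.0).rename("sentiment")

    combined = prices.join(macro_daily).join(sentiment)
    # Drop rows that precede the first available macro value
    return combined.dropna()
```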
The model uses more than 20 features, including:
- Technical indicators (EMAs, RSI, MACD, Bollinger Bands)
- Macroeconomic indicators (interest rates, inflation, USD strength, oil prices)
- Market volume and momentum
- News sentiment scores (FinBERT-based)
- ARIMA order (p, d, q): (1, 1, 0), with exogenous variables
- Exogenous variables: all technical and macroeconomic features
- Captures linear market trends and relationships with fundamentals
- LSTM: 16 hidden units with 0.5 dropout
- 14-day sliding window for residual prediction
- SGD optimizer with momentum and learning rate scheduling
- Trained for up to 1000 epochs with early stopping
- Data split: 60% for ARIMA training, 24% for LSTM training, 16% held out for final testing
- StandardScaler normalization for LSTM features
- Residual-based ensemble approach for improved accuracy
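The 60/24/16 split, the StandardScaler step, and the 14-day sliding window can be sketched with NumPy and scikit-learn. This assumes the rows are in time order, that each window holds the previous 14 days of features and predicts the ARIMA residual on the following day, and that the scaler is fitted on the LSTM-training rows only so test-period statistics do not leak in. `split_bounds`, `make_windows`, and `prepare_lstm_data` are hypothetical names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def split_bounds(n: int):
    """Row slices for a sequential 60% / 24% / 16% split (ARIMA / LSTM / test)."""
    a, b = n * 60 // 100, n * 84 // 100  # integer arithmetic avoids float rounding
    return slice(0, a), slice(a, b), slice(b, n)


def make_windows(features: np.ndarray, target: np.ndarray, window: int = 14):
    """Each sample is `window` consecutive feature rows; its label is the next row's target."""
    X = np.stack([features[t - window:t] for t in range(window, len(features))])
    y = np.asarray(target[window:])
    return X, y, np.arange(window, len(features))  # last item: row each label comes from


def prepare_lstm_data(features: np.ndarray, residuals: np.ndarray, window: int = 14):
    """`residuals` is assumed to be aligned row-for-row with `features`."""
    _, lstm_rows, test_rows = split_bounds(len(features))
    # Fit the scaler on LSTM-training rows only, then apply it to every row
    scaler = StandardScaler().fit(features[lstm_rows])
    X, y, target_row = make_windows(scaler.transform(features), residuals, window)
    # Windows may look back into earlier segments, but labels stay in their own segment
    train = (target_row >= lstm_rows.start) & (target_row < lstm_rows.stop)
    test = target_row >= test_rows.start
    return (X[train], y[train]), (X[test], y[test]), scaler
```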
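A PyTorch sketch of a residual LSTM with the listed settings (16 hidden units, 0.5 dropout, SGD with momentum, learning-rate scheduling, up to 1000 epochs with early stopping). The learning rate of 0.01, momentum of 0.9, `ReduceLROnPlateau` as the scheduler, full-batch updates, and the patience values are assumptions, as is holding out part of the LSTM-training segment as a validation set for early stopping. Dropout is applied as a separate layer because the `dropout` argument of `nn.LSTM` only acts between stacked layers and has no effect with a single layer.

```python
import copy

import torch
from torch import nn


class ResidualLSTM(nn.Module):
    """Maps a window of scaled features to the next day's ARIMA residual."""

    def __init__(self, n_features: int, hidden_size: int = 16, dropout: float = 0.5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)               # (batch, window, hidden)
        last = self.dropout(out[:, -1, :])  # output at the last time step
        return self.head(last).squeeze(-1)  # (batch,)


def train_residual_lstm(X_tr, y_tr, X_val, y_val, max_epochs=1000, patience=50):
    model = ResidualLSTM(X_tr.shape[-1])
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=10)
    loss_fn = nn.MSELoss()
    best_loss, best_state, stale = float("inf"), copy.deepcopy(model.state_dict()), 0
    for _ in range(max_epochs):
        model.train()
        opt.zero_grad()
        loss = loss_fn(model(X_tr), y_tr)  # one full-batch update per epoch
        loss.backward()
        opt.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(X_val), y_val).item()
        sched.step(val_loss)  # halve the learning rate when validation loss plateaus
        if val_loss < best_loss:
            best_loss, best_state, stale = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:  # early stopping
                break
    model.load_state_dict(best_state)  # restore the best validation checkpoint
    return model
```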
In progress: adding trading logic, integrating with Alpaca, and sourcing higher-quality news data.
- Run the data ingestion scripts in data/ingestion/
- Execute model training in models/training.ipynb
- View performance metrics and visualizations
- Apply trained model for GLD price predictions
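When viewing performance, a small helper like the one below could compare ARIMA-only and hybrid forecasts on the test segment. The metrics shown (RMSE, MAE, and directional accuracy relative to the previous close) are illustrative choices, and `evaluate` is a hypothetical name; this README does not specify which metrics the notebook reports.

```python
import numpy as np


def evaluate(y_true, y_pred, y_prev) -> dict:
    """RMSE, MAE and directional accuracy of next-day price forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_prev = np.asarray(y_prev, dtype=float)  # previous day's actual close
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # Fraction of days where the predicted move has the same sign as the actual move
    hit = np.sign(y_pred - y_prev) == np.sign(y_true - y_prev)
    return {"rmse": rmse, "mae": mae, "directional_accuracy": float(np.mean(hit))}
```

Directional accuracy is included because a trading rule acts on the predicted direction of the move, which RMSE and MAE do not measure directly.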
See requirements.txt for the complete package list, including:
- pandas, numpy, scikit-learn
- statsmodels (ARIMA)
- torch (LSTM)
- transformers (FinBERT)
- matplotlib, seaborn (visualization)