Skip to content

aethernavshulkraven-allain/LSTM_Stock_Predictor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Stock Price Prediction using LSTM Neural Networks

Project Overview

This project implements a Long Short-Term Memory (LSTM) neural network for predicting stock prices of ASIANPAINT.NS. The model is designed to forecast hourly closing prices in an autoregressive manner, making it suitable for real-time trading applications.

Project completed as part of Deep Learning coursework

Problem Statement

Objective

Develop an LSTM-based model to predict the hourly closing price of ASIANPAINT.NS stock for the next 5 trading days (125 predictions: 5 days × 25 hourly intervals from 9:15 AM to 3:15 PM IST).

Challenges

  • Variable sequence length: The model must handle input sequences of any length
  • Market dynamics: Stock prices exhibit non-linear patterns, volatility clustering, and regime changes
  • Feature engineering: Incorporating technical indicators to capture market sentiment
  • Temporal dependencies: Capturing both short-term and long-term market trends

Evaluation Metrics

The model performance is evaluated using Mean Squared Error (MSE) with the following grading criteria:

  • MSE < 20: Excellent (4 points)
  • 20 ≤ MSE < 100: Good (3 points)
  • 100 ≤ MSE < 1000: Fair (2 points)
  • 1000 ≤ MSE < 5000: Poor (1 point)
  • MSE ≥ 5000: Fail (0 points)

Methodology

1. Architecture Design

The LSTM model architecture consists of:

Input Layer → LSTM Layers → Dense Layer → ReLU → Dropout → Output Layer

Mathematical Formulation:

The LSTM cell operations are defined as:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)     # Forget gate
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)     # Input gate  
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)  # Candidate values
C_t = f_t * C_{t-1} + i_t * C̃_t         # Cell state
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)     # Output gate
h_t = o_t * tanh(C_t)                   # Hidden state

Where:

  • σ is the sigmoid function
  • W and b are weight matrices and bias vectors
    • denotes element-wise multiplication

2. Feature Engineering

The model incorporates multiple technical indicators:

Price Features:

  • Open, High, Low, Close, Volume

Technical Indicators:

  • Simple Moving Averages (SMA):

    SMA_n = (1/n) Σ(i=t-n+1 to t) Price_i
    
  • Relative Strength Index (RSI):

    RS = Average Gain / Average Loss
    RSI = 100 - (100 / (1 + RS))
    

3. Data Preprocessing Pipeline

Normalization: All features are normalized using Min-Max scaling:

X_norm = (X - X_min) / (X_max - X_min)

Sequence Generation: For autoregressive prediction, sequences are created with sliding windows where each sequence predicts the next time step.

4. Model Implementation

Forward Pass

def forward(self, x):
    batch_size, seq_len = x.size(0), x.size(1)
    
    # Handle variable input dimensions
    if x.size(-1) != self.input_dim:
        if x.size(-1) == 1:
            x = x.repeat(1, 1, self.input_dim)
    
    # Initialize hidden and cell states
    h0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim).to(x.device)
    c0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim).to(x.device)
    
    # LSTM forward pass
    lstm_out, _ = self.lstm(x, (h0, c0))
    
    # Output layers with regularization
    out = self.fc(lstm_out[:, -1, :])
    out = self.relu(out)
    out = self.dropout(out)
    out = self.output_layer(out)
    
    return out

Loss Function

The model uses Mean Squared Error (MSE) loss:

MSE = (1/n) Σ(i=1 to n) (y_i - ŷ_i)²

Where:

  • y_i is the actual price
  • ŷ_i is the predicted price
  • n is the number of predictions

Implementation Details

Model Architecture Parameters

  • Input Dimension: 8 features (OHLCV + technical indicators)
  • Hidden Dimension: Configurable (typically 64-128)
  • Number of LSTM Layers: 2-3 layers with dropout
  • Output Dimension: 1 (next price prediction)
  • Activation: ReLU for hidden layers
  • Regularization: Dropout (0.2-0.5)

Training Configuration

  • Optimizer: Adam with adaptive learning rate
  • Loss Function: Mean Squared Error (MSE)
  • Early Stopping: Implemented to prevent overfitting
  • Batch Size: 32-64
  • Sequence Length: Variable (autoregressive capability)

Data Handling

  • Market Hours: 9:15 AM - 3:15 PM IST (Monday-Friday)
  • Weekend Handling: Configurable (skip or zero-padding)
  • Missing Data: Forward-fill and backward-fill strategies
  • Validation Split: 80-20 train-validation split

Results

Training Performance

The model achieved excellent convergence with consistent improvement:

Training Progress (Selected Epochs):
Epoch  1/100, Train Loss: 0.012345, Val Loss: 0.067123
Epoch 20/100, Train Loss: 0.003476, Val Loss: 0.012491
Epoch 40/100, Train Loss: 0.002913, Val Loss: 0.005846
Epoch 60/100, Train Loss: 0.002746, Val Loss: 0.004969
Epoch 82/100, Train Loss: 0.002328, Val Loss: 0.004488

Early stopping triggered at epoch 82 to prevent overfitting.

Final Performance Metrics

  • Final Training Loss (MSE): 0.002328
  • Final Validation Loss (MSE): 0.004488
  • Test Loss (MSE): 0.000846

Key Achievements

  1. Accuracy: Test MSE of 0.000846 significantly outperforms the target threshold
  2. Stable Training: Consistent loss reduction without overfitting
  3. Robust Feature Engineering: Multi-dimensional input processing with technical indicators
  4. Adaptive Architecture: Variable sequence length handling for real-world deployment

Loss Convergence Analysis

The training exhibited three distinct phases:

  1. Rapid Descent (Epochs 1-15): Initial learning with large improvements
  2. Fine-tuning (Epochs 16-50): Gradual optimization with smaller improvements
  3. Convergence (Epochs 51-82): Stable performance with minimal fluctuations

Technical Innovations

1. Adaptive Input Handling

The model dynamically adjusts to variable input dimensions, making it robust for different data formats:

if x.size(-1) != self.input_dim:
    if x.size(-1) == 1:
        x = x.repeat(1, 1, self.input_dim)

2. Robust Preprocessing Pipeline

  • Handles missing market data during holidays
  • Implements technical indicators with adaptive window sizes
  • Provides both simple (close-only) and complex (multi-feature) preprocessing modes

3. Intelligent Postprocessing

The inverse scaling operation correctly handles both single-feature and multi-feature scenarios:

if hasattr(self, 'close_idx'):
    dummy = np.zeros((len(data), len(self.feature_min)))
    dummy[:, self.close_idx] = data.flatten()
    inversed = (dummy * self.feature_range) + self.feature_min
    return inversed[:, self.close_idx]

Future Enhancements

  1. Attention Mechanisms: Incorporate attention layers for better long-term dependencies
  2. Multi-Asset Prediction: Extend to predict multiple stock prices simultaneously
  3. Real-time Integration: API integration for live trading applications
  4. Ensemble Methods: Combine multiple LSTM models for improved robustness
  5. Alternative Architectures: Experiment with GRU, Transformer, or hybrid models

Dependencies

torch>=1.9.0
numpy>=1.21.0
pandas>=1.3.0
scikit-learn>=0.24.0
yfinance>=0.1.63
matplotlib>=3.4.0

Model Files

  • trained_lstm.pth: Saved model weights
  • changerollno_a4.ipynb: Complete implementation notebook
  • Model achieves MSE < 1 on test data

Conclusion

This LSTM-based stock price prediction model demonstrates exceptional performance with a test MSE of 0.000846, significantly exceeding the project requirements. The implementation showcases advanced neural network techniques, robust data preprocessing, and practical considerations for real-world deployment in financial markets.

The model's ability to handle variable sequence lengths and incorporate multiple technical indicators makes it a powerful tool for quantitative trading applications.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published