
Option-Value-Machine-Learning-Project

This project examines European call option pricing data on the S&P 500. The core idea of the project is to use the training data to build statistical/ML models with

  1. Value as the response (i.e., a regression problem), and
  2. BS (Overestimated/Underestimated) as the response (i.e., a classification problem), as sketched below.
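As a rough sketch of that setup, the snippet below loads a training file and separates the two responses. The file name and the column names ("Value", "BS") are illustrative assumptions, not taken from the repository.

```python
# Hypothetical setup sketch: the file name and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("option_train.csv")   # hypothetical file name

X = df.drop(columns=["Value", "BS"])   # predictor columns (e.g. S, K, tau, r)
y_reg = df["Value"]                    # regression response: option value
y_clf = df["BS"]                       # classification response: Over/Under

# Hold out a test set for out-of-sample evaluation.
X_train, X_test, yr_train, yr_test = train_test_split(
    X, y_reg, test_size=0.2, random_state=42
)
```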

Our analysis first focused on predicting the 'Value' of options using 11 different machine learning techniques: Linear Regression, Linear Regression after Best Subset Selection, KNN Regression, Decision Tree Regression, Random Forest Regression, Lasso Regression, Ridge Regression, SVM for Regression, Neural Networks, Radial Basis Function Networks, and Gradient Boosting Regression. We assessed them based on Mean Squared Error (MSE) and out-of-sample R-squared. To enhance model accuracy, we standardized the features to mitigate bias from scale differences, which is particularly important for models sensitive to distances between feature values. The standout models in terms of low MSE and high R-squared were Neural Networks, Gradient Boosting Regression, and Random Forest Regression. These models excel at capturing complex, non-linear relationships within the data, a critical advantage in financial markets influenced by dynamic, non-linear factors such as volatility and sentiment. The Neural Network, optimized with the Adam optimizer and trained using five-fold cross-validation, proved particularly effective, highlighting its ability to learn from and adapt to intricacies in the data.
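A minimal sketch of this regression workflow, continuing from the setup snippet above, might look like the following. The pipeline standardizes the features before fitting and scores on MSE and out-of-sample R-squared; the specific hyperparameters (e.g. the network architecture) are assumptions, since the write-up only names the optimizer and the five-fold cross-validation.

```python
# Illustrative regression sketch (not the project's exact code).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, r2_score

# Standardize features, then fit one of the stronger models.
gbr = make_pipeline(StandardScaler(), GradientBoostingRegressor(random_state=42))
gbr.fit(X_train, yr_train)
pred = gbr.predict(X_test)
print("test MSE:", mean_squared_error(yr_test, pred))
print("out-of-sample R^2:", r2_score(yr_test, pred))

# Neural network trained with the Adam optimizer, scored via five-fold CV
# (the hidden-layer sizes here are assumptions).
nn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), solver="adam",
                 max_iter=2000, random_state=42),
)
cv_mse = -cross_val_score(nn, X_train, yr_train, cv=5,
                          scoring="neg_mean_squared_error")
print("5-fold CV MSE:", cv_mse.mean())
```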

For classification, we also tried 11 different models: Logistic Regression, Logistic Regression after Best Subset Selection, Quadratic Discriminant Analysis (QDA), K-Nearest Neighbour (KNN), Decision Trees, Random Forest, SVM, Neural Networks, Boosting, Bagging, and Naive Bayes. Among these methods, Random Forest, Bagging, and Decision Trees emerged as the top performers due to their minimal classification errors. Parameter tuning identified the optimal settings for Random Forest as 200 trees with 1 feature considered at each split, balancing accuracy with robustness against overfitting. Random Forest's use of a random subset of features at each split decorrelates the trees and reduces the risk of overfitting, which is crucial for reliable predictions on complex datasets. The model's effectiveness was further demonstrated through a detailed structure analysis showing how each decision node and branch contributes to its precision and efficiency. Tuning Random Forest through rigorous grid searches and cross-validation solidified its status as the preferred model for handling complex, high-dimensional datasets in fluctuating market conditions.
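The grid search for the Random Forest classifier could be sketched as follows, again continuing from the setup snippet. The grid values are illustrative assumptions; only the reported optimum (200 trees, 1 feature per split) comes from the analysis above.

```python
# Illustrative grid-search sketch for the classification task.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X, y_clf, test_size=0.2, random_state=42
)

# Candidate grid (assumed); the write-up reports 200 trees and 1 feature
# per split as the best combination.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_features": [1, 2, 4, "sqrt"],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(Xc_train, yc_train)

print("best params:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(Xc_test, yc_test))
```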
