
Option-Value-Machine-Learning-Project

This project examines European call option pricing data on the S&P 500. The core idea of the project is to use the training data to build statistical/ML models with

  1. Value as the response (i.e., a regression problem), and
  2. BS (Overestimated/Underestimated) as the response (i.e., a classification problem), as sketched below.
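As a rough sketch of that setup, the snippet below loads a training file and separates the two responses. The file name and the column names ("Value", "BS") are illustrative assumptions, not taken from the repository.

```python
# Hypothetical setup sketch: the file name and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("option_train.csv")   # hypothetical file name

X = df.drop(columns=["Value", "BS"])   # predictor columns (e.g. S, K, tau, r)
y_reg = df["Value"]                    # regression response: option value
y_clf = df["BS"]                       # classification response: Over/Under

# Hold out a test set for out-of-sample evaluation.
X_train, X_test, yr_train, yr_test = train_test_split(
    X, y_reg, test_size=0.2, random_state=42
)
```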

Our analysis first focused on predicting the 'Value' of options using 11 different machine learning techniques: Linear Regression, Linear Regression after Best Subset Selection, KNN Regression, Decision Tree Regression, Random Forest Regression, Lasso Regression, Ridge Regression, SVM for Regression, Neural Networks, Radial Basis Function Networks, and Gradient Boosting Regression. We assessed them based on Mean Squared Error (MSE) and out-of-sample R-squared. To enhance model accuracy, we standardized the features to mitigate bias from scale differences, which is particularly important for models sensitive to distances between feature values. The standout models in terms of low MSE and high R-squared were Neural Networks, Gradient Boosting Regression, and Random Forest Regression. These models excel at capturing complex, non-linear relationships within the data, a critical advantage in financial markets influenced by dynamic, non-linear factors such as volatility and sentiment. The Neural Network, optimized with the Adam optimizer and trained using five-fold cross-validation, proved particularly effective, highlighting its ability to learn from and adapt to intricacies in the data.
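A minimal sketch of this regression workflow, continuing from the setup snippet above, might look like the following. The pipeline standardizes the features before fitting and scores on MSE and out-of-sample R-squared; the specific hyperparameters (e.g. the network architecture) are assumptions, since the write-up only names the optimizer and the five-fold cross-validation.

```python
# Illustrative regression sketch (not the project's exact code).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error, r2_score

# Standardize features, then fit one of the stronger models.
gbr = make_pipeline(StandardScaler(), GradientBoostingRegressor(random_state=42))
gbr.fit(X_train, yr_train)
pred = gbr.predict(X_test)
print("test MSE:", mean_squared_error(yr_test, pred))
print("out-of-sample R^2:", r2_score(yr_test, pred))

# Neural network trained with the Adam optimizer, scored via five-fold CV
# (the hidden-layer sizes here are assumptions).
nn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), solver="adam",
                 max_iter=2000, random_state=42),
)
cv_mse = -cross_val_score(nn, X_train, yr_train, cv=5,
                          scoring="neg_mean_squared_error")
print("5-fold CV MSE:", cv_mse.mean())
```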

For classification, we also tried 11 different models: Logistic Regression, Logistic Regression after Best Subset Selection, Quadratic Discriminant Analysis (QDA), K-Nearest Neighbour (KNN), Decision Trees, Random Forest, SVM, Neural Networks, Boosting, Bagging, and Naive Bayes. Among these methods, Random Forest, Bagging, and Decision Trees emerged as the top performers due to their minimal classification errors. Parameter tuning identified the optimal settings for Random Forest as 200 trees with 1 feature considered at each split, balancing accuracy with robustness against overfitting. Random Forest's use of a random subset of features at each split decorrelates the trees and reduces the risk of overfitting, which is crucial for reliable predictions on complex datasets. The model's effectiveness was further demonstrated through a detailed structure analysis showing how each decision node and branch contributes to its precision and efficiency. Tuning Random Forest through rigorous grid searches and cross-validation solidified its status as the preferred model for handling complex, high-dimensional datasets in fluctuating market conditions.
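The grid search for the Random Forest classifier could be sketched as follows, again continuing from the setup snippet. The grid values are illustrative assumptions; only the reported optimum (200 trees, 1 feature per split) comes from the analysis above.

```python
# Illustrative grid-search sketch for the classification task.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X, y_clf, test_size=0.2, random_state=42
)

# Candidate grid (assumed); the write-up reports 200 trees and 1 feature
# per split as the best combination.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_features": [1, 2, 4, "sqrt"],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(Xc_train, yc_train)

print("best params:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(Xc_test, yc_test))
```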
