About The Project

Build two machine learning models: a regression model to predict the Black-Scholes option price, and a classification model to predict whether the Black-Scholes model overestimates or underestimates the actual option price.

Tools: Python (NumPy, Pandas, Scikit-learn, ...), MS Excel, PowerPoint

Skills: Exploratory Data Analysis, Applied Statistics, Data Visualization, Machine Learning, Feature Engineering, Project Management, Team Collaboration, Business Communication

1. Exploratory Data Analysis

The dataset contains 1,680 records and 6 columns, as follows:

1.1 Data Quality Summary Table

Note

The % Populated, Min, Max, and Mean values show that some fields contain missing values and outliers. We therefore consider dropping missing values and outliers before the modeling step.
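A summary table of this kind can be produced with a few lines of pandas. The sketch below is only an illustration: the function name `data_quality_summary` and the example column names are assumptions, not the project's actual code or schema.

```python
import numpy as np
import pandas as pd

def data_quality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary: share of non-missing values plus min, max, and mean."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "% Populated": numeric.notna().mean() * 100,
        "Min": numeric.min(),
        "Max": numeric.max(),
        "Mean": numeric.mean(),
    })

# Tiny made-up frame; the column names are placeholders, not the real dataset.
example = pd.DataFrame({
    "S": [100.0, 105.0, np.nan, 98.0],
    "t": [0.5, 0.25, 1.0, 250.0],  # 250.0 stands in for an implausible value
})
summary = data_quality_summary(example)
print(summary)
```

A `% Populated` value below 100 flags missing values, and a Max or Min far from the Mean hints at outliers worth inspecting.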

1.2 Data Cleaning

We decided to remove any record with a missing value or with an outlier more than 3 standard deviations from the mean of any field. Below is a detailed view of the records detected with missing values and/or outliers in any field:

Note

We dropped only 7 of the 1,680 records (about 0.4%) from the original data, which is not a significant loss. 1,673 records remain after the data cleaning step.
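The cleaning rule in this section (drop missing values, then drop anything beyond 3 standard deviations from a field's mean) can be sketched in pandas as follows. This is a minimal illustration under assumptions: the function name, the column names, and the choice to compute the mean and standard deviation after removing missing rows are all hypothetical, since the README does not specify them.

```python
import numpy as np
import pandas as pd

def drop_missing_and_outliers(df: pd.DataFrame, n_std: float = 3.0) -> pd.DataFrame:
    """Drop rows with any missing value, then rows where any numeric field
    lies more than n_std standard deviations from that field's mean."""
    complete = df.dropna()
    numeric = complete.select_dtypes(include="number")
    z = (numeric - numeric.mean()) / numeric.std()
    # A constant column has zero std and yields NaN z-scores; treat as non-outliers.
    z = z.fillna(0.0)
    keep = (z.abs() <= n_std).all(axis=1)
    return complete[keep]

# Made-up data: 30 rows with one missing value and one extreme value.
S = [100.0 + i for i in range(30)]
S[5] = np.nan      # missing value
S[10] = 5000.0     # extreme value
t = [0.1 + 0.01 * i for i in range(30)]
raw = pd.DataFrame({"S": S, "t": t})

cleaned = drop_missing_and_outliers(raw)
print(len(raw), "->", len(cleaned))
```

Note that a single extreme value inflates the standard deviation it is measured against, so a 3-sigma rule only catches outliers reliably when the sample is reasonably large.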

1.3 Visualization of each Field Before and After Cleaning the Data

Note

The distributions of the Stock Price (S) and Time to Maturity (t) fields become clearer after dropping outliers and missing values.

1.4 Feature Engineering

2. Model Development

Our objective for model exploration was to experiment with different models to select the best model for both regression and classification problems.

In the regression problem, we wanted to train a model that can accurately predict the option price.
In the classification problem, we wanted to build a model that can accurately classify whether using the Black-Scholes algorithm would underestimate or overestimate the actual option price.
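For reference, the Black-Scholes price of a European call and the resulting over/under label can be sketched with the standard library alone. The formula is the standard one; the parameter names `K`, `r`, and `sigma`, the label strings `"Over"` and `"Under"`, and the example numbers are assumptions for illustration and may not match the dataset's columns.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S: float, K: float, t: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call.
    S: stock price, K: strike, t: time to maturity in years,
    r: risk-free rate, sigma: annualized volatility (K, r, sigma are assumed inputs)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return S * norm_cdf(d1) - K * exp(-r * t) * norm_cdf(d2)

def over_or_under(bs_price: float, actual_price: float) -> str:
    """Hypothetical labeling: 'Over' if Black-Scholes exceeds the actual price."""
    return "Over" if bs_price > actual_price else "Under"

price = black_scholes_call(S=100.0, K=100.0, t=1.0, r=0.05, sigma=0.2)
print(round(price, 4))              # about 10.4506
print(over_or_under(price, 9.80))   # Over (9.80 is a made-up market price)
```

The regression target is the option price itself, while the classification target is the label returned by a comparison like `over_or_under`.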

We tried different combinations of these tuning hyperparameters to find the best performing models:
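A search over hyperparameter combinations can be sketched with scikit-learn's `GridSearchCV`. The grid values, the synthetic data, and the choice of estimator below are placeholders for illustration, not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical grid; the project's actual hyperparameters are not reproduced here.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

# Synthetic, option-payoff-like data purely for demonstration.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.5, size=(200, 2))
y = np.maximum(X[:, 0] - X[:, 1], 0.0)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,            # 3-fold cross-validation for each combination
    scoring="r2",
)
search.fit(X, y)
print(search.best_params_)
```

Each combination in the grid is scored by cross-validation, and `best_params_` holds the combination with the highest mean score.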

Below is our method to evaluate and select the best model for each problem:

2.1 Regression Model Results

Note

In general, the non-linear models significantly outperformed the baseline Linear Regression model. The Gradient Boosting Regression model performed best, with the highest R-squared score and the least variability across testing and cross-validation, which suggests it is the most consistent and robust of the models tried.
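A comparison along these lines (test R-squared plus the mean and spread of cross-validation R-squared) can be sketched as below. The helper `compare_regressors`, the synthetic data, and the model settings are assumptions for illustration; the numbers it prints say nothing about the project's actual results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

def compare_regressors(models, X, y, cv=5, seed=0):
    """Return test R^2 and cross-validation R^2 mean/std for each model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    rows = {}
    for name, model in models.items():
        cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")
        model.fit(X_train, y_train)
        rows[name] = {
            "test_r2": model.score(X_test, y_test),
            "cv_r2_mean": cv_scores.mean(),
            "cv_r2_std": cv_scores.std(),  # lower std means more consistent folds
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Synthetic non-linear target, for demonstration only.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.5, size=(300, 2))
y = np.maximum(X[:, 0] - X[:, 1], 0.0) + rng.normal(0.0, 0.01, size=300)

results = compare_regressors(
    {
        "Linear Regression": LinearRegression(),
        "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    },
    X,
    y,
)
print(results)
```

A model with a high `cv_r2_mean`, a small `cv_r2_std`, and a `test_r2` close to the cross-validation mean is the kind of "consistent and robust" candidate described in the note.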

2.2 Classification Model Results

Note

In general, the non-linear models outperformed the baseline Logistic Regression only slightly. Logistic Regression showed fewer signs of overfitting than the other models. The CatBoost model performed best, with the highest accuracy score and the least variability across testing and cross-validation, which suggests it is somewhat more accurate and robust than the other models.
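One way to look for the overfitting mentioned in the note is to compare training accuracy with test and cross-validation accuracy. The sketch below uses scikit-learn's `GradientBoostingClassifier` as a stand-in for CatBoost so that it runs without extra dependencies; the helper name, the synthetic data, and the `overfit_gap` column are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

def compare_classifiers(models, X, y, cv=5, seed=0):
    """Return train/test accuracy, their gap, and CV accuracy mean/std per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    rows = {}
    for name, model in models.items():
        cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
        model.fit(X_train, y_train)
        train_acc = model.score(X_train, y_train)
        test_acc = model.score(X_test, y_test)
        rows[name] = {
            "train_acc": train_acc,
            "test_acc": test_acc,
            "overfit_gap": train_acc - test_acc,  # large gap suggests overfitting
            "cv_acc_mean": cv_scores.mean(),
            "cv_acc_std": cv_scores.std(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Synthetic binary labels with a non-linear boundary, for demonstration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] * X[:, 1] + 0.3 * X[:, 2] > 0).astype(int)

results = compare_classifiers(
    {
        "Logistic Regression": LogisticRegression(),
        "Gradient Boosting (stand-in for CatBoost)": GradientBoostingClassifier(random_state=0),
    },
    X,
    y,
)
print(results)
```

A small `overfit_gap` together with a low `cv_acc_std` is the pattern the note associates with a model that generalizes well.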

2.3 Baseline Models

2.4 Final Model Selection

3. Business Understandings

Several business considerations apply when predicting option values:

Accurately predicting European call option values is essential to achieving the best financial outcomes, but interpretability is also important for decision-making. Understanding the relationships between the predictor variables and the response variable can provide valuable insights to guide investment strategies, risk management, or policy decisions.
Machine learning models can outperform the Black-Scholes model in predicting option prices due to their flexibility and adaptability. While the Black-Scholes model, primarily used in European option trading, relies on a fixed set of assumptions and features, machine learning models can capture complex patterns, nonlinear relationships, varying volatility, changing interest rates, and non-continuous trading scenarios. This enables machine learning models to achieve higher accuracy and greater practicality in real-world trading environments.
Could these models be applied to predict Tesla’s option price? Tesla is unusual compared to other S&P 500 stocks. Because of its high volatility, the CEO’s sentiments, emerging industry dynamics, and growth expectations, predicting Tesla’s call option price from the patterns learned here would be very challenging and would likely yield poor performance.

tomvdo29usc/Call_Option_Pricing_Prediction
