ft_linear_regression

The project

This school project is an introduction to machine learning. The goal is to predict the price of a car based on its mileage. The project is divided into two parts: a training part and a prediction part.

What is linear regression?

Linear regression is the one of the founding principles of machine learning.

Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to "learn" (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed.

How does this project use linear regression?

This project aims at producing a model that can predict the price of a car based on its mileage.

The project is made of two parts: a training part and a prediction part.

The training part

The training part is based on the following hypothesis:

$$\begin{equation*} error = \frac{1}{2m} \sum_{i=1}^{m} (estimatePrice(mileage_i) - price_i)^2 \end{equation*}$$

Where $m$ is the number of samples in the dataset, $mileage_i$ is the mileage of the $i-th$ car in the dataset, and $price_i$ is the price of the $i-th$ car in the dataset.

This hypothesis is a cost function that measures the error of the model. The goal of the training part is to find the values of the parameters $\theta_0$ and $\theta_1$ that minimize the error of the model.

The values of the parameters $\theta_0$ and $\theta_1$ are found using the gradient descent algorithm. This algorithm is an optimization algorithm that is used to minimize the error of the model. They will be stored in a file theta.json that will be used in the prediction part.

The prediction part

The prediction part is based on the following hypothesis:

$$\begin{equation*} estimatePrice(mileage) = \theta_0 + (\theta_1 \times mileage) \end{equation*}$$

This hypothesis is based on the assumption that the price of a car is a linear function of its mileage. The goal of the prediction program is to predict the price of the car based on its mileage.

The algorithm

This algorithm is detailed in code inside the jupyter notebook inside this repo.

Usage

Dataset

The dataset used to train the model is located in the data.csv file. This file contains two columns: mileage and price.

Training

To train the model, you need to run the following command:

python train.py

This will generate a file called theta.csv that contains the values of the parameters of the model.

A word about normalization

I chose to implement Robust Scaling as a normalization method. This method is robust to outliers¹ and is based on the following formula:

$$\begin{equation*} X_{scaled} = \frac{X - X_{median}}{IQR} \end{equation*}$$

Where $X$ is the feature to scale, $X_{median}$ is the median of the feature, and $IQR$ is the interquartile range of the feature.

Prediction

To predict the price of a car based on its mileage, you need to run the following command:

python predict.py

You will be prompted to enter the mileage of the car you want to predict the price.

Installation

To install the dependencies, you need to run the following command:

pip install -r requirements.txt

References

Articles

Videos

outliers are values that are significantly different from the rest of the data and can distort the model. Robust scaling is a normalization method that is robust to outliers. My main goal in implementing this normalization method was to prevent this issue from happening. ↩

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
ft_linear_regression		ft_linear_regression
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
data.csv		data.csv
data_difflen.csv		data_difflen.csv
install.sh		install.sh
notebook.ipynb		notebook.ipynb
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ft_linear_regression

The project

What is linear regression?

How does this project use linear regression?

The training part

The prediction part

The algorithm

Usage

Dataset

Training

A word about normalization

Prediction

Installation

References

Articles

Videos

About

Uh oh!

Releases

Packages

Languages

License

semx2a/ft_linear_regression

Folders and files

Latest commit

History

Repository files navigation

ft_linear_regression

The project

What is linear regression?

How does this project use linear regression?

The training part

The prediction part

The algorithm

Usage

Dataset

Training

A word about normalization

Prediction

Installation

References

Articles

Videos

Footnotes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages