Math for Machine Learning
Linear regression attempts to model the relationship between:
- Independent variables (features, X)
- Dependent variable (target, y)
The model assumes a linear relationship.
Linear Regression Equation
The general form of a linear regression equation is:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$
Where:
- y: Target variable (the value we are trying to predict)
- x1, x2, ..., xn: Input features (independent variables)
- β0, β1, ..., βn: Coefficients (weights) that determine the influence of each input feature on the target variable.
- β0: Intercept (the value of y when all input features are zero)
- ϵ: Error term (represents the difference between the predicted value of y and the actual value)
This equation represents a linear relationship between the target variable and the input features. The goal of linear regression is to find the values of the coefficients (β0, β1, ..., βn) that best fit the data and minimize the error term.
One feature: $y = \beta_0 + \beta_1 x + \epsilon$
Multiple features: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$
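A small pure-Python sketch of how the prediction ŷ = β0 + β1x1 + ... + βnxn is computed for each sample. The coefficients and data here are hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical coefficients and data, for illustration only.
beta0 = 2.0                       # intercept (β0)
betas = [0.5, -1.5]               # weights (β1, β2)
X = [[1.0, 2.0], [3.0, 4.0]]      # two samples, two features each

def predict(row):
    """ŷ = β0 + β1*x1 + ... + βn*xn"""
    return beta0 + sum(b * x for b, x in zip(betas, row))

y_hat = [predict(row) for row in X]
print(y_hat)  # [-0.5, -2.5]
```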
- Collect and clean the dataset
- Separate the dataset into features (X) and target (y)
- Split into training and testing sets
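The preparation steps above can be sketched in plain Python. The toy dataset is made up for illustration; in practice you would typically use scikit-learn's `train_test_split` instead of splitting by hand:

```python
import random

# Toy dataset: each row is (feature1, feature2, target); values are made up.
data = [(1400, 3, 250), (1600, 3, 280), (1700, 4, 310),
        (1100, 2, 200), (2000, 4, 360), (1250, 2, 220)]

# Separate into features (X) and target (y)
X = [row[:-1] for row in data]
y = [row[-1] for row in data]

# Shuffle indices, then hold out ~20% of the rows for testing
random.seed(0)                       # fixed seed for a reproducible split
idx = list(range(len(X)))
random.shuffle(idx)
split = int(len(idx) * 0.8)
X_train = [X[i] for i in idx[:split]]
y_train = [y[i] for i in idx[:split]]
X_test  = [X[i] for i in idx[split:]]
y_test  = [y[i] for i in idx[split:]]

print(len(X_train), len(X_test))  # 4 2
```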
Linear regression uses methods like Ordinary Least Squares (OLS) to minimize the cost function:
$J(\beta) = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2$
Where:
- J(β): The cost function, which measures the average squared difference between the predicted values and the actual values.
- m: The number of samples in the training dataset.
- ŷi: The predicted value for the i-th sample.
- yi: The actual value for the i-th sample.
Minimizing the Cost Function
The goal of linear regression is to find the values of the model's parameters (β) that minimize the cost function. This is typically achieved using optimization algorithms such as gradient descent.
Key Points:
- The OLS cost function penalizes large errors more severely than small errors due to the squaring of the differences.
- Minimizing the MSE leads to the line that best fits the data in a least-squares sense.
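A minimal gradient-descent sketch for the one-feature case, minimizing J(β) = (1/m) Σ (ŷi − yi)². The data, learning rate, and iteration count are illustrative; the data follows y = 2x + 1 exactly, so the fit should recover β0 ≈ 1 and β1 ≈ 2:

```python
# Gradient descent on J(β) = (1/m) Σ (ŷ_i − y_i)² for a single feature.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # exactly y = 2x + 1

b0, b1 = 0.0, 0.0     # start from zero coefficients
lr = 0.05             # learning rate (step size)
m = len(xs)

for _ in range(5000):
    # Partial derivatives of J with respect to β0 and β1
    g0 = (2 / m) * sum((b0 + b1 * x - y) for x, y in zip(xs, ys))
    g1 = (2 / m) * sum((b0 + b1 * x - y) * x for x, y in zip(xs, ys))
    b0 -= lr * g0
    b1 -= lr * g1

print(round(b0, 3), round(b1, 3))  # ≈ 1.0 2.0
```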
Fit the model on the training data to learn the coefficients.
Evaluate using metrics like:
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R-squared ($R^{2}$)
See the code in the file scikit-learn.py. Test it with a dataset called scikit-learn.csv. The dataset contains three columns:
- feature1: Size of the house in square feet
- feature2: Number of bedrooms
- target: Price of the house in $1000
- The model will predict house prices based on size and the number of bedrooms
- You will see the metrics Mean Squared Error (MSE) and R-squared ($R^{2}$)
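A hedged sketch of what a script like scikit-learn.py might contain. To keep it self-contained and runnable, it builds a small synthetic dataset inline instead of loading scikit-learn.csv; the column meanings follow the description above, and the exact contents of the author's script may differ:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic houses (made-up values): feature1 = size, feature2 = bedrooms.
sizes =    [1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800]
bedrooms = [2,    2,    3,    3,    4,    4,    5,    5]
X = list(zip(sizes, bedrooms))
y = [0.15 * s + 20 * b + 50 for s, b in X]   # target: price in $1000

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)      # learn β0 (intercept_) and β1, β2 (coef_)

y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R² :", r2_score(y_test, y_pred))
```

Because the synthetic target is exactly linear in the features, MSE comes out near 0 and R² near 1; on real data both would reflect noise and model error.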
MSE measures the average squared difference between the predicted values (ŷ) and the actual values (y):
$MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$
Where:
- n: The number of data points.
- ŷi: The predicted value for the i-th sample.
- yi: The actual value for the i-th sample.
- Smaller MSE indicates that the predictions are close to the actual values, which means the model is performing better.
- MSE penalizes large errors more than small ones because it squares the differences.
- Units: MSE is in the square of the target variable's unit. For example, if the target is house prices in dollars, MSE will be in $(dollars)^{2}$.
- A perfect model would have MSE = 0, meaning no difference between predictions and actual values.
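Computing MSE (and RMSE, which is back in the target's own units) by hand on hypothetical predictions:

```python
# MSE = (1/n) Σ (ŷ_i − y_i)², computed by hand on made-up values.
y_true = [300, 250, 410, 280]   # actual prices in $1000
y_pred = [310, 240, 400, 290]   # predicted prices in $1000

n = len(y_true)
mse = sum((p - a) ** 2 for p, a in zip(y_pred, y_true)) / n
rmse = mse ** 0.5               # RMSE restores the target's original units

print(mse, round(rmse, 2))      # 100.0 10.0
```

Every prediction here is off by exactly 10, so each squared error is 100 and their mean is 100; RMSE = 10 in $1000, i.e. the model is off by about $10,000 on average.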
R-squared ($R^{2}$, the Coefficient of Determination) measures how much of the variability in the target variable (y) is explained by the features (X) in the model.
Formula
$R^{2} = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
Where:
- $SS_{res}$: The residual sum of squares, the squared differences between actual and predicted values.
- $SS_{tot}$: The total sum of squares, the squared differences between actual values and their mean $\bar{y}$.
What It Means:
- $R^{2}$ typically ranges from 0 to 1 (though it can be negative for very poor models).
- $R^{2}$ = 1: Perfect model; all variability in y is explained by X.
- $R^{2}$ = 0: The model explains none of the variability in y (no better than predicting the mean $\bar{y}$).
- Negative $R^{2}$: The model performs worse than a simple mean-based prediction.
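Computing R² by hand from its definition, on made-up numbers:

```python
# R² = 1 − SS_res / SS_tot, computed by hand for hypothetical predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

mean_y = sum(y_true) / len(y_true)
ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))   # residual sum of squares
ss_tot = sum((a - mean_y) ** 2 for a in y_true)              # total sum of squares

r2 = 1 - ss_res / ss_tot
print(r2)  # 0.975
```

Here SS_res = 0.5 and SS_tot = 20, so R² = 0.975: the model explains 97.5% of the variability in y.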
Interpretation
- A higher $R^{2}$ value indicates a better fit of the model to the data.
- For example, $R^{2}$ = 0.85 implies 85% of the variability in y is explained by the features.
Note:
- R-squared can be misleading in some cases, such as when dealing with small datasets or when the model is overly complex.
- It's important to consider other metrics, such as adjusted R-squared and cross-validation, to evaluate model performance.
For our scikit-learn.py code, here is how to interpret MSE and R².
Suppose:
MSE = 2500 (e.g., squared dollars)
R² = 0.92
This means:
- MSE: On average, the squared error in predicted house prices is 2500. The smaller the value, the better the model predicts house prices.
- R²: 92% of the variability in house prices is explained by the model, suggesting it is highly effective.