`docs/src/getting_started/linear_regression.md`
The losses are identical! This means that our `model` and the `flux_model` are identical!
### Training the model
Let's train our model using the classic Gradient Descent algorithm, which iteratively updates the weights and biases using the following equations -
```math
\begin{aligned}
W &= W - \eta * \frac{dL}{dW} \\
b &= b - \eta * \frac{dL}{db}
\end{aligned}
```
Here, `W` is the weight matrix, `b` is the bias vector, ``\eta`` is the learning rate, ``\frac{dL}{dW}`` is the derivative of the loss function with respect to the weight, and ``\frac{dL}{db}`` is the derivative of the loss function with respect to the bias.
The derivatives are usually calculated using an Automatic Differentiation tool; `Flux` relies on `Zygote.jl` for this. Since `Zygote.jl` is an independent Julia package, it can be used outside of `Flux` as well! Refer to the documentation of `Zygote.jl` for more information.
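
For instance, here is a tiny standalone sketch of `Zygote`'s `gradient` at work on an ordinary Julia function (the function below is just an arbitrary example) -

```julia
using Zygote

f(x) = 3x^2 + 2x + 1

# the derivative is 6x + 2, so the gradient at x = 5 is 32;
# `gradient` returns a tuple with one entry per argument of `f`
gradient(f, 5)
```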
Our first step would be to obtain the gradient of the loss function with respect to the weights and the biases. `Flux` re-exports `Zygote`'s `gradient` function; hence, we don't need to import `Zygote` explicitly to use the functionality.
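
Roughly, that looks like this (a sketch assuming the `loss(W, b, x, y)` function and the data `x`, `y` defined earlier in the tutorial; the bias update right below follows the same pattern) -

```julia
# `gradient` returns one derivative per argument of `loss`:
# here we only need the first two, for W and b
dLdW, dLdb, _, _ = gradient(loss, W, b, x, y)

# one gradient-descent step on the weight, with a learning rate of 0.1
W .= W .- 0.1 .* dLdW
```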
```julia
julia> b .= b .- 0.1 .* dLdb
0.41325632
```
The parameters have been updated! We can now check the value of the loss function -
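
That check is just another call to the loss function (the exact value depends on the randomly generated data, so this is only a sketch) -

```julia
loss(W, b, x, y)   # should now be lower than the initial loss of ~22.74856
```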
The loss went down! This means that we successfully trained our model for one epoch. We can plug the training code written above into a loop and train the model for a higher number of epochs. It can be customized either to have a fixed number of epochs or to stop when certain conditions are met, for example, `change in loss < 0.1`. The loop can be tailored to suit the user's needs, and the conditions can be specified in plain `Julia`!
Let's plug our super training logic inside a function and test it again -
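
A minimal sketch of such a function, reusing the `loss`, `W`, `b`, `x`, `y`, and the `0.1` learning rate from above (the original listing may differ slightly) -

```julia
function train_model()
    # compute the gradients and take one gradient-descent step on W and b
    dLdW, dLdb, _, _ = gradient(loss, W, b, x, y)
    W .= W .- 0.1 .* dLdW
    b .= b .- 0.1 .* dLdb
end

train_model()      # one more epoch
loss(W, b, x, y)   # the loss should drop a little further
```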
It works, and the loss went down again! This was the second epoch of our training procedure. Let's plug this in a for loop and train the model for 30 epochs.
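
Reusing the `train_model` function sketched above, the loop is as simple as -

```julia
for i = 1:30
    train_model()
end

loss(W, b, x, y)   # the loss should now be much smaller
```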
There was a significant reduction in loss, and the parameters were updated!
`Flux` provides yet another convenience functionality, the [`Flux.@epochs`](@ref) macro, which can be used to train a model for a specific number of epochs.
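
For instance, ten more epochs of our `train_model` function could be written as (the epoch count here is just an illustrative choice) -

```julia
Flux.@epochs 10 train_model()
```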
We can train the model even more or tweak the hyperparameters to achieve the desired result faster, but let's stop here. We trained our model for 42 epochs, and loss went down from `22.74856` to `7.6680417f0`. Time for some visualization!
### Results
The main objective of this tutorial was to fit a line to our dataset using the linear regression algorithm. The training procedure went well, and the loss went down significantly! Let's see what the fitted line looks like. Remember, `Wx + b` is nothing more than a line's equation, with `slope = W[1]` and `y-intercept = b[1]` (indexing at `1` as `W` and `b` are iterable).
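
We can see this for ourselves with a quick plot (a sketch, assuming the `Plots.jl` package is available and `x` and `y` still hold our data) -

```julia
using Plots

# scatter the data points and overlay the fitted line Wx + b
xs = sort(vec(x))
scatter(vec(x), vec(y), label="data")
plot!(xs, W[1] .* xs .+ b[1], label="fitted line", linewidth=3)
```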
The line fits well! There is room for improvement, but we leave that up to you! You can play with the optimisers, the number of epochs, learning rate, etc. to improve the fitting and reduce the loss!
## Linear regression model on a real dataset
We now move on to a relatively complex linear regression model. Here we will use a real dataset from [`MLDatasets.jl`](https://github.com/JuliaML/MLDatasets.jl), which will not confine our data points to have only one feature. Let's start by importing the required packages -
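
Something along these lines should do (the exact package list is an assumption: `MLDatasets` provides the dataset, `Statistics` helps with normalisation later on, and `DataFrames` is handy for inspecting the data) -

```julia
using Flux, Statistics, MLDatasets, DataFrames
```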
The training procedure would make use of the same mathematics, but now we can pass in the model inside the `gradient` call and let `Flux` and `Zygote` handle the derivatives!
```jldoctest linear_regression_complex
julia> function train_model()
           # one gradient-descent step over `model`'s parameters;
           # the data names (`x_train_n`, `y_train`) and the learning rate are assumptions
           dLdm, _, _ = gradient(loss, model, x_train_n, y_train)
           @. model.weight = model.weight - 0.000001 * dLdm.weight
           @. model.bias = model.bias - 0.000001 * dLdm.bias
       end;
```
Contrary to our last training procedure, let's say that this time we don't want to hardcode the number of epochs. We want the training procedure to stop when the loss converges, that is, when `change in loss < δ`. The quantity `δ` can be altered according to a user's need, but let's fix it to `10⁻³` for this tutorial.
We can write such custom training loops effortlessly using `Flux` and plain `Julia`!
```jldoctest linear_regression_complex
julia> loss_init = Inf;
```
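
The convergence loop itself could then look something like this (a sketch; `x_train_n` and `y_train` are the assumed names of the normalised training data and targets used with this model) -

```julia
while true
    train_model()
    current_loss = loss(model, x_train_n, y_train)
    # stop once the improvement over the previous epoch is below δ = 10⁻³
    if abs(loss_init - current_loss) < 1e-3
        break
    end
    global loss_init = current_loss
end
```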
Evaluated on the test data, the loss is not as small as the loss of the training data, but it looks good!

---
Summarising this tutorial, we started by generating a random yet correlated dataset for our custom model. We then saw how a simple linear regression model could be built with and without `Flux`, and how they were almost identical.
Next, we trained the model by manually writing down the Gradient Descent algorithm and optimising the loss. We also saw how `Flux` provides various wrapper functionalities and keeps the API extremely intuitive and simple for its users.
After getting familiar with the basics of `Flux` and `Julia`, we moved ahead to build a machine learning model for a real dataset. We repeated the exact same steps, but this time with a lot more features and data points, and by harnessing `Flux`'s full capabilities. In the end, we developed a training loop that was smarter than the hardcoded one and ran the model on our normalised dataset to conclude the tutorial.