
Commit 2f245c7

Update the text to manually run gradient descent
1 parent c668705 commit 2f245c7


docs/src/getting_started/linear_regression.md

Lines changed: 30 additions & 15 deletions
@@ -169,12 +169,29 @@ The losses are identical! This means that our `model` and the `flux_model` are i

 ### Training the model

-Before we begin the training procedure with `Flux`, let's initialize an optimiser and finalize our data. We will be using the classic [`Gradient Descent`](@ref Descent) algorithm. `Flux` comes loaded with a lot of different optimisers; refer to [Optimisers](@ref) for more information on the same.
+Let's train our model using the classic Gradient Descent algorithm, which iteratively updates the weights and the bias using the following equations -
+
+```math
+\begin{aligned}
+W &= W - \eta * \frac{dL}{dW} \\
+b &= b - \eta * \frac{dL}{db}
+\end{aligned}
+```
+
+Here, `W` is the weight matrix, `b` is the bias vector, ``\eta`` is the learning rate, ``\frac{dL}{dW}`` is the derivative of the loss function with respect to the weights, and ``\frac{dL}{db}`` is the derivative of the loss function with respect to the bias.
+
+The derivatives are usually calculated using an Automatic Differentiation tool; `Flux` uses `Zygote.jl` for this purpose. Since `Zygote.jl` is an independent Julia package, it can also be used outside of Flux! Refer to the `Zygote.jl` documentation for more details.
+
+Our first step is to obtain the gradient of the loss function with respect to the weights and the bias. `Flux` re-exports `Zygote`'s `gradient` function, so we don't need to import `Zygote` explicitly to use it.

 ```jldoctest linear_regression_simple; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> dLdW, dLdb, _, _ = gradient(loss, W, b, x, y)
 (Float32[-6.7322206], Float32[-4.132563], Float32[0.1926041 0.14162663 … -0.39782608 -0.29997927], Float32[-0.16876957 -0.12410051 … 0.3485956 0.2628572])
+```

+We can now update the parameters, following the gradient descent algorithm -
+
+```jldoctest linear_regression_simple; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> W .= W .- 0.1 .* dLdW
 1-element Vector{Float32}:
  1.8144473
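Since the added text notes that `gradient` is re-exported from `Zygote.jl` and works outside of `Flux`, the update rule above can also be sketched in isolation. Everything in the snippet below is made up for illustration (a scalar loss, a single data point, a starting guess, and a learning rate of `0.1`); it only mirrors the `W ← W − η⋅dL/dW` step, not the tutorial's actual code.

```julia
using Zygote                     # `Flux` re-exports `gradient` from this package

# A toy scalar "model": prediction = w*x + b, with a squared-error loss.
loss(w, b, x, y) = (w * x + b - y)^2

w, b = 0.0f0, 0.0f0              # made-up starting parameters
x, y = 2.0f0, 3.0f0              # a single made-up data point
η = 0.1f0                        # learning rate

# Derivatives of the loss with respect to every argument; we keep the first two.
dLdw, dLdb, _, _ = gradient(loss, w, b, x, y)

# One gradient-descent step, following the equations added in the hunk above.
w = w - η * dLdw
b = b - η * dLdb
```

The tutorial does the same thing with 1-element arrays and in-place broadcasts (`W .= W .- 0.1 .* dLdW`), as the context lines above show.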
@@ -184,18 +201,16 @@ julia> b .= b .- 0.1 .* dLdb
  0.41325632
 ```

-Now, we can move to the actual training! The training consists of obtaining the gradient and updating the current parameters with the obtained derivatives using backpropagation. This is achieved using `Flux.gradient` (see see [Taking Gradients](@ref)) and [`Flux.Optimise.update!`](@ref) functions respectively.
-
-We can now check the values of our parameters and the value of the loss function -
+The parameters have been updated! We can now check the value of the loss function -

 ```jldoctest linear_regression_simple; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> loss(W, b, x, y)
 17.157953f0
 ```

-The parameters changed, and the loss went down! This means that we successfully trained our model for one epoch. We can plug the training code written above into a loop and train the model for a higher number of epochs. It can be customized either to have a fixed number of epochs or to stop when certain conditions are met, for example, `change in loss < 0.1`. This loop can be customized to suit a user's needs, and the conditions can be specified in plain `Julia`!
+The loss went down! This means that we successfully trained our model for one epoch. We can plug the training code written above into a loop and train the model for a higher number of epochs. The loop can either run for a fixed number of epochs or stop when certain conditions are met, for example, `change in loss < 0.1`, and such conditions can be written in plain `Julia`!

-`Flux` also provides a convenience function to train a model. The [`Flux.train!`](@ref) function performs the same task described above and does not require calculating the gradient manually.
+Let's wrap our training logic in a function and test it again -

 ```jldoctest linear_regression_simple; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> function train_model()
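The added paragraph above mentions stopping the loop once `change in loss < 0.1` rather than fixing the number of epochs. One way such a loop could look is sketched below; it assumes the tutorial's `loss`, `W`, `b`, `x`, and `y` are already defined, and the function name, step size, and threshold are made up for illustration.

```julia
# A possible convergence-based loop: keep taking gradient-descent steps until
# the improvement in the loss falls below `tol`.
function train_until_converged!(W, b, x, y; η = 0.1, tol = 0.1)
    prev = loss(W, b, x, y)
    while true
        dLdW, dLdb, _, _ = gradient(loss, W, b, x, y)
        W .= W .- η .* dLdW              # in-place updates, as in the tutorial
        b .= b .- η .* dLdb
        current = loss(W, b, x, y)
        prev - current < tol && break    # stop once the improvement is small
        prev = current
    end
    return W, b, loss(W, b, x, y)
end

train_until_converged!(W, b, x, y)
```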
@@ -210,7 +225,7 @@ julia> W, b, loss(W, b, x, y)
 (Float32[2.340657], Float32[0.7516814], 13.64972f0)
 ```

-The parameters changed again, and the loss went down again! This was the second epoch of our training procedure. Let's plug this in a for loop and train the model for 60 epochs.
+It works, and the loss went down again! This was the second epoch of our training procedure. Let's plug this into a for loop and train the model for 30 epochs.

 ```jldoctest linear_regression_simple; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> for i = 1:30
@@ -221,7 +236,7 @@ julia> W, b, loss(W, b, x, y)
 (Float32[4.2408285], Float32[2.243728], 7.668049f0)
 ```

-The loss went down significantly!
+There was a significant reduction in loss, and the parameters were updated!

 `Flux` provides yet another convenience functionality, the [`Flux.@epochs`](@ref) macro, which can be used to train a model for a specific number of epochs.

@@ -242,7 +257,7 @@ julia> W, b, loss(W, b, x, y)
 (Float32[4.2422233], Float32[2.2460847], 7.6680417f0)
 ```

-We can train the model even more or tweak the hyperparameters to achieve the desired result faster, but let's stop here. We trained our model for 72 epochs, and loss went down from `22.74856` to `7.6680417f`. Time for some visualization!
+We can train the model even more or tweak the hyperparameters to achieve the desired result faster, but let's stop here. We trained our model for 42 epochs, and the loss went down from `22.74856` to `7.6680417f0`. Time for some visualization!

 ### Results
 The main objective of this tutorial was to fit a line to our dataset using the linear regression algorithm. The training procedure went well, and the loss went down significantly! Let's see what the fitted line looks like. Remember, `Wx + b` is nothing more than a line's equation, with `slope = W[1]` and `y-intercept = b[1]` (indexing at `1` as `W` and `b` are iterable).
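As a small illustration of the last context line above: once training is done, the slope and intercept can be read straight off the parameters. The snippet assumes the tutorial's trained `W` and `b`; the sample input is made up.

```julia
slope       = W[1]                      # `W` and `b` are 1-element arrays, hence the indexing
y_intercept = b[1]

predict(x) = slope * x + y_intercept    # the fitted line, Wx + b

predict(0.5f0)                          # prediction for a made-up input
```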
@@ -260,7 +275,7 @@ julia> plot!((x) -> b[1] + W[1] * x, -3, 3, label="Custom model", lw=2);
 The line fits well! There is room for improvement, but we leave that up to you! You can play with the optimisers, the number of epochs, learning rate, etc. to improve the fitting and reduce the loss!

 ## Linear regression model on a real dataset
-We now move on to a relative;y complex linear regression model. Here we will use a real dataset from [`MLDatasets.jl`](https://github.com/JuliaML/MLDatasets.jl), which will not confine our data points to have only one feature. Let's start by importing the required packages -
+We now move on to a relatively complex linear regression model. Here we will use a real dataset from [`MLDatasets.jl`](https://github.com/JuliaML/MLDatasets.jl), which will not confine our data points to have only one feature. Let's start by importing the required packages -

 ```jldoctest linear_regression_complex
 julia> using Flux
@@ -336,7 +351,7 @@ julia> loss(model, x_train_n, y_train)
 We can now proceed to the training phase!

 ### Training
-Before training the model, let's initialize the optimiser and let `Flux` know that we want all the derivatives of all the parameters of our `model`.
+The training procedure uses the same mathematics, but now we pass the model itself into the `gradient` call and let `Flux` and `Zygote` handle the derivatives!

 ```jldoctest linear_regression_complex
 julia> function train_model()
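To illustrate the added sentence about passing the model itself into `gradient`, here is a small self-contained sketch. The layer sizes, the fake data, and the loss definition are invented for the example and are not the tutorial's `model`, `x_train_n`, `y_train`, or `loss`.

```julia
using Flux                              # re-exports Zygote's `gradient`

m  = Dense(13 => 1)                     # stand-in model with made-up sizes
xs = rand(Float32, 13, 8)               # 8 fake samples with 13 features each
ys = rand(Float32, 1, 8)

# An illustrative mean-squared-error loss that takes the model as its first argument.
loss_fn(m, x, y) = sum(abs2, m(x) .- y) / length(y)

# Differentiating with respect to the model returns a NamedTuple whose fields
# (`weight`, `bias`, …) mirror the layer's parameters.
dLdm, _, _ = gradient(loss_fn, m, xs, ys)

# A manual gradient-descent step on every parameter of the layer.
m.weight .-= 0.1f0 .* dLdm.weight
m.bias   .-= 0.1f0 .* dLdm.bias
```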
@@ -348,7 +363,7 @@ julia> function train_model()

 Contrary to our last training procedure, let's say that this time we don't want to hardcode the number of epochs. We want the training procedure to stop when the loss converges, that is, when `change in loss < δ`. The quantity `δ` can be altered according to a user's need, but let's fix it to `10⁻³` for this tutorial.

-We can write such custom training loops effortlessly using Flux and plain Julia!
+We can write such custom training loops effortlessly using `Flux` and plain `Julia`!
 ```jldoctest linear_regression_complex
 julia> loss_init = Inf;

@@ -393,11 +408,11 @@ The loss is not as small as the loss of the training data, but it looks good! Th

 ---

-Summarising this tutorial, we started by generating a random yet correlated dataset for our custom model. We then saw how a simple linear regression model could be built with and without Flux, and how they were almost identical.
+Summarising this tutorial, we started by generating a random yet correlated dataset for our custom model. We then saw how a simple linear regression model could be built with and without `Flux`, and how they were almost identical.

-Next, we trained the model by manually calling the gradient function and optimising the loss. We also saw how Flux provided various wrapper functionalities like the train! function to make the API simpler for users.
+Next, we trained the model by manually writing out the Gradient Descent algorithm and optimising the loss. We also saw how `Flux` provides various wrapper functionalities that keep the API intuitive and simple for its users.

-After getting familiar with the basics of Flux and Julia, we moved ahead to build a machine learning model for a real dataset. We repeated the exact same steps, but this time with a lot more features and data points, and by harnessing Flux's full capabilities. In the end, we developed a training loop that was smarter than the hardcoded one and ran the model on our normalised dataset to conclude the tutorial.
+After getting familiar with the basics of `Flux` and `Julia`, we moved on to build a machine learning model for a real dataset. We repeated the same steps, but this time with many more features and data points, and by harnessing `Flux`'s full capabilities. In the end, we developed a training loop that was smarter than the hardcoded one and ran the model on our normalised dataset to conclude the tutorial.

 ## Copy-pastable code
 ### Dummy dataset
