docs/src/getting_started/linear_regression.md
15 additions & 14 deletions
@@ -270,20 +270,22 @@ julia> using MLDatasets: BostonHousing
### Data
Let's start by initializing our dataset. We will be using the [`BostonHousing`](https://juliaml.github.io/MLDatasets.jl/stable/datasets/misc/#MLDatasets.BostonHousing) dataset consisting of `506` data points. Each of these data points has `13` features and a corresponding label, the house's price. The `x`s are still mapped to a single `y`, but now, a single `x` data point has `13` features.

-```julia linear_regression_complex
+```jldoctest linear_regression_complex
+julia> using DataFrames
+
julia> dataset = BostonHousing()
dataset BostonHousing:
  metadata   =>    Dict{String, Any} with 5 entries
  features   =>    506×13 DataFrame
  targets    =>    506×1 DataFrame
  dataframe  =>    506×14 DataFrame

-julia> x, y =BostonHousing(as_df=false)[:]
+julia> x, y = BostonHousing(as_df=false)[:];
```

We can now split the obtained data into training and testing data -
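The split itself happens in a part of the file this hunk does not touch. As a rough sketch, one simple way to do it, assuming a 400/106 split point and the names `x_train`, `x_test`, `y_train`, `y_test` purely for illustration:

```julia
# Sketch: hold out the last 106 of the 506 samples for testing.
# x is a 13×506 matrix and y is 1×506, so we slice along the sample dimension.
x_train, x_test = x[:, 1:400], x[:, 401:end]
y_train, y_test = y[:, 1:400], y[:, 401:end]

size(x_train), size(x_test)   # ((13, 400), (13, 106))
```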
This data contains a diverse number of features, which means that the features have different scales. A wise option here would be to `normalise` the data, making the training process more efficient and fast. Let's check the standard deviation of the training data before normalising it.
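As a sketch of that step, assuming `Flux.normalise` and the `Statistics` standard library are what get used here (the exact call in the tutorial may differ):

```julia
# Sketch: inspect the spread of the raw features, then standardise them.
using Statistics

std(x_train)                          # large: the features live on very different scales

x_train_n = Flux.normalise(x_train)   # rescale to roughly zero mean and unit std

std(x_train_n)                        # now close to 1
```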
The standard deviation is now close to one! The last step for this section would be to wrap the `x`s and `y`s together to create the training data.
-```julia linear_regression_complex
+```jldoctest linear_regression_complex
julia> train_data = [(x_train_n, y_train)];
```

@@ -317,14 +319,14 @@ Our data is ready!
### Model
We can now directly use `Flux` and let it do all the work internally! Let's define a model that takes in 13 inputs (13 features) and gives us a single output (the label). We will then pass our entire data through this model in one go, and `Flux` will handle everything for us! Remember, we could have declared a model in plain `Julia` as well; a sketch follows the block below. The model will have 14 parameters: 13 weights and one bias.

-```julia linear_regression_complex
+```jldoctest linear_regression_complex
julia> model = Dense(13 => 1)
Dense(13 => 1)      # 14 parameters
```

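A minimal sketch of the plain-`Julia` version mentioned above; the names `W`, `b`, and `custom_model` are illustrative and do not appear in this diff:

```julia
# Thirteen weights and one bias, written out by hand instead of using Dense.
W = rand(Float32, 1, 13)
b = [0.0f0]

custom_model(W, b, x) = W * x .+ b   # the same affine map a Dense layer computes
```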
Same as before, our next step would be to define a loss function to quantify our accuracy somehow. The lower the loss, the better the model!
Contrary to our last training procedure, let's say that this time we don't want to hardcode the number of epochs. We want the training procedure to stop when the loss converges, that is, when `change in loss < δ`. The quantity `δ` can be altered according to a user's need, but let's fix it to `10⁻³` for this tutorial.
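The loop below assumes that `loss`, `params`, and `opt` were defined earlier in the file. A mean-squared-error loss is a safe guess for a regression tutorial, but the optimiser and step size here are assumptions:

```julia
# Mean-squared-error loss on the normalised training data.
function loss(x, y)
    ŷ = model(x)
    Flux.mse(ŷ, y)
end

params = Flux.params(model)   # implicit-style parameters, matching this train! signature
opt = Descent(0.05)           # plain gradient descent; the 0.05 step size is an assumption
```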
We can write such custom training loops effortlessly using Flux and plain Julia!

-```julia linear_regression_complex
+```jldoctest linear_regression_complex
julia> loss_init = Inf;

julia> while true
-           Flux.train!(loss, params, data, opt)
+           Flux.train!(loss, params, train_data, opt)
           if loss_init == Inf
               loss_init = loss(x_train_n, y_train)
               continue
           end
-
           if abs(loss_init - loss(x_train_n, y_train)) < 1e-3
               break
           else
@@ -372,7 +373,7 @@ This custom loop works! This shows how easily a user can write down any custom t
@@ -382,7 +383,7 @@ The loss went down significantly! It can be minimized further by choosing an eve
### Testing
The last step of this tutorial would be to test our model using the testing data. We will first normalise the testing data and then calculate the corresponding loss.
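The code for this step sits below the portion shown in this diff. A sketch of what it involves, assuming the test features get the same `Flux.normalise` treatment and reusing the hypothetical `x_test`/`y_test` names from the splitting sketch above:

```julia
# Normalise the held-out features the same way as the training features,
# then evaluate the loss on data the model has never seen.
x_test_n = Flux.normalise(x_test)

test_loss = loss(x_test_n, y_test)
```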