Commit f3f6933

Fix doctests

1 parent 1ef9a96 commit f3f6933

File tree

3 files changed: +263 -1 lines changed
docs/src/getting_started/linear_regression.md

Lines changed: 1 addition & 1 deletion
@@ -191,7 +191,7 @@ julia> dLdW, dLdb, _, _ = gradient(loss, W, b, x, y)
 We can now update the parameters, following the gradient descent algorithm -
 
-```jldoctest linear_regression; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest linear_regression_simple; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> W .= W .- 0.1 .* dLdW
 1-element Vector{Float32}:
  1.8144473
docs/src/getting_started/logistic_regression.md

Lines changed: 262 additions & 0 deletions

@@ -0,0 +1,262 @@

# Logistic Regression

The following page contains a step-by-step walkthrough of the logistic regression algorithm in `Julia` using `Flux`! We will create a simple logistic regression model without any usage of `Flux`, and then compare the different working parts with `Flux`'s implementation.

Let's start by importing the required `Julia` packages!

```julia logistic_regression
julia> using Flux

julia> using Statistics

julia> using MLDatasets

julia> using DataFrames
```

## Dataset

Let's start by importing a dataset from `MLDatasets.jl`! We will use the `Iris` dataset, which contains the data of three different `Iris` species. The data consists of 150 data points (`x`s), each having 4 features. Each of these `x`s is mapped to a `y`, the name of a particular `Iris` species (a class or a label).

```julia logistic_regression
julia> Iris()
dataset Iris:
  metadata   =>    Dict{String, Any} with 4 entries
  features   =>    150×4 DataFrame
  targets    =>    150×1 DataFrame
  dataframe  =>    150×5 DataFrame

julia> x, y = Iris(as_df=false)[:]
(features = [5.1 4.9 … 6.2 5.9; 3.5 3.0 … 3.4 3.0; 1.4 1.4 … 5.4 5.1; 0.2 0.2 … 2.3 1.8], targets = InlineStrings.String15["Iris-setosa" "Iris-setosa" … "Iris-virginica" "Iris-virginica"])
```
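
As a quick sanity check on the layout (a hedged aside, not part of the original walkthrough): the features come as a 4×150 matrix with one column per data point, and the targets as a 1×150 row of labels.

```julia
julia> size(x), size(y)
((4, 150), (1, 150))
```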

Our next step would be to convert this data into a form that can be fed to a machine learning model. The `x` values are already arranged in a matrix and thus don't need any alteration, but the labels must be one-hot encoded. [Here](https://discourse.julialang.org/t/all-the-ways-to-do-one-hot-encoding/64807) is a great discourse thread on different techniques that can be used to one-hot encode data, with or without using any external `Julia` package.

```julia logistic_regression
julia> y_r = reshape(y, (150, 1));

julia> custom_y_onehot = unique(y_r) .== permutedims(y_r)
3×150 BitMatrix:
 1  1  1  1  1  1  1  1  1  1  1  1  1  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     1  1  1  1  1  1  1  1  1  1  1  1
```
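
Note that the row order of `custom_y_onehot` comes from `unique(y_r)`, i.e. the order in which the labels first appear in the dataset. A small hedged check (not part of the original walkthrough):

```julia
julia> unique(y_r)
3-element Vector{InlineStrings.String15}:
 "Iris-setosa"
 "Iris-versicolor"
 "Iris-virginica"
```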

This same operation can also be performed using `Flux`'s `onehotbatch` function! We will use both of these outputs in parallel to show how intuitive `Flux` is!

```julia logistic_regression
julia> flux_y_onehot = Flux.onehotbatch(y_r, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"])
3×150×1 OneHotArray(::Matrix{UInt32}) with eltype Bool:
[:, :, 1] =
 1  1  1  1  1  1  1  1  1  1  1  1  1  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     1  1  1  1  1  1  1  1  1  1  1  1
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
```

Our data is ready! The next step would be to build a classifier for it.

## Building a model

A logistic regression model is mathematically defined as -

```math
model(x) = σ(Wx + b)
```

where `W` is the weight matrix, `b` is the bias vector, and `σ` is any activation function. For our case, let's use the `softmax` activation function, as we will be performing a multiclass classification task. We can define our model in `Julia` using the exact same notation!

```julia logistic_regression
julia> m(x) = W*x .+ b
m (generic function with 1 method)
```

Note that this model lacks an activation function right now, but we will come back to that.

We can now move ahead to initialize the parameters of our model. To keep it simple, let's use `Julia`'s `rand` function to initialize the weights, and let's initialize the biases as `0`. Given that our model has 4 inputs (4 features in every data point) and 3 outputs (3 different classes), the parameters can be initialized in the following way -

```julia logistic_regression; filter = r"[+-]?([0-9]*[.])?[0-9]+"
julia> W = rand(Float32, 3, 4)
3×4 Matrix{Float32}:
 0.660353  0.474309  0.170792  0.239653
 0.790627  0.15147   0.707435  0.923513
 0.3684    0.20105   0.399129  0.17404

julia> b = [0.0f0, 0.0f0, 0.0f0]
3-element Vector{Float32}:
 0.0
 0.0
 0.0
```

Now our model is capable of taking in the complete data and predicting the class of each `x` in one go! But we need to make sure that our model outputs the probability of an input belonging to a particular class. As our model has 3 outputs, each of them denotes the probability of the input belonging to that particular class.

To map our outputs to a probability value, we will use an activation function. It would make sense to use a `softmax` activation function here, which is mathematically described as -

```math
σ(\vec{x})_i = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}
```

The `softmax` function scales down the outputs to probability values such that the sum of all the final outputs is equal to `1`. Let's implement this in `Julia`!

```julia logistic_regression
julia> custom_softmax(x) = exp.(x) ./ sum(exp.(x), dims=1)
custom_softmax (generic function with 1 method)
```

The implementation looks straightforward enough! Note that we specify `dims=1` in the `sum` function to calculate the sum of probabilities across columns. Remember, we will have a `3×150` matrix (predicted `y`s) as the output of our model, where each column maps to a column in the `x` matrix. The sum of probabilities must therefore be taken down each column (that is, over the three output values for each data point); hence `dims=1`. A small illustration follows below.
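
To make the `dims=1` choice concrete, here is a minimal illustration on a small made-up matrix (the values are arbitrary and only for demonstration):

```julia
julia> A = [1.0 4.0; 2.0 5.0; 3.0 6.0];  # a 3×2 stand-in for our 3×150 output

julia> sum(A, dims=1)  # one sum per column - this is what custom_softmax divides by
1×2 Matrix{Float64}:
 6.0  15.0

julia> sum(A, dims=2)  # for contrast, dims=2 would sum each row instead
3×1 Matrix{Float64}:
 5.0
 7.0
 9.0
```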

Let's now combine this `custom_softmax` function with our model to construct the complete `custom_model`.

```julia logistic_regression
julia> custom_model(x) = m(x) |> custom_softmax
custom_model (generic function with 1 method)
```

Let's check if our model works.

```julia logistic_regression
julia> custom_model(x) |> size
(3, 150)
```

It works! Let's check if the `softmax` function is working as it should.

```julia logistic_regression
julia> all(custom_model(x) .< 1.0f0 .&& custom_model(x) .> 0.0f0)
true

julia> sum(custom_model(x), dims=1)
1×150 Matrix{Float64}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
```

Every output value is between `0` and `1`, and every column adds up to `1`!
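
As an extra hedged check (not in the original walkthrough), our hand-written `custom_softmax` should agree with `Flux`'s built-in `softmax`, which computes the same expression in a numerically safer way:

```julia
julia> custom_softmax(m(x)) ≈ Flux.softmax(m(x))
true
```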

`Flux` provides users with a very simple API which almost feels like writing your own code! Let's convert our `custom_model` to a `Flux` model.

```julia logistic_regression
julia> flux_model = Chain(Dense(4 => 3), softmax)
Chain(
  Dense(4 => 3),                        # 15 parameters
  NNlib.softmax,
)
```

A [`Dense(4 => 3)`](@ref Dense) layer denotes a layer with four inputs (four features in every data point) and three outputs (three classes or labels). This layer is exactly the same as the mathematical model we defined above! Under the hood, `Flux` too calculates the output using the same expression! But we don't have to initialize the parameters ourselves this time; `Flux` does it for us.

```julia logistic_regression; filter = r"[+-]?([0-9]*[.])?[0-9]+"
julia> flux_model[1].weight |> size, flux_model[1].bias |> size
((3, 4), (3,))
```
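
To convince ourselves that the `Dense` layer really computes the same expression as our hand-written model, here is a small hedged check (not part of the original walkthrough); `flux_model[1]` is the `Dense` layer inside the `Chain`:

```julia
julia> W_flux, b_flux = flux_model[1].weight, flux_model[1].bias;

julia> flux_model[1](x) ≈ W_flux * x .+ b_flux  # Dense with the default identity activation
true
```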

Now we can check if our model is acting right. We can pass the complete data in one go, with each data point having four features (four inputs) -

```julia logistic_regression
julia> flux_model(x) |> size
(3, 150)
```

## Loss and accuracy

Our next step is to define a loss function for our model. Let's write our own version of `Flux`'s `logitcrossentropy`: it applies `logsoftmax` to the raw outputs and computes the cross-entropy against the one-hot targets.

```julia
julia> custom_logitcrossentropy(ŷ, y) = mean(.-sum(y .* logsoftmax(ŷ; dims = 1); dims = 1))
custom_logitcrossentropy (generic function with 1 method)

julia> function custom_loss(x, y)
           ŷ = custom_model(x)
           custom_logitcrossentropy(ŷ, y)
       end
custom_loss (generic function with 1 method)

julia> custom_loss(x, custom_y_onehot)
1.0606989738802028
```

The same loss can be computed for the `Flux` model using the built-in `Flux.logitcrossentropy` -

```julia
julia> function loss(x, y)
           ŷ = flux_model(x)
           Flux.logitcrossentropy(ŷ, y)
       end
loss (generic function with 1 method)

julia> loss(x, flux_y_onehot)
1.092375933564367
```
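
The two loss values differ only because the two models start from different random parameters; the loss functions themselves agree. A hedged sketch of that claim, using made-up scores and targets:

```julia
julia> ŷ_test = rand(Float32, 3, 10);  # arbitrary raw scores, 3 classes × 10 samples

julia> y_test = Flux.onehotbatch(rand(1:3, 10), 1:3);  # arbitrary one-hot targets

julia> custom_logitcrossentropy(ŷ_test, y_test) ≈ Flux.logitcrossentropy(ŷ_test, y_test)
true
```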

To measure accuracy, we need to recover the class labels from the one-hot representation. Let's first see how this can be done manually with `findmax`, which returns the maximum value along a dimension together with its `CartesianIndex` -

```julia
julia> findmax(flux_y_onehot, dims=1)
([1 1 … 1 1;;;], [CartesianIndex(1, 1, 1) CartesianIndex(1, 2, 1) … CartesianIndex(2, 149, 1) CartesianIndex(2, 150, 1);;;])

julia> mxidx = findmax(flux_y_onehot, dims=1)[2]
1×150×1 Array{CartesianIndex{3}, 3}:
[:, :, 1] =
 CartesianIndex(1, 1, 1)  CartesianIndex(1, 2, 1)  CartesianIndex(1, 3, 1)  CartesianIndex(1, 4, 1)  …  CartesianIndex(2, 148, 1)  CartesianIndex(2, 149, 1)  CartesianIndex(2, 150, 1)

julia> mxidx[1]
CartesianIndex(1, 1, 1)

julia> mxidx[1].I
(1, 1, 1)

julia> mxidx[1].I[1]
1

julia> y_cold = Vector{String}(undef, 150);

julia> for i = 1:150
           if mxidx[i].I[1] == 1
               y_cold[i] = "Iris-setosa"
           elseif mxidx[i].I[1] == 2
               y_cold[i] = "Iris-virginica"
           elseif mxidx[i].I[1] == 3
               y_cold[i] = "Iris-versicolor"
           end
       end

julia> istrue = Flux.onecold(flux_y_onehot, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]) .== y_cold;

julia> all(istrue)
true
```

With the decoding settled, we can define an accuracy function for our custom model: decode the predictions and the targets with `Flux.onecold` and compute the fraction that match -

```julia
julia> custom_accuracy(x, y) = mean(Flux.onecold(custom_model(x), ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]) .== Flux.onecold(custom_y_onehot, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]))
custom_accuracy (generic function with 1 method)

julia> custom_accuracy(x, y)
0.3333333333333333
```

The same accuracy function for the `Flux` model -

```julia
julia> accuracy(x, y) = mean(Flux.onecold(flux_model(x), ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]) .== Flux.onecold(flux_y_onehot, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]))
accuracy (generic function with 1 method)

julia> accuracy(x, y)
0.3333333333333333
```

Both untrained models classify only about a third of the data points correctly, which is no better than chance on three balanced classes - time to train them!

## Training the model

Let's train our custom model first, using plain gradient descent: a `Descent` optimiser with a learning rate of `0.1`, the parameters `W` and `b` collected in a `Params` object, and `Flux.train!` run for 100 iterations -

```julia
julia> opt = Descent(0.1)
Descent(0.1)

julia> params = Flux.params(W, b)
Params([Float32[0.6603528 0.47430867 0.17079216 0.23965251; 0.7906274 0.15146977 0.7074347 0.92351294; 0.3684004 0.20104975 0.39912927 0.17404026], Float32[0.0, 0.0, 0.0]])

julia> for i = 1:100
           Flux.train!(custom_loss, params, [(x, custom_y_onehot)], opt)
           @show custom_accuracy(x, y)
       end
```

For the `Flux` model, `Flux.params` can collect the parameters straight from the model object -

```julia
julia> params = Flux.params(flux_model)
Params([Float32[0.55286723 -0.030403392 0.41436023 -0.2771595; -0.09287064 0.38187975 0.42391905 0.037785027; 0.14706837 0.29528287 0.2445691 0.3731384], Float32[0.0, 0.0, 0.0]])
```

We can now train it the same way, stopping early once the accuracy crosses `0.98` -

```julia
julia> for i = 1:100
           Flux.train!(loss, params, [(x, flux_y_onehot)], opt)
           if accuracy(x, y) >= 0.98 break end
       end

julia> @show accuracy(x, y)
accuracy(x, y) = 0.98
```
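
As a final hedged sketch (not part of the original walkthrough), the trained `Flux` model can be queried for the class of a single data point with `Flux.onecold`; with the model at ~98% accuracy, the first flower should typically come out as expected:

```julia
julia> Flux.onecold(flux_model(x[:, 1]), ["Iris-setosa", "Iris-virginica", "Iris-versicolor"])
"Iris-setosa"
```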

xy.jld2

1.31 KB
Binary file not shown.
