Commit f3f6933

Fix doctests

1 parent 1ef9a96 commit f3f6933

File tree

3 files changed: +263 -1 lines changed
docs/src/getting_started/linear_regression.md

Lines changed: 1 addition & 1 deletion
@@ -191,7 +191,7 @@ julia> dLdW, dLdb, _, _ = gradient(loss, W, b, x, y)
 We can now update the parameters, following the gradient descent algorithm -
 
-```jldoctest linear_regression; filter = r"[+-]?([0-9]*[.])?[0-9]+"
+```jldoctest linear_regression_simple; filter = r"[+-]?([0-9]*[.])?[0-9]+"
 julia> W .= W .- 0.1 .* dLdW
 1-element Vector{Float32}:
  1.8144473
docs/src/getting_started/logistic_regression.md

Lines changed: 262 additions & 0 deletions

@@ -0,0 +1,262 @@

# Logistic Regression

The following page contains a step-by-step walkthrough of the logistic regression algorithm in `Julia` using `Flux`! We will create a simple logistic regression model without any usage of `Flux`, and then compare the different working parts with `Flux`'s implementation.

Let's start by importing the required `Julia` packages!

```julia logistic_regression
julia> using Flux

julia> using Statistics

julia> using MLDatasets

julia> using DataFrames
```

## Dataset

Let's start by importing a dataset from `MLDatasets.jl`! We will use the `Iris` dataset, which contains the data of three different `Iris` species. The data consists of 150 data points (`x`s), each having 4 features. Each of these `x`s is mapped to a `y`, the name of a particular `Iris` species (a class or a label).

```julia logistic_regression
julia> Iris()
dataset Iris:
  metadata   =>    Dict{String, Any} with 4 entries
  features   =>    150×4 DataFrame
  targets    =>    150×1 DataFrame
  dataframe  =>    150×5 DataFrame

julia> x, y = Iris(as_df=false)[:]
(features = [5.1 4.9 … 6.2 5.9; 3.5 3.0 … 3.4 3.0; 1.4 1.4 … 5.4 5.1; 0.2 0.2 … 2.3 1.8], targets = InlineStrings.String15["Iris-setosa" "Iris-setosa" … "Iris-virginica" "Iris-virginica"])
```
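
As a quick sanity check on the layout (a hedged aside, not part of the original walkthrough): the features come as a 4×150 matrix with one column per data point, and the targets as a 1×150 row of labels.

```julia
julia> size(x), size(y)
((4, 150), (1, 150))
```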

Our next step would be to convert this data into a form that can be fed to a machine learning model. The `x` values are already arranged in a matrix and thus don't need any alteration, but the labels must be one-hot encoded. [Here](https://discourse.julialang.org/t/all-the-ways-to-do-one-hot-encoding/64807) is a great discourse thread on different techniques that can be used to one-hot encode data, with or without using any external `Julia` package.

```julia logistic_regression
julia> y_r = reshape(y, (150, 1));

julia> custom_y_onehot = unique(y_r) .== permutedims(y_r)
3×150 BitMatrix:
 1  1  1  1  1  1  1  1  1  1  1  1  1  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     1  1  1  1  1  1  1  1  1  1  1  1
```
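
Note that the row order of `custom_y_onehot` comes from `unique(y_r)`, i.e. the order in which the labels first appear in the dataset. A small hedged check (not part of the original walkthrough):

```julia
julia> unique(y_r)
3-element Vector{InlineStrings.String15}:
 "Iris-setosa"
 "Iris-versicolor"
 "Iris-virginica"
```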

This same operation can also be performed using `Flux`'s `onehotbatch` function! We will use both of these outputs in parallel to show how intuitive `Flux` is!

```julia logistic_regression
julia> flux_y_onehot = Flux.onehotbatch(y_r, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"])
3×150×1 OneHotArray(::Matrix{UInt32}) with eltype Bool:
[:, :, 1] =
 1  1  1  1  1  1  1  1  1  1  1  1  1  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     1  1  1  1  1  1  1  1  1  1  1  1
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
```

Our data is ready! The next step would be to build a classifier for it.

## Building a model

A logistic regression model is mathematically defined as -

```math
model(x) = σ(Wx + b)
```

where `W` is the weight matrix, `b` is the bias vector, and `σ` is any activation function. For our case, let's use the `softmax` activation function, as we will be performing a multiclass classification task. We can define our model in `Julia` using the exact same notation!

```julia logistic_regression
julia> m(x) = W*x .+ b
m (generic function with 1 method)
```

Note that this model lacks an activation function right now, but we will come back to that.

We can now move ahead to initialize the parameters of our model. To keep it simple, let's use `Julia`'s `rand` function to initialize the weights, and let's initialize the biases as `0`. Given that our model has 4 inputs (4 features in every data point) and 3 outputs (3 different classes), the parameters can be initialized in the following way -

```julia logistic_regression; filter = r"[+-]?([0-9]*[.])?[0-9]+"
julia> W = rand(Float32, 3, 4)
3×4 Matrix{Float32}:
 0.660353  0.474309  0.170792  0.239653
 0.790627  0.15147   0.707435  0.923513
 0.3684    0.20105   0.399129  0.17404

julia> b = [0.0f0, 0.0f0, 0.0f0]
3-element Vector{Float32}:
 0.0
 0.0
 0.0
```

Now our model is capable of taking in the complete data and predicting the class of each `x` in one go! But we need to make sure that our model outputs the probability of an input belonging to a particular class. As our model has 3 outputs, each of them denotes the probability of the input belonging to that particular class.

To map our outputs to a probability value, we will use an activation function. It would make sense to use a `softmax` activation function here, which is mathematically described as -

```math
σ(\vec{x})_i = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}
```

The `softmax` function scales down the outputs to probability values such that the sum of all the final outputs is equal to `1`. Let's implement this in `Julia`!

```julia logistic_regression
julia> custom_softmax(x) = exp.(x) ./ sum(exp.(x), dims=1)
custom_softmax (generic function with 1 method)
```

The implementation looks straightforward enough! Note that we specify `dims=1` in the `sum` function to calculate the sum of probabilities across columns. Remember, we will have a `3×150` matrix (predicted `y`s) as the output of our model, where each column maps to a column in the `x` matrix. The sum of probabilities must therefore be taken down each column (that is, over the three output values for each data point); hence `dims=1`. A small illustration follows below.
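
To make the `dims=1` choice concrete, here is a minimal illustration on a small made-up matrix (the values are arbitrary and only for demonstration):

```julia
julia> A = [1.0 4.0; 2.0 5.0; 3.0 6.0];  # a 3×2 stand-in for our 3×150 output

julia> sum(A, dims=1)  # one sum per column - this is what custom_softmax divides by
1×2 Matrix{Float64}:
 6.0  15.0

julia> sum(A, dims=2)  # for contrast, dims=2 would sum each row instead
3×1 Matrix{Float64}:
 5.0
 7.0
 9.0
```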

Let's now combine this `custom_softmax` function with our model to construct the complete `custom_model`.

```julia logistic_regression
julia> custom_model(x) = m(x) |> custom_softmax
custom_model (generic function with 1 method)
```

Let's check if our model works.

```julia logistic_regression
julia> custom_model(x) |> size
(3, 150)
```

It works! Let's check if the `softmax` function is working as it should.

```julia logistic_regression
julia> all(custom_model(x) .< 1.0f0 .&& custom_model(x) .> 0.0f0)
true

julia> sum(custom_model(x), dims=1)
1×150 Matrix{Float64}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
```

Every output value is between `0` and `1`, and every column adds up to `1`!
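
As an extra hedged check (not in the original walkthrough), our hand-written `custom_softmax` should agree with `Flux`'s built-in `softmax`, which computes the same expression in a numerically safer way:

```julia
julia> custom_softmax(m(x)) ≈ Flux.softmax(m(x))
true
```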

`Flux` provides users with a very simple API which almost feels like writing your own code! Let's convert our `custom_model` to a `Flux` model.

```julia logistic_regression
julia> flux_model = Chain(Dense(4 => 3), softmax)
Chain(
  Dense(4 => 3),                        # 15 parameters
  NNlib.softmax,
)
```

A [`Dense(4 => 3)`](@ref Dense) layer denotes a layer with four inputs (four features in every data point) and three outputs (three classes or labels). This layer is exactly the same as the mathematical model we defined above! Under the hood, `Flux` too calculates the output using the same expression! But we don't have to initialize the parameters ourselves this time; `Flux` does it for us.

```julia logistic_regression; filter = r"[+-]?([0-9]*[.])?[0-9]+"
julia> flux_model[1].weight |> size, flux_model[1].bias |> size
((3, 4), (3,))
```
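
To convince ourselves that the `Dense` layer really computes the same expression as our hand-written model, here is a small hedged check (not part of the original walkthrough); `flux_model[1]` is the `Dense` layer inside the `Chain`:

```julia
julia> W_flux, b_flux = flux_model[1].weight, flux_model[1].bias;

julia> flux_model[1](x) ≈ W_flux * x .+ b_flux  # Dense with the default identity activation
true
```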

Now we can check if our model is acting right. We can pass the complete data in one go, with each data point having four features (four inputs) -

```julia logistic_regression
julia> flux_model(x) |> size
(3, 150)
```

## Loss and accuracy

Our next step is to define a loss function for our model. Let's write our own version of `Flux`'s `logitcrossentropy`: it applies `logsoftmax` to the raw outputs and computes the cross-entropy against the one-hot targets.

```julia
julia> custom_logitcrossentropy(ŷ, y) = mean(.-sum(y .* logsoftmax(ŷ; dims = 1); dims = 1))
custom_logitcrossentropy (generic function with 1 method)

julia> function custom_loss(x, y)
           ŷ = custom_model(x)
           custom_logitcrossentropy(ŷ, y)
       end
custom_loss (generic function with 1 method)

julia> custom_loss(x, custom_y_onehot)
1.0606989738802028
```

The same loss can be computed for the `Flux` model using the built-in `Flux.logitcrossentropy` -

```julia
julia> function loss(x, y)
           ŷ = flux_model(x)
           Flux.logitcrossentropy(ŷ, y)
       end
loss (generic function with 1 method)

julia> loss(x, flux_y_onehot)
1.092375933564367
```
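
The two loss values differ only because the two models start from different random parameters; the loss functions themselves agree. A hedged sketch of that claim, using made-up scores and targets:

```julia
julia> ŷ_test = rand(Float32, 3, 10);  # arbitrary raw scores, 3 classes × 10 samples

julia> y_test = Flux.onehotbatch(rand(1:3, 10), 1:3);  # arbitrary one-hot targets

julia> custom_logitcrossentropy(ŷ_test, y_test) ≈ Flux.logitcrossentropy(ŷ_test, y_test)
true
```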

To measure accuracy, we need to recover the class labels from the one-hot representation. Let's first see how this can be done manually with `findmax`, which returns the maximum value along a dimension together with its `CartesianIndex` -

```julia
julia> findmax(flux_y_onehot, dims=1)
([1 1 … 1 1;;;], [CartesianIndex(1, 1, 1) CartesianIndex(1, 2, 1) … CartesianIndex(2, 149, 1) CartesianIndex(2, 150, 1);;;])

julia> mxidx = findmax(flux_y_onehot, dims=1)[2]
1×150×1 Array{CartesianIndex{3}, 3}:
[:, :, 1] =
 CartesianIndex(1, 1, 1)  CartesianIndex(1, 2, 1)  CartesianIndex(1, 3, 1)  CartesianIndex(1, 4, 1)  …  CartesianIndex(2, 148, 1)  CartesianIndex(2, 149, 1)  CartesianIndex(2, 150, 1)

julia> mxidx[1]
CartesianIndex(1, 1, 1)

julia> mxidx[1].I
(1, 1, 1)

julia> mxidx[1].I[1]
1

julia> y_cold = Vector{String}(undef, 150);

julia> for i = 1:150
           if mxidx[i].I[1] == 1
               y_cold[i] = "Iris-setosa"
           elseif mxidx[i].I[1] == 2
               y_cold[i] = "Iris-virginica"
           elseif mxidx[i].I[1] == 3
               y_cold[i] = "Iris-versicolor"
           end
       end

julia> istrue = Flux.onecold(flux_y_onehot, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]) .== y_cold;

julia> all(istrue)
true
```

With the decoding settled, we can define an accuracy function for our custom model: decode the predictions and the targets with `Flux.onecold` and compute the fraction that match -

```julia
julia> custom_accuracy(x, y) = mean(Flux.onecold(custom_model(x), ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]) .== Flux.onecold(custom_y_onehot, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]))
custom_accuracy (generic function with 1 method)

julia> custom_accuracy(x, y)
0.3333333333333333
```

The same accuracy function for the `Flux` model -

```julia
julia> accuracy(x, y) = mean(Flux.onecold(flux_model(x), ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]) .== Flux.onecold(flux_y_onehot, ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]))
accuracy (generic function with 1 method)

julia> accuracy(x, y)
0.3333333333333333
```

Both untrained models classify only about a third of the data points correctly, which is no better than chance on three balanced classes - time to train them!

## Training the model

Let's train our custom model first, using plain gradient descent: a `Descent` optimiser with a learning rate of `0.1`, the parameters `W` and `b` collected in a `Params` object, and `Flux.train!` run for 100 iterations -

```julia
julia> opt = Descent(0.1)
Descent(0.1)

julia> params = Flux.params(W, b)
Params([Float32[0.6603528 0.47430867 0.17079216 0.23965251; 0.7906274 0.15146977 0.7074347 0.92351294; 0.3684004 0.20104975 0.39912927 0.17404026], Float32[0.0, 0.0, 0.0]])

julia> for i = 1:100
           Flux.train!(custom_loss, params, [(x, custom_y_onehot)], opt)
           @show custom_accuracy(x, y)
       end
```

For the `Flux` model, `Flux.params` can collect the parameters straight from the model object -

```julia
julia> params = Flux.params(flux_model)
Params([Float32[0.55286723 -0.030403392 0.41436023 -0.2771595; -0.09287064 0.38187975 0.42391905 0.037785027; 0.14706837 0.29528287 0.2445691 0.3731384], Float32[0.0, 0.0, 0.0]])
```

We can now train it the same way, stopping early once the accuracy crosses `0.98` -

```julia
julia> for i = 1:100
           Flux.train!(loss, params, [(x, flux_y_onehot)], opt)
           if accuracy(x, y) >= 0.98 break end
       end

julia> @show accuracy(x, y)
accuracy(x, y) = 0.98
```
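
As a final hedged sketch (not part of the original walkthrough), the trained `Flux` model can be queried for the class of a single data point with `Flux.onecold`; with the model at ~98% accuracy, the first flower should typically come out as expected:

```julia
julia> Flux.onecold(flux_model(x[:, 1]), ["Iris-setosa", "Iris-virginica", "Iris-versicolor"])
"Iris-setosa"
```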

xy.jld2

1.31 KB
Binary file not shown.
