Commit 0f895ca

Small changes in admonitions
1 parent aca5c93 commit 0f895ca

File tree

7 files changed: +102 additions, -364 deletions

docs/src/lecture_11/glm.md

Lines changed: 8 additions & 61 deletions
@@ -19,16 +19,12 @@ function plot_histogram(xs, f; kwargs...)
end
```

# [Linear regression revisited](@id statistics)

This section revisits linear regression. The classical statistical approach derives the same formulation as the optimization approach. Besides point estimates of the parameters, it also computes their confidence intervals and can test whether some parameters can be omitted from the model. We start with hypothesis testing and then continue with regression.

Julia provides many statistical packages; they are summarized on the [JuliaStats](https://juliastats.org/) webpage. This section gives a brief introduction to several of them.

## Theory of hypothesis testing

Hypothesis testing verifies whether data satisfy a given null hypothesis ``H_0``. Most tests make some assumptions about the data, such as normality. Under the validity of the null hypothesis, the test derives that a transformation of the data follows some known distribution. It then constructs a confidence interval for this distribution and checks whether the transformed variable lies inside it. If it lies outside, the test rejects the null hypothesis; in the opposite case, it fails to reject it. The latter is different from confirming the null hypothesis. Hypothesis testing is like a grumpy professor during exams: he never acknowledges that a student knows the topic sufficiently, but he is often clear that the student does not know it.
@@ -47,13 +43,8 @@ p = 2\min\{\mathbb P(T\le t \mid H_0), \mathbb P(T\ge t\mid H_0)\}

If the ``p``-value is smaller than a given threshold, usually ``5\%``, the null hypothesis is rejected. In the opposite case, it is not rejected. The ``p``-value is a measure of the probability that an observed difference could have occurred just by random chance.
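As a quick sketch of this formula, the two-sided ``p``-value can be computed directly from the distribution function of the test statistic. The function below is illustrative (its name and arguments are made up for this example); it assumes a statistic that follows the Student's distribution with ``n-1`` degrees of freedom under the null hypothesis.

```julia
using Distributions

# Two-sided p-value of an observed statistic t, assuming that under H_0
# the statistic follows the Student's distribution with n-1 degrees of freedom.
function p_value(t, n)
    d = TDist(n - 1)
    return 2 * min(cdf(d, t), 1 - cdf(d, t))
end

p_value(2.0, 30)
```

A result below the usual ``5\%`` threshold would lead to rejecting the null hypothesis.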

## Hypothesis testing

We first randomly generate data from the normal distribution with zero mean.

```@example glm
@@ -72,20 +63,18 @@ nothing # hide

The following exercise performs the ``t``-test to check whether the data come from a distribution with zero mean.

```@raw html
<div class="admonition is-category-exercise">
<header class="admonition-header">Exercise:</header>
<div class="admonition-body">
```
Use the ``t``-test to verify whether the samples were generated from a distribution with zero mean.

**Hints:**
- The Student's distribution is invoked by `TDist()`.
- The probability ``\mathbb P(T\le t)`` equals the [distribution function](https://en.wikipedia.org/wiki/Cumulative_distribution_function) ``F(t)``, which can be called by `cdf`.

```@raw html
</div></div>
<details class = "solution-body">
@@ -109,14 +98,6 @@ The ``p``-value is significantly larger than ``5\%``. Therefore, we cannot rejec
</p></details>
```

Even though the computation of the ``p``-value is simple, we can use the [HypothesisTests](https://juliastats.org/HypothesisTests.jl/stable/) package. When we run the test, it gives us the same results as we computed.

```@example glm
@@ -125,12 +106,6 @@ using HypothesisTests
OneSampleTTest(xs)
```

## Theory of generalized linear models

The statistical approach to linear regression differs from the machine-learning one. It also assumes a linear prediction function:
@@ -153,11 +128,6 @@ Since the density is the derivative of the distribution function, the term ``f(y

is often maximized. Since the logarithm is an increasing function, these two formulas are equivalent.
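Written out schematically (with ``f`` denoting the conditional density, its precise arguments following the formulas above), the equivalence between maximizing the likelihood and the log-likelihood is:

```math
\operatorname{maximize}_w\quad \prod_{i=1}^n f(y_i \mid x_i; w)
\qquad\Longleftrightarrow\qquad
\operatorname{maximize}_w\quad \sum_{i=1}^n \log f(y_i \mid x_i; w)
```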

#### Case 1: Linear regression

The first case considers ``g(z)=z`` to be the identity function and ``y\mid x`` with the [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) ``N(\mu_i, \sigma^2)``. Then
@@ -174,15 +144,12 @@ and, therefore, we need to solve the following optimization problem:

Since we maximize with respect to ``w``, most terms behave like constants, and this optimization problem is equivalent to

```math
\operatorname{minimize}_w\qquad \sum_{i=1}^n (y_i - w^\top x_i)^2.
```

This is precisely linear regression as derived in the previous lectures.
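As a numerical sanity check (all data below are synthetic and the variable names are made up for this example), the minimizer of this sum of squares can be computed from the normal equations and recovers the true coefficients up to noise:

```julia
using Random

Random.seed!(0)
n = 100
X = hcat(ones(n), randn(n, 2))    # design matrix with an intercept column
w_true = [1.0, 2.0, -0.5]
y = X * w_true + 0.1 * randn(n)   # labels with small normal noise

# Solve minimize_w sum_i (y_i - w'x_i)^2 via the normal equations X'X w = X'y
w_hat = (X' * X) \ (X' * y)
```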

#### Case 2: Poisson regression

The second case considers ``g(z)=\log z`` to be the logarithm function and ``y\mid x`` with the [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution) ``Po(\lambda)``. The inverse function to ``g`` is ``g^{-1}(z)=e^z``. Since the Poisson distribution has non-negative discrete values with probabilities ``\mathbb P(Y=k) = \frac{1}{k!}\lambda^ke^{-\lambda}``, labels ``y_i`` must also be non-negative integers. The same formula for the conditional expectation as before yields:
@@ -205,7 +172,6 @@ By using the formula for ``\lambda_i`` and getting rid of constants, we transfor

This function is similar to the one derived for logistic regression.
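As a sketch (the function name and data below are made up for illustration), the objective with ``\lambda_i = e^{w^\top x_i}`` substituted in and constants dropped can be written as:

```julia
# Poisson-regression objective: sum_i (exp(w'x_i) - y_i * w'x_i).
# Illustrative only; in practice, glm from the GLM.jl package handles this.
poisson_nll(w, X, y) = sum(exp.(X * w) .- y .* (X * w))

X = [1.0 0.0; 1.0 1.0; 1.0 2.0]   # tiny design matrix with an intercept
y = [1.0, 2.0, 4.0]               # non-negative integer labels
poisson_nll([0.0, 0.5], X, y)
```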

## Linear models

We will use the [Employment and Wages in Spain](https://vincentarelbundock.github.io/Rdatasets/doc/plm/Snmesp.html) dataset because it is slightly larger than the iris dataset. It contains 5904 observations of wages from 738 companies in Spain from 1983 to 1990. We will estimate the dependence of wages on other factors such as employment or cash flow. We first load the dataset and transform the original log-wages into non-normalized wages. We use base ``2`` to obtain relatively small numbers.
@@ -241,22 +207,18 @@ model = lm(@formula(W ~ 1 + N + Y + I + K + F), wages)

The table shows the parameter values and their confidence intervals. Besides that, it also tests the null hypothesis ``H_0: w_j = 0``, namely whether some of the regression coefficients can be omitted. The ``t``-statistic is in column `t`, while its ``p``-value is in column `Pr(>|t|)`. The next exercise checks whether we can achieve the same results with fewer features.

```@raw html
<div class="admonition is-category-exercise">
<header class="admonition-header">Exercise:</header>
<div class="admonition-body">
```
Check that the solutions computed by hand and by `lm` are the same.

Then remove the feature with the highest ``p``-value and observe whether there was any performance drop. The performance is usually evaluated by the [coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination) denoted by ``R^2\in[0,1]``. Its higher values indicate a better model.

**Hint**: Use functions `coef` and `r2`.
```@raw html
</div></div>
<details class = "solution-body">
@@ -287,13 +249,6 @@ Since we observe only a small performance drop, we could omit this feature witho
</p></details>
```

The core assumption of this approach is that ``y`` follows the normal distribution. We use the `predict` function for predictions and then use the `plot_histogram` function written earlier to plot the histogram and a density of the normal distribution. For the normal distribution, we need to specify the correct mean and variance.

```@example glm
@@ -316,8 +271,6 @@ test_normality = ExactOneSampleKSTest(y_hat, Normal(mean(y_hat), std(y_hat)))

The result is expected. The ``p``-value is close to ``1\%``, which means that we reject the null hypothesis that the data follow the normal distribution, even though they are not entirely far from it.

## Generalized linear models

While linear models do not transform the labels, generalized linear models transform them by the link function. Moreover, they allow choosing a distribution other than the normal for the labels. Therefore, we need to specify the link function ``g`` and the distribution of ``y \mid x``.
@@ -328,21 +281,16 @@ We repeat the same example with the link function ``g(z) = \sqrt{z}`` and the [i
model = glm(@formula(W ~ 1 + N + Y + I + K + F), wages, InverseGaussian(), SqrtLink())
```

The following exercise plots the predictions for the generalized linear model.

```@raw html
<div class="admonition is-category-exercise">
<header class="admonition-header">Exercise:</header>
<div class="admonition-body">
```
Create the scatter plot of predictions and labels. Do not use the `predict` function.
```@raw html
</div></div>
<details class = "solution-body">
@@ -371,7 +319,6 @@ scatter(y, y_hat;
savefig("glm_predict.svg")
```

```@raw html
</p></details>
```
