
Commit 29a019d

Lecture 11: Polishing touches
1 parent 08ee7cd commit 29a019d

File tree

1 file changed (+5, -5 lines changed)

docs/src/lecture_11/sparse.md

Lines changed: 5 additions & 5 deletions
@@ -15,7 +15,7 @@ Often, a regularization term is added. There are two possibilities. The [ridge r
[LASSO](https://en.wikipedia.org/wiki/Lasso_(statistics)) adds the weighted ``l_1``-norm penalization term to the objective:

```math
-\operatorname{minimize}_w\qquad \sum_{i=1}^n(w^\top x_i - y_i)^2 + \mu \|w|\|_1.
+\operatorname{minimize}_w\qquad \sum_{i=1}^n(w^\top x_i - y_i)^2 + \mu \|w\|_1.
```

Both approaches try to keep the norm of the parameters ``w`` small to prevent overfitting. The first approach results in a simpler numerical method, while the second one induces sparsity. Before we start with these topics, we briefly mention matrix decompositions, which play a crucial role in numerical computations.
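For intuition, here is a minimal Julia sketch of the two objectives (the helper names and the convention that the samples ``x_i`` form the rows of `X` are illustrative assumptions, not part of this file):

```julia
using LinearAlgebra

# Illustrative helpers (assumed names): the two regularized least-squares
# objectives for weights w, data X with samples in its rows, targets y, μ ≥ 0.
ridge_objective(w, X, y, μ) = sum(abs2, X * w - y) + μ * norm(w)^2  # squared l2 penalty assumed
lasso_objective(w, X, y, μ) = sum(abs2, X * w - y) + μ * norm(w, 1) # weighted l1 penalty
```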
@@ -27,7 +27,7 @@ Both approaches try to keep the norm of parameters ``w`` small to prevent overfi
Consider a square matrix ``A\in \mathbb R^{n\times n}`` with real-valued entries. If there exist ``\lambda\in\mathbb R`` and a nonzero ``v\in\mathbb R^n`` such that

```math
-Av = \lambda b,
+Av = \lambda v,
```

we say that ``\lambda`` is an eigenvalue of ``A`` and ``v`` is the corresponding eigenvector.
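Eigenpairs can be inspected in Julia via `LinearAlgebra.eigen`; a small sketch on an assumed toy matrix:

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 2.0]        # toy symmetric matrix, assumed for illustration
λ, Q = eigen(A)               # eigenvalues λ and eigenvectors (columns of Q)

A * Q[:, 1] ≈ λ[1] * Q[:, 1]  # Av = λv holds for each eigenpair; returns true
```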
@@ -41,13 +41,13 @@ A = Q\Lambda Q^\top
and for any real number ``\mu``, we also have

```math
-A + \mu I = Q(\Lambda + \mu I) Q^\top
+A + \mu I = Q(\Lambda + \mu I) Q^\top.
```

Since the eigenvectors are mutually perpendicular, ``Q`` is an orthogonal matrix and therefore ``Q^{-1} = Q^\top``. This implies that we can easily invert the matrix ``A + \mu I`` by

```math
-(A + \mu I)^{-1} = Q^\top (\Lambda + \mu I)^{-1} Q.
+(A + \mu I)^{-1} = Q (\Lambda + \mu I)^{-1} Q^\top.
```

Because ``\Lambda + \mu I`` is a diagonal matrix, its inverse is simple to compute.
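A short Julia sketch of this inversion (the matrix and the value of ``\mu`` are placeholders):

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 2.0]              # placeholder symmetric matrix
μ = 0.1
λ, Q = eigen(Symmetric(A))          # A = Q * Diagonal(λ) * Q'

# (A + μI)⁻¹ = Q(Λ + μI)⁻¹Qᵀ; inverting the diagonal is an elementwise division
Ainv = Q * Diagonal(1 ./ (λ .+ μ)) * Q'
Ainv ≈ inv(A + μ * I)               # true up to rounding
```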
@@ -78,7 +78,7 @@ X^\top X = Q\Lambda Q^\top.
Then the formula for the optimal weights simplifies into

```math
-w = Q^\top (\Lambda+\mu I)^{-1} QX^\top y.
+w = Q(\Lambda+\mu I)^{-1} Q^\top X^\top y.
```

Since this formula uses only matrix-vector multiplication and the inversion of a diagonal matrix, we can use it to quickly compute the solution for multiple values of ``\mu``.
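A sketch of this reuse in Julia with placeholder random data; the decomposition is computed once and shared across all values of ``\mu``:

```julia
using LinearAlgebra

X, y = randn(100, 5), randn(100)   # placeholder data
λ, Q = eigen(Symmetric(X' * X))    # decomposed once
b = Q' * (X' * y)                  # also reused for every μ

# w(μ) = Q(Λ + μI)⁻¹QᵀXᵀy reduces to one elementwise division per μ
ws = [Q * (b ./ (λ .+ μ)) for μ in (0.01, 0.1, 1.0, 10.0)]
```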
