
Commit d838ad0

Add a book chapter
1 parent ae00464 commit d838ad0

File tree

3 files changed: +122 -3 lines changed


Justfile

Lines changed: 5 additions & 0 deletions
```diff
@@ -53,3 +53,8 @@ fmt:
 fmt-check:
     find crates -type f \( -name '*.cpp' -o -name '*.hpp' \) -exec clang-format --dry-run --Werror {} +

+# -------------------------
+# Book
+# -------------------------
+serve-book:
+    mdbook serve book
```

book/src/SUMMARY.md

Lines changed: 1 addition & 3 deletions
```diff
@@ -1,5 +1,3 @@
 - [Introduction](introduction.md)
 - [Kalman filter](kf_linear.md)
-- [Class definition]()
-- [Class implementation]()
-- [Python bindings]()
+- [Simple optimizers](simple_optimizers.md)
```

book/src/simple_optimizers.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
# Optimizers

This chapter documents the small optimization module used in the project: a minimal runtime‑polymorphic interface `Optimizer` with two concrete implementations, Gradient Descent and Momentum. It is designed for clarity and easy swapping of algorithms in training loops.

## Problem setting

Given parameters $\mathbf{w}\in\mathbb{R}^d$ and a loss $\mathcal{L}(\mathbf{w})$, an optimizer updates weights using the gradient
$$
\mathbf{g}_t=\nabla_{\mathbf{w}}\mathcal{L}(\mathbf{w}_t).
$$
Each algorithm defines an update rule $\mathbf{w}_{t+1} = \Phi(\mathbf{w}_t,\mathbf{g}_t,\theta)$ with hyper‑parameters $\theta$ (e.g., learning rate, momentum).
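As a toy illustration (not part of the module), take the quadratic loss $\mathcal{L}(\mathbf{w})=\tfrac{1}{2}\lVert\mathbf{w}-\mathbf{w}^\star\rVert^2$ with gradient $\mathbf{g}_t=\mathbf{w}_t-\mathbf{w}^\star$; gradient descent with learning rate $\eta$ then gives
$$
\mathbf{w}_{t+1}=\mathbf{w}_t-\eta\,(\mathbf{w}_t-\mathbf{w}^\star)=(1-\eta)\,\mathbf{w}_t+\eta\,\mathbf{w}^\star,
$$
which converges to $\mathbf{w}^\star$ whenever $0<\eta<2$.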
## API overview

<details>
<summary>Click here to view the full implementation: <b>include/cppx/opt/optimizers.hpp</b>. We break it down in the rest of this section.</summary>

```cpp
{{#include ../../crates/simple_optimizers/include/optimizers.hpp}}
```
</details>

Design choices
- A small virtual interface to enable swapping algorithms at runtime; a sketch of what such an interface looks like follows this list.
- `std::unique_ptr<Optimizer>` for owning polymorphism; borrowing functions accept `Optimizer&`.
- Exceptions (`std::invalid_argument`) signal size mismatches.
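The exact declarations live in the header included above. As a rough orientation only, a minimal sketch of the interface, assuming the `cppx::opt` namespace used later in this chapter, might look like this (assumed shape, not the project's verbatim header):

```cpp
#include <vector>

namespace cppx::opt {

// Minimal sketch of the abstract interface (assumed, not verbatim).
class Optimizer {
 public:
  virtual ~Optimizer() = default;

  // Update `w` in place from the gradient `g`; implementations throw
  // std::invalid_argument on size mismatches.
  virtual void step(std::vector<double>& w, const std::vector<double>& g) = 0;
};

}  // namespace cppx::opt
```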
## Gradient descent

Update rule
$$
\mathbf{w}_{t+1}=\mathbf{w}_{t}-\eta\,\mathbf{g}_t,
$$
with learning rate $\eta>0$.

Implementation
```cpp
void GradientDescent::step(std::vector<double>& w,
                           const std::vector<double>& g) {
  if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
  for (std::size_t i = 0; i < w.size(); ++i) {
    w[i] -= lr_ * g[i];
  }
}
```
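The class declaration itself is part of the header included earlier and is not repeated here; presumably it just stores the learning rate passed at construction, roughly along these lines (an assumed sketch, not the project's actual declaration):

```cpp
// Assumed sketch of the concrete class; the real header may differ.
class GradientDescent : public Optimizer {
 public:
  explicit GradientDescent(double learning_rate) : lr_(learning_rate) {}

  // Element-wise w[i] -= lr_ * g[i], as defined above.
  void step(std::vector<double>& w, const std::vector<double>& g) override;

 private:
  double lr_;  // learning rate (eta)
};
```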
## Momentum-based gradient descent

Update rule
$$
\begin{aligned}
\mathbf{v}_{t+1} &= \mu\,\mathbf{v}_{t} + \eta\,\mathbf{g}_t, \\\\
\mathbf{w}_{t+1} &= \mathbf{w}_{t} - \mathbf{v}_{t+1},
\end{aligned}
$$
with momentum $\mu\in[0,1)$ and learning rate $\eta>0$.
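Unrolling the recursion from $\mathbf{v}_0=\mathbf{0}$ (the constructor below zero-initializes the velocity) shows that the velocity is an exponentially weighted sum of past gradients,
$$
\mathbf{v}_{t+1}=\eta\sum_{k=0}^{t}\mu^{k}\,\mathbf{g}_{t-k},
$$
so recent gradients dominate while older ones decay geometrically with $\mu$.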
Implementation
```cpp
Momentum::Momentum(double learning_rate, double momentum, std::size_t dim)
    : lr_(learning_rate), mu_(momentum), v_(dim, 0.0) {}

void Momentum::step(std::vector<double>& w, const std::vector<double>& g) {
  if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
  if (v_.size() != w.size()) throw std::invalid_argument("velocity size mismatch");

  for (std::size_t i = 0; i < w.size(); ++i) {
    v_[i] = mu_ * v_[i] + lr_ * g[i];
    w[i] -= v_[i];
  }
}
```
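As a quick numeric check (a toy trace, not taken from the code base): with $\eta=0.1$, $\mu=0.9$, and a constant scalar gradient $g=1$, the velocity evolves as $v_1=0.1$, $v_2=0.19$, $v_3=0.271,\dots$, approaching the steady state $\eta g/(1-\mu)=1.0$. For persistent gradients, momentum therefore amplifies the effective step size by up to a factor of $1/(1-\mu)$.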
## Using the optimizers

### Owning an optimizer (runtime polymorphism)

```cpp
#include <cstddef>
#include <memory>
#include <vector>

#include "cppx/opt/optimizers.hpp"

using namespace cppx::opt;

const std::size_t d = 8;  // parameter dimension (example value)
std::vector<double> w(d, 0.0), g(d, 0.0);

// Choose an algorithm at runtime:
std::unique_ptr<Optimizer> opt =
    std::make_unique<Momentum>(/*lr=*/0.1, /*mu=*/0.9, /*dim=*/w.size());

for (int epoch = 0; epoch < 100; ++epoch) {
  // ... compute gradients into g ...
  opt->step(w, g);  // updates w in place
}
```
### Borrowing an optimizer (no ownership transfer)

```cpp
void train_one_epoch(Optimizer& opt,
                     std::vector<double>& w,
                     std::vector<double>& g) {
  // ... fill g ...
  opt.step(w, g);
}
```
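A caller that owns an optimizer, as in the previous snippet, can then hand it to such a function by reference. A usage sketch, reusing `w` and `g` from the owning example above:

```cpp
// Usage sketch: the caller keeps ownership and passes a reference.
std::unique_ptr<Optimizer> opt =
    std::make_unique<Momentum>(/*lr=*/0.1, /*mu=*/0.9, /*dim=*/w.size());
train_one_epoch(*opt, w, g);  // `opt` remains valid afterwards
```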
### API variations (optional)

If C++20 is available, `std::span` can make the interface container‑agnostic:

```cpp
// virtual void step(std::span<double> w, std::span<const double> g) = 0;
```
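For illustration only, a free-function variant of the gradient-descent update written against spans could look like the sketch below; `sgd_step` is a hypothetical name and not part of the module. Any contiguous container, including `std::vector<double>`, converts implicitly to these spans, so callers are no longer tied to a particular container:

```cpp
#include <cstddef>
#include <span>
#include <stdexcept>

// Hypothetical free-function variant; the shipped interface uses std::vector.
void sgd_step(std::span<double> w, std::span<const double> g, double lr) {
  if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
  for (std::size_t i = 0; i < w.size(); ++i) {
    w[i] -= lr * g[i];
  }
}
```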
