Simple C++ implementations of a Hopfield Network with the Hebbian learning rule, a Generalized Boltzmann Machine, and a Restricted Boltzmann Machine with Contrastive Divergence, trained on the MNIST dataset.
This project uses CMake. After cloning this repository, run the following to compile:

```sh
mkdir -p build
cd build
cmake ..
make
```
Then, you can run

```sh
./main [OPTIONS]
```

with the following options:

```text
--help                   Show this help message and exit
--model, -m <type>       Choose model type: 'hopfield', 'boltzmann', or 'rbm' (default: rbm)
--train, -t <type>       'true' for training, 'false' for inference (default: true)
--data, -d <string>      Path to .npy file containing training data
--name <string>          Output/input filename without extension (e.g. 'mnist_rbm')
--epochs, -e <int>       Number of training epochs (e.g., 20)
--batch_size, -b <int>   Batch size for training (e.g., 256)
--hidden, -n <int>       Number of hidden neurons (e.g., 100)
--cd_k, -k <int>         Number of Contrastive Divergence steps (e.g., 10)
--lr, -l <float>         Learning rate (e.g., 0.05)
--momentum, -p <float>   Momentum for gradient update (e.g., 0.5)
--w_std, -w <float>      Standard deviation for initializing weights (e.g., 0.1)
--x_mean, -x <float>     Mean for initializing visible bias (e.g., -0.2)
--h_mean, -h <float>     Mean for initializing hidden bias (e.g., -0.5)
```
For example, to train a Restricted Boltzmann Machine on the default dataset:

```sh
./main --model rbm --epochs 20 --batch_size 64 --cd_k 2 --lr 0.05
```

And then to run inference on the trained model:

```sh
./main --model rbm --train false
```
We structure the Hopfield network so that each training pattern is stored as a stable fixed point of the network dynamics. We can do this by initializing the network weights according to the Hebbian learning rule,

$W = \frac{1}{N}\sum_{n=1}^{N} x^{(n)} (x^{(n)})^\top,$

and setting the weights along the diagonal to 0, i.e. $W_{ii} = 0$, so that neurons have no self-connections.

To run inference, we initialize a state vector $s \in \{-1, +1\}^D$ and repeatedly apply the update rule

$s_i \leftarrow \mathrm{sgn}\Big(\sum_j W_{ij} s_j\Big),$

where $\mathrm{sgn}(\cdot)$ is the sign function, until the state converges.
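As a rough illustration, the Hebbian initialization and a synchronous inference sweep might look like the sketch below. The function names and data layout are illustrative assumptions, not the repository's actual API:

```cpp
#include <cstddef>
#include <vector>

// Sketch only: build Hopfield weights from bipolar (+1/-1) patterns
// via the Hebbian rule W = (1/N) sum_n x^(n) (x^(n))^T, zero diagonal.
std::vector<std::vector<double>> hebbian_weights(
    const std::vector<std::vector<int>>& patterns) {
    const std::size_t D = patterns[0].size();
    std::vector<std::vector<double>> W(D, std::vector<double>(D, 0.0));
    for (const auto& x : patterns)
        for (std::size_t i = 0; i < D; ++i)
            for (std::size_t j = 0; j < D; ++j)
                W[i][j] += static_cast<double>(x[i] * x[j]) / patterns.size();
    for (std::size_t i = 0; i < D; ++i) W[i][i] = 0.0;  // no self-connections
    return W;
}

// One synchronous inference sweep: s_i <- sgn(sum_j W_ij s_j).
std::vector<int> hopfield_step(const std::vector<std::vector<double>>& W,
                               const std::vector<int>& s) {
    std::vector<int> next(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        double act = 0.0;
        for (std::size_t j = 0; j < s.size(); ++j) act += W[i][j] * s[j];
        next[i] = (act >= 0.0) ? 1 : -1;
    }
    return next;
}
```

Iterating `hopfield_step` until the state stops changing recovers a stored pattern close to the initial state, at least when the number of stored patterns is small relative to $D$.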
A Boltzmann Machine (BM) introduces significantly more stochasticity into the learning and inference processes compared to Hopfield networks and can be used to learn more complex data.
*(Figures: sample outputs from the Boltzmann Machine and the Hopfield Network, shown side by side.)*
The Boltzmann machine's state is defined by visible neurons $x\in\{0,1\}^D$ and hidden neurons $h\in\{0,1\}^K$, connected by weight matrices,

- visible-to-visible weights $A\inℝ^{D\times D}$
- hidden-to-hidden weights $B\inℝ^{K\times K}$
- visible-to-hidden weights $W\inℝ^{D\times K}$

and biases,

- visible neuron biases $a\inℝ^D$
- hidden neuron biases $b\inℝ^K$
We wish to minimize the following energy function:

$E(x, h) = -\frac{1}{2} x^\top A x - \frac{1}{2} h^\top B h - x^\top W h - a^\top x - b^\top h$
The weight matrices are initialized with a Gaussian distribution,
and have their diagonals set to zero. Unlike the deterministic training of the Hopfield network, the Boltzmann machine follows a gradient-descent-like process.
Inference is very similar to the Hopfield network, but with a more stochastic update rule: each neuron is updated by sampling, e.g. for a visible neuron,

$p(x_i = 1) = \sigma\Big(a_i + \sum_{j} A_{ij} x_j + \sum_{k} W_{ik} h_k\Big),$

and symmetrically for a hidden neuron,

$p(h_k = 1) = \sigma\Big(b_k + \sum_{l} B_{kl} h_l + \sum_{i} W_{ik} x_i\Big),$

where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function.
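A single stochastic neuron update can be sketched as below; the names and signatures are illustrative, not the repository's actual implementation:

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Logistic sigmoid.
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Sketch only: stochastically update visible neuron i, which turns on
// with probability sigma(a_i + sum_j A_ij x_j + sum_k W_ik h_k).
int sample_visible(std::size_t i,
                   const std::vector<std::vector<double>>& A,  // D x D
                   const std::vector<std::vector<double>>& W,  // D x K
                   const std::vector<double>& a,               // D
                   const std::vector<int>& x,                  // D
                   const std::vector<int>& h,                  // K
                   std::mt19937& rng) {
    double act = a[i];
    for (std::size_t j = 0; j < x.size(); ++j) act += A[i][j] * x[j];
    for (std::size_t k = 0; k < h.size(); ++k) act += W[i][k] * h[k];
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return (u(rng) < sigmoid(act)) ? 1 : 0;  // Bernoulli draw
}
```

Hidden neurons are sampled the same way, with $B$, $b$, and the transpose of $W$.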
*(Figures: learned MNIST model filters, and randomly generated MNIST outputs.)*
Boltzmann machines do not perform very well unless constrained. A Restricted Boltzmann Machine (RBM) is created by eliminating connections between neurons of the same class (i.e. visible-visible and hidden-hidden). This is equivalent to setting $A = 0$ and $B = 0$.
The new energy function to minimize is

$E(x, h) = -x^\top W h - a^\top x - b^\top h$
By removing intra-class connections, we can also greatly speed up the training process, since all hidden (or all visible) neurons can be updated at once.
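The Contrastive Divergence training step mentioned above can be sketched as follows. This is a single-example CD-1 sketch with illustrative names, not the repository's batched implementation:

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // W is D x K

double sigm(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Sketch only: one CD-1 update of W, a, b from a single binary data vector x0.
void cd1_update(Mat& W, Vec& a, Vec& b, const Vec& x0, double lr,
                std::mt19937& rng) {
    const std::size_t D = a.size(), K = b.size();
    std::uniform_real_distribution<double> u(0.0, 1.0);

    // Positive phase: p(h|x0), then sample binary h0.
    Vec ph0(K), h0(K);
    for (std::size_t k = 0; k < K; ++k) {
        double act = b[k];
        for (std::size_t i = 0; i < D; ++i) act += W[i][k] * x0[i];
        ph0[k] = sigm(act);
        h0[k] = (u(rng) < ph0[k]) ? 1.0 : 0.0;
    }
    // Negative phase: reconstruct the visible layer, then p(h|x1).
    Vec x1(D);
    for (std::size_t i = 0; i < D; ++i) {
        double act = a[i];
        for (std::size_t k = 0; k < K; ++k) act += W[i][k] * h0[k];
        x1[i] = sigm(act);  // use probabilities for the reconstruction
    }
    Vec ph1(K);
    for (std::size_t k = 0; k < K; ++k) {
        double act = b[k];
        for (std::size_t i = 0; i < D; ++i) act += W[i][k] * x1[i];
        ph1[k] = sigm(act);
    }
    // Gradient step: <x h>_data - <x h>_reconstruction.
    for (std::size_t i = 0; i < D; ++i)
        for (std::size_t k = 0; k < K; ++k)
            W[i][k] += lr * (x0[i] * ph0[k] - x1[i] * ph1[k]);
    for (std::size_t i = 0; i < D; ++i) a[i] += lr * (x0[i] - x1[i]);
    for (std::size_t k = 0; k < K; ++k) b[k] += lr * (ph0[k] - ph1[k]);
}
```

CD-k would simply alternate the hidden/visible sampling k times before taking the gradient step.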
Inference is almost identical to the Boltzmann machine, with the new update rules

$p(h_k = 1 \mid x) = \sigma\Big(b_k + \sum_i W_{ik} x_i\Big), \qquad p(x_i = 1 \mid h) = \sigma\Big(a_i + \sum_k W_{ik} h_k\Big).$

Note that since the update probabilities for the visible neurons depend only on the hidden neurons (and vice versa), we can parallelize the updates within each layer and make the learning process faster.
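Concretely, because the hidden units are conditionally independent given the visible layer, all of $p(h_k = 1 \mid x)$ can be computed in one pass. A sketch with illustrative names (not the repository's API):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch only: compute p(h_k = 1 | x) for every hidden unit at once.
// Each unit's probability depends only on the visible layer, so the
// loop over k could run fully in parallel.
std::vector<double> hidden_probs(const std::vector<std::vector<double>>& W, // D x K
                                 const std::vector<double>& b,             // K
                                 const std::vector<int>& x) {              // D
    std::vector<double> p(b.size());
    for (std::size_t k = 0; k < b.size(); ++k) {
        double act = b[k];
        for (std::size_t i = 0; i < x.size(); ++i) act += W[i][k] * x[i];
        p[k] = 1.0 / (1.0 + std::exp(-act));  // sigmoid
    }
    return p;
}
```

The symmetric pass computes all $p(x_i = 1 \mid h)$ from the sampled hidden layer.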