cuSpike is a spiking neural network simulator targeting recent CUDA-capable GPUs. The simulator takes neuron models in its own domain specific language called CSM (cuSpike Model), and produces CUDA code, which is then combined with the existing infrastructure code to produce a standalone simulator executable. Arbitrary ODE-based models (including but not limited to LIF and Hodgkin Huxley neurons) are supported, but there is currently no support for STDP or related learning mechanisms.
For small networks, the simulator utilizes cooperative groups and stores neuron state variables in registers to avoid kernel launch overhead and long memory access latency of the global memory. When the networks get large, in which case the neuron states cannot be fit into the registers, the simulator tries to exhaust shared memory entirely before starting to consume global memory for spike propagation.
# Clone the repository
git clone https://github.com/kocatepedogu/cuspike.git
cd cuspike
# Build the container.
cd docker
./build
# Open a shell inside the container. Stays in the current
# directory which is automatically mounted.
./enter
# Compile the CSM compiler
cd compiler
make -j$(nproc)
The 'build' script will create an AlmaLinux 9.4 container with cuSpike, Brian2Cuda (1.0a6), NEST GPU and GeNN (5.1.0) simulators, together with the CUDA Toolkit (12.6). Building the container may take up to an hour, mostly due to the long download time for the CUDA Toolkit. Both Podman and Docker should work properly independent of the host distribution. Please send a bug report if you encounter an error in this step. CUDA Container Toolkit is not required.
Create a new file named model.csm and define your neuronal network in it. An example can be found in benchmark/cuba/cuba.csm. After writing the model, create the following Python script in the same directory:
from pycuspike import Model
model = Model('model.csm')
model.build()
model.simulate()
t_array, s_array = model.spikes()
Execute the script from within the container.
python test.py
The recorded spikes are returned in t_array and s_array, which are both numpy arrays. The first array stores the time of each spike whereas the second array stores the index of the neurons which produced those spikes. A raster plot can be generated by adding the following code to the script.
import matplotlib.pyplot as plt
plt.figure(figsize=(75, 75))
plt.plot(t_array, s_array, ',k')
plt.savefig('cuspike.png')
The CUBA benchmark model (for which the original reference implementation can be found in Brian2 documentation) is provided as an example under benchmark/cuba
directory. The example has 20,000 neurons and simulates 100 seconds of biological time, which are higher than the original model (4000N/1s).
The COBAHH benchmark model from Brian2 documentation is also provided as an example under benchmark/cobahh
directory. The example has 100,000 neurons and simulates 2 seconds of biological time, which are higher than the original model (4000N/1s).
To execute the benchmarks, enter into the benchmark directory from within the container and execute the benchmark script. The script runs cuSpike, GeNN, Brian2Cuda and NEST GPU with the CUBA model. 20,000 neurons are simulated for a biological time of 100 seconds. The figures below are generated at the end of benchmarks under benchmark/cuba
directory. All figures were obtained on RTX 4080.
cd benchmark
./benchmark
The simulators are first executed without previously generated/cached files. This run provides the amount of time needed for both code generation/compilation and simulation. The results are presented below.
In the second benchmark, the simulators are run with existing files from the previous execution. cuSpike, Brian2Cuda and GeNN use code generation and all are able to detect whether any changes has been made to the model. In case no changes are detected, they use the previously created binaries.
NEST GPU does not use code generation, so the results do not differ significantly.
In addition to the CUBA model, the COBAHH model is also executed by the benchmark script, but its performance is only compared against Brian2Cuda. The results excluding the compilation time is given in the following plot. The other simulators may be added in the future for comparison.
The following is the table of total energy consumption values measured by nvidia-smi on RTX 4050 during the CUBA benchmark (compilation included).
Simulator | Energy Consumption (J) |
---|---|
cuSpike | 325.8691000000002 |
GeNN | 699.5070000000002 |
NEST GPU | 1461.7808000000293 |
Brian2Cuda | 2118.830100000042 |
The same measurements on RTX 4080 are as follows
Simulator | Energy Consumption (J) |
---|---|
cuSpike | 366.28690000000313 |
GeNN | 738.0435000000045 |
NEST GPU | 2505.8223999999936 |
Brian2Cuda | 5095.273399999765 |
In the CUBA benchmark, average firing rate distribution and CV ISI distribution are very close to Brian2Cuda and NEST GPU. When Brian2Cuda is considered the ground truth, cuSpike provides comparable distributions (similar KL-divergences) relative to GeNN and NEST GPU.
İzmir Institute of Technology
Department of Computer Engineering
Bachelor of Science Thesis Project
Advisor: Assoc. Prof. Dr. Işıl Öz
cuSpike is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
cuSpike is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with cuSpike. If not, see https://www.gnu.org/licenses/.