This project implements a sparsely connected Feed-Forward Neural Network (FFNN) using two parallel computing approaches:
- OpenMP for multi-core CPU parallelization
- CUDA for GPU acceleration
This project is part of the Architecture and Platform for Artificial Intelligence exam.
The neural network has the following key characteristics:
- Input layer size: N neurons
- Total layers: K
- Connectivity reach: R (each node connects to R previous nodes)
- Uses sigmoid activation function
- Includes a bias term
- Parallelizes computation across nodes within each layer
- Uses `#pragma omp parallel for` to distribute the workload across threads
- Supports multi-threading on shared-memory CPU architectures
- Implements two kernel versions:
- Shared memory kernel
- Global memory kernel
- Uses double-buffering strategy for layer computations
- Optimizes memory access patterns for GPU computation
```sh
cd openMP
gcc -std=c99 -Wall -Wpedantic -fopenmp -Iinclude src/main.c src/network.c src/utilities.c -o openMP -lm
```
or, using CMake:
```sh
cd openMP
mkdir build
cd build
cmake ..
make
```
```sh
./openMP [N] [K] [machine_output]
```
- `N`: number of neurons in the first (input) layer
- `K`: total number of layers
- `machine_output`: 1 for machine-readable output, 0 for human-readable (default: 1)
```sh
cd CUDA
mkdir build
nvcc -Iinclude src/main.cu -o build/main
cd build
./main [N] [K] [machine_output]
```
- Same parameters as the OpenMP version
Refer to `Report.pdf` for a detailed analysis of the performance of both implementations.
- GCC with OpenMP support
- NVIDIA CUDA Toolkit
- CMake (optional, for alternative build method)
Luca Tedeschini - University of Bologna