This project demonstrates how to build a distributed GPU-accelerated image processing pipeline using MPI (Message Passing Interface) and CUDA (Compute Unified Device Architecture). It processes thousands of video frames in parallel by distributing tasks across processes (MPI) and accelerating computation per frame on the GPU (CUDA).
- Extracts frames from a video (done via a Python script using OpenCV).
- Distributes frame-processing tasks across multiple MPI workers.
- Each worker loads a frame and passes it to a CUDA kernel on the GPU, which inverts its colors.
- The processed frame is saved.
- When all frames are processed, they can be reassembled into a new video using FFmpeg.
This setup is ideal for learning hybrid parallel programming that combines CPU task scheduling with GPU computation.
- C (MPI) for parallel task distribution
- CUDA for GPU image inversion
- stb_image / stb_image_write for image I/O
- OpenCV (Python) for frame extraction (optional)
- FFmpeg for video reconstruction (optional)
mpi-project/
├── src/
│ ├── main_serial.c # Serial version
│ ├── main_cuda.cu # CUDA-only
│ ├── main_mpi.c # MPI-only
│ ├── main_mpi_cuda.cu # MPI + CUDA
│ ├── master.c # MPI master logic
│ ├── task_queue.c # Simple task scheduler for MPI
│ ├── cuda_filter.cu # Filtering kernels
│ ├── frame_io.c/h # Image I/O
│ └── utils.c/h # Utility helpers
├── include/
│ ├── cuda_filter.h
│ ├── frame_io.h
│ ├── task_queue.h
│ └── utils.h
├── frames/ # Input frames
├── output/ # Processed output
├── bash_scripts/ # Demo scripts
├── extract_frames.py # Split video into frames
└── Makefile
- The master (rank 0) loads all available image filenames into a task queue.
- Each worker (rank > 0) sends a task request to the master.
- The master sends a frame path to the worker.
- The worker processes the frame and returns a result log.
This continues until all frames are processed.
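The scheduling steps above can be sketched as a small in-memory queue owned by the master. This is a minimal sketch under stated assumptions, not the contents of the project's `task_queue.c`: the names `TaskQueue`, `tq_init`, `tq_push`, `tq_next` and the `MAX_TASKS` limit are hypothetical. MPI messaging is left out on purpose; in the real pipeline the master would presumably call something like `tq_next` each time a worker request arrives and send the returned path back to that worker.

```c
#include <stddef.h>

/* Hypothetical fixed-capacity queue of frame paths. The project's real
 * task_queue.c may be organized differently. */
#define MAX_TASKS 4096

typedef struct {
    const char *paths[MAX_TASKS]; /* borrowed pointers to frame path strings */
    size_t count;                 /* number of paths queued so far */
    size_t next;                  /* index of the next path to hand out */
} TaskQueue;

/* Reset the queue to the empty state. */
void tq_init(TaskQueue *q) {
    q->count = 0;
    q->next = 0;
}

/* Add one frame path. Returns 0 on success, -1 if the queue is full. */
int tq_push(TaskQueue *q, const char *path) {
    if (q->count >= MAX_TASKS) {
        return -1;
    }
    q->paths[q->count++] = path;
    return 0;
}

/* Hand out the next unassigned frame path, or NULL when no work is left.
 * A NULL result is the point at which a master would tell the requesting
 * worker to stop. */
const char *tq_next(TaskQueue *q) {
    if (q->next >= q->count) {
        return NULL;
    }
    return q->paths[q->next++];
}
```

Because a path is handed out only when a worker asks for one, faster workers naturally end up processing more frames than slower ones, which is the point of dynamic assignment over a fixed up-front split.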
Each frame is passed to a CUDA kernel that performs color inversion: output_pixel = 255 - input_pixel
This is done in parallel for each pixel using GPU threads.
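To make the per-pixel mapping concrete, here is a CPU-side emulation in plain C of how a one-dimensional GPU launch could cover an image buffer. It is a sketch, not the project's `cuda_filter.cu`: the function names and the `threads_per_block` parameter are assumptions. The index expression mirrors the usual CUDA pattern `blockIdx.x * blockDim.x + threadIdx.x`, and the bounds check stands in for the guard a kernel needs when the buffer size is not a multiple of the block size.

```c
#include <stddef.h>

/* Work done by one (emulated) GPU thread: invert a single byte.
 * The bounds check matters because the last block may be partly empty. */
static void invert_one(unsigned char *data, size_t n, size_t i) {
    if (i < n) {
        data[i] = (unsigned char)(255 - data[i]);
    }
}

/* Emulates a 1D launch on the CPU: the outer loop plays the role of the
 * block index, the inner loop the thread index within a block.
 * n is the total byte count (width * height * channels).
 * threads_per_block must be greater than zero. */
void invert_emulated(unsigned char *data, size_t n, size_t threads_per_block) {
    /* Round up so that every byte is covered by some thread. */
    size_t blocks = (n + threads_per_block - 1) / threads_per_block;
    for (size_t b = 0; b < blocks; ++b) {
        for (size_t t = 0; t < threads_per_block; ++t) {
            invert_one(data, n, b * threads_per_block + t);
        }
    }
}
```

On the GPU the two loops disappear: each thread computes its own index and runs the body of `invert_one` once. Applying the inversion twice restores the original image, which makes for a cheap sanity check.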
Once all frames are processed, FFmpeg can be used to stitch them into a video:
ffmpeg -framerate 30 -i output/frame_%04d.jpg -c:v libx264 output.mp4
The bash scripts run the commands below. They only work after the frames have been extracted from the video and placed in the frames/ folder.
./exec_serial
mpirun -np 4 ./exec_mpi_only
./exec_cuda_only
mpirun -np 8 ./exec_full
Each processed frame will be saved to output/frame_XXXX.jpg.
The output images will be the color-inverted versions of the input frames.
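The `frame_XXXX.jpg` naming has to match the `frame_%04d.jpg` pattern that FFmpeg reads, so the zero padding matters. Below is a small sketch of building such a path in C; the helper name `make_output_path` is hypothetical, and the project may construct its paths differently.

```c
#include <stdio.h>

/* Write "output/frame_NNNN.jpg" into buf, zero-padding the index to four
 * digits. Returns 0 on success, -1 if buf is too small or formatting fails. */
int make_output_path(char *buf, size_t size, int index) {
    int n = snprintf(buf, size, "output/frame_%04d.jpg", index);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For example, index 7 yields `output/frame_0007.jpg`. Checking the `snprintf` return value catches a buffer that is too small instead of silently truncating the file name.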
- MPI Master: Distributes frame tasks to workers.
- MPI Worker: Receives a frame path and processes it.
- CUDA Kernel: Inverts pixel values on the GPU.
- Task Queue: Dynamically assigns frames as they are available.