Parallel Video Frame Processing using MPI + CUDA

This project demonstrates how to build a distributed GPU-accelerated image processing pipeline using MPI (Message Passing Interface) and CUDA (Compute Unified Device Architecture). It processes thousands of video frames in parallel by distributing tasks across processes (MPI) and accelerating computation per frame on the GPU (CUDA).

What This Project Does

Extracts frames from a video (done via a Python script using OpenCV).
Distributes frame-processing tasks across multiple MPI workers.
Each worker loads a frame and passes it to a CUDA kernel on the GPU, which inverts its colors.
Each processed frame is saved to the output/ folder.
When all frames are processed, they can be reassembled into a new video using FFmpeg.

This setup is ideal for learning hybrid parallel programming that combines CPU task scheduling with GPU computation.

Tech Stack

C (MPI) for parallel task distribution
CUDA for GPU image inversion
stb_image / stb_image_write for image I/O
OpenCV (Python) for frame extraction (optional)
FFmpeg for video reconstruction (optional)

Project Structure

mpi-project/
├── src/
│ ├── main_serial.c # Serial version
│ ├── main_cuda.cu # CUDA-only
│ ├── main_mpi.c # MPI-only
│ ├── main_mpi_cuda.cu # MPI + CUDA
│ ├── master.c # MPI master logic
│ ├── task_queue.c # Simple task scheduler for MPI
│ ├── cuda_filter.cu # Filtering kernels
│ ├── frame_io.c/h # Image I/O
│ └── utils.c/h # Utility helpers
├── include/
│ ├── cuda_filter.h
│ ├── frame_io.h
│ ├── task_queue.h
│ └── utils.h
├── frames/ # Input frames
├── output/ # Processed output
├── bash_scripts/ # Demo scripts
├── extract_frames.py # Split video into frames
└── Makefile

How It Works

Step 1: Master-Worker Model (MPI)

The master (rank 0) loads all available image filenames into a task queue.
Each worker (rank > 0) sends a task request to the master.
The master sends a frame path to the worker.
The worker processes the frame and returns a result log.

This continues until all frames are processed.
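The request loop above can be sketched without MPI as a single-process simulation. This is a minimal sketch, not the contents of task_queue.c: the names TaskQueue, queue_init, queue_push, queue_next and simulate_dispatch are hypothetical, and the round-robin order stands in for the order in which worker requests would actually arrive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-capacity queue of frame paths held by the master. */
#define MAX_TASKS 4096

typedef struct {
    const char *paths[MAX_TASKS]; /* frame paths, not owned by the queue */
    size_t count;                 /* number of paths enqueued */
    size_t next;                  /* index of the next unassigned path */
} TaskQueue;

void queue_init(TaskQueue *q) {
    assert(q != NULL);
    q->count = 0;
    q->next = 0;
}

/* Adds a frame path. Returns 0 on success, -1 if the queue is full. */
int queue_push(TaskQueue *q, const char *path) {
    assert(q != NULL && path != NULL);
    if (q->count == MAX_TASKS) return -1;
    q->paths[q->count++] = path;
    return 0;
}

/* Hands out the next frame path, or NULL once every frame is assigned.
 * In an MPI build, the master would call this after an MPI_Recv of a
 * worker's request and MPI_Send the path (or a stop signal) back. */
const char *queue_next(TaskQueue *q) {
    assert(q != NULL);
    if (q->next >= q->count) return NULL;
    return q->paths[q->next++];
}

/* Single-process stand-in for the request loop: workers 1..num_workers
 * ask in turn until the queue drains. per_worker must hold
 * num_workers + 1 zeroed counters (index 0 is the master and unused).
 * Returns the total number of frames assigned. */
size_t simulate_dispatch(TaskQueue *q, int num_workers, size_t *per_worker) {
    assert(q != NULL && per_worker != NULL && num_workers > 0);
    size_t assigned = 0;
    int w = 1;
    while (queue_next(q) != NULL) {
        per_worker[w]++;
        assigned++;
        w = (w % num_workers) + 1; /* next worker rank, wrapping to 1 */
    }
    return assigned;
}
```

Because workers pull a new path only when they finish the previous one, a slow frame delays only the worker that holds it; the simulation does not model that timing, only the hand-out and the NULL result that signals completion.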

Step 2: Per-Frame GPU Processing (CUDA)

Each frame is passed to a CUDA kernel that performs color inversion: output_pixel = 255 - input_pixel

This is done in parallel for each pixel using GPU threads.
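As a CPU reference for the formula above, the following sketch applies the same operation byte by byte, which is the work a single GPU thread would do for one element. It assumes the frame is an interleaved buffer of 8-bit channel values, as stb_image returns; the function names and the blocks_for helper are hypothetical and are not taken from cuda_filter.cu.

```c
#include <assert.h>
#include <stddef.h>

/* The per-element operation: output_pixel = 255 - input_pixel. */
unsigned char invert_byte(unsigned char v) {
    return (unsigned char)(255 - v);
}

/* CPU reference: inverts n bytes (assumed n = width * height * channels).
 * On the GPU, each loop iteration would instead be one thread. */
void invert_frame(const unsigned char *in, unsigned char *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = invert_byte(in[i]);
    }
}

/* Number of thread blocks needed to cover n elements when each block
 * runs threads_per_block threads: the ceiling of n / threads_per_block. */
size_t blocks_for(size_t n, size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}
```

A reference like this is also handy for checking GPU output: inverting a frame twice should give back the original bytes.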

Step 3: Reconstruct Video

Once all frames are processed, FFmpeg can be used to stitch them into a video:

ffmpeg -framerate 30 -i output/frame_%04d.jpg -c:v libx264 output.mp4

Installation & Build

See `SIMPLE_INSTRUCTION.md` for simpler, step-by-step instructions.

Build

The commands below are the ones the bash scripts run. They only work after the frames have been extracted from the video into the frames/ folder.

Version 1: Serial (no MPI, no CUDA)

./exec_serial

Version 2: MPI-only

mpirun -np 4 ./exec_mpi_only

Version 3: CUDA-only

./exec_cuda_only

Version 4: MPI + CUDA (multi-node or multi-GPU)

mpirun -np 8 ./exec_full
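When more processes are launched than there are GPUs, several ranks have to share a device. The source does not say how this project maps ranks to GPUs, so the following is only a sketch of one common convention, with a hypothetical function name. In a CUDA build the device count would come from cudaGetDeviceCount and the result would be passed to cudaSetDevice.

```c
#include <assert.h>

/* Maps an MPI rank to a GPU index by wrapping around the device count.
 * Assumption: ranks on the same node are numbered consecutively; on a
 * multi-node run, the node-local rank would be the safer input. */
int device_for_rank(int rank, int device_count) {
    assert(rank >= 0 && device_count > 0);
    return rank % device_count;
}
```

With 8 ranks and 2 GPUs, for example, even ranks land on device 0 and odd ranks on device 1.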

📦 Output

Each processed frame will be saved to output/frame_XXXX.jpg.

The output images will be the color-inverted versions of the input frames.
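The output names need to match the frame_%04d.jpg pattern given to FFmpeg, so a frame index has to be zero-padded to four digits. A minimal sketch of building such a path follows; output_path is a hypothetical helper, and it assumes XXXX is the frame index in the range 0 to 9999.

```c
#include <assert.h>
#include <stdio.h>

/* Writes "output/frame_XXXX.jpg" into buf, with the index zero-padded to
 * four digits. Returns 0 on success, or -1 if the index is out of range
 * or the buffer is too small to hold the whole path. */
int output_path(char *buf, size_t size, int index) {
    assert(buf != NULL);
    if (index < 0 || index > 9999) return -1;
    int n = snprintf(buf, size, "output/frame_%04d.jpg", index);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Checking the snprintf return value against the buffer size catches truncated paths, which would otherwise silently write frames under the wrong name.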

How MPI + CUDA Work Together

MPI Master: Distributes frame tasks to workers
MPI Worker: Receives frame path, processes it
CUDA Kernel: Inverts pixel values on GPU
Task Queue: Dynamically assigns frames as they are available

Credits

stb_image
OpenCV
OpenMPI
NVIDIA CUDA Toolkit

pengwingokla/MPI-CUDA-Video-Parallel-Processor
