Skip to content

A parallel computing project that demonstrates distributed GPU-accelerated video processing. It combines MPI for task distribution across multiple processes with CUDA for GPU acceleration to process video frames in parallel. The system extracts frames, distributes processing tasks to multiple workers, applies GPU-accelerated image filters,...

Notifications You must be signed in to change notification settings

pengwingokla/MPI-CUDA-Video-Parallel-Processor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Parallel Video Frame Processing using MPI + CUDA

This project demonstrates how to build a distributed GPU-accelerated image processing pipeline using MPI (Message Passing Interface) and CUDA (Compute Unified Device Architecture). It processes thousands of video frames in parallel by distributing tasks across processes (MPI) and accelerating computation per frame on the GPU (CUDA).


What This Project Does

  • Extracts frames from a video (done via a Python script using OpenCV).
  • Distributes frame-processing tasks across multiple MPI workers.
  • Each worker loads a frame, sends it to CUDA on GPU to invert its colors.
  • The processed frame is saved.
  • When all frames are processed, they can be reassembled into a new video using FFmpeg.

This setup is ideal for learning hybrid parallel programming that combines CPU task scheduling with GPU computation.


Tech Stack

  • C (MPI) for parallel task distribution
  • CUDA for GPU image inversion
  • stb_image / stb_image_write for image I/O
  • OpenCV (Python) for frame extraction (optional)
  • FFmpeg for video reconstruction (optional)

Project Structure

mpi-project/
├── src/
│ ├── main_serial.c # Serial version
│ ├── main_cuda.cu # CUDA-only
│ ├── main_mpi.c # MPI-only
│ ├── main_mpi_cuda.cu # MPI + CUDA
│ ├── master.c # MPI master logic
│ ├── task_queue.c # Simple task scheduler for MPI
│ ├── cuda_filter.cu # Filtering kernels
│ ├── frame_io.c/h # Image I/O
│ ├── utils.c/h # Utility helpers
├── include/

│ ├── cuda_filter.h
│ ├── frame_io.h
│ ├── task_queue.h
│ └── utils.h
├── frames/ # Input frames
├── output/ # Processed output
├── bash_scripts/ # Demo scripts
├── extract_frames.py # Split video into frames
└── Makefile


How It Works

Step 1: Master-Worker Model (MPI)

  • The master (rank 0) loads all available image filenames into a task queue.
  • Each worker (rank > 0) sends a task request to the master.
  • The master sends a frame path to the worker.
  • The worker processes the frame and returns a result log.

This continues until all frames are processed.

Step 2: Per-Frame GPU Processing (CUDA)

Each frame is passed to a CUDA kernel that performs color inversion: output_pixel = 255 - input_pixel

This is done in parallel for each pixel using GPU threads.

Step 3: Reconstruct Video

Once all frames are processed, FFmpeg can be used to stitch them into a video:

ffmpeg -framerate 30 -i output/frame_%04d.jpg -c:v libx264 output.mp4

Installation & Build

Please visit SIMPLE_INSTRUCTION.md for a more straightforward instrction

Build

To elaborate further, these are the commands in the bash scripts. Note that this only works after frames from the video were extracted and contained in the /frames folder.

Version 1: Serial (no MPI, no CUDA)

./exec_serial

Version 2: MPI-only

.mpirun -np 4 ./exec_mpi_only

Version 3: CUDA-only

./exec_cuda_only

Version 4: MPI + CUDA (multi-node or multi-GPU)

mpirun -np 8 ./exec_full

📦 Output

Each processed frame will be saved to output/frame_XXXX.jpg.

The output images will be the color-inverted versions of the input frames.

How MPI + CUDA Work Together

MPI Master: Distributes frame tasks to workers MPI Worker: Receives frame path, processes it CUDA Kernel: Inverts pixel values on GPU Task Queue: Dynamically assigns frames as they are available

Credits


stb_image
OpenCV
OpenMPI
NVIDIA CUDA Toolkit

About

A parallel computing project that demonstrates distributed GPU-accelerated video processing. It combines MPI for task distribution across multiple processes with CUDA for GPU acceleration to process video frames in parallel. The system extracts frames, distributes processing tasks to multiple workers, applies GPU-accelerated image filters,...

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •