This is the replication package containing code and experimental results for the paper "Abstractions for C++ Code Optimizations in Parallel High-performance Applications" submitted to the special issue of the Parallel Computing journal for The 15th International Workshop on Programming Models and Applications for Multicores and Manycores.
The paper presents a continuation of the work on the Noarr library, which can be found at https://github.com/ParaCoToUl/noarr-structures. The earlier work on this extension was presented in the PMAM 2024 workshop paper "Pure C++ Approach to Optimized Parallel Traversal of Regular Data Structures" (doi: 10.1145/3649169.3649247). The Noarr library itself was presented in the ICA3PP 2022 conference paper "Astute Approach to Handling Memory Layouts of Regular Data Structures" (doi: 10.1007/978-3-031-22677-9_27).
The abstractions extending the Noarr library are implemented in the following two repositories:
- Noarr Traversers: https://github.com/jiriklepl/noarr-structures
- Noarr Tuning: https://github.com/jiriklepl/noarr-tuning
- Overview - overview of the contents of the artifact
- Requirements - software requirements for the experiments
- Experiment reproduction - steps for reproducing the experiments
- Validation - steps for validating the implementations
- Code comparison - steps for comparing the code of the original Polybench/C benchmark with the Noarr implementation (the summarization is presented in the paper), and for comparing the code changes required to perform the tuning transformations implemented in PolybenchC-tuned on the algorithms in PolybenchC-pretune that are adjusted for the transformations
The artifact contains experimental results on various modifications of the following benchmark suites:

- Polybench/C 4.2.1
- PolyBench/GPU 1.0
The artifact uses the implementation of Noarr from https://github.com/jiriklepl/noarr-structures, which is continuously tested on various platforms using GitHub Actions. The tests cover the following compilers: GCC (10, 11, 12, 13), Clang (14, 15), NVCC (12), and MSVC (19) on Linux, macOS, and Windows.
The artifact structure:
- running-examples: Contains the examples of Noarr code presented in the paper.
- plots: Generated plots used in paper figures.
- results: CSV files with the measured wall-clock times, the code-comparison data, and archives of the raw measurements; also contains the log files with the summarized statistics of the code comparison and the autotuning results.
- relating to the Polybench/C-4.2.1 benchmark suite:
  - PolybenchC-4.2.1: The Polybench/C benchmark with a custom build script for convenience and `flatten` pragmas for consistency with Noarr.
  - PolybenchC-Noarr: Implementation of the Polybench/C benchmark using Noarr structures and Noarr Traversers.
  - PolybenchC-pretune: Modification of the algorithms from the Polybench/C benchmark that are subjected to tuning in PolybenchC-Noarr-tuned. Some of the loops are broken down into multiple loops to allow better traversal options.
  - PolybenchC-tuned: Tuned versions of four algorithms from PolybenchC-Noarr (PolybenchC-pretune). Each algorithm is taken from a different category of algorithms in the Polybench/C benchmark.
  - PolybenchC-Noarr-tuned: Implementation of the tuned algorithms from PolybenchC-tuned using Noarr Traversers.
  - PolybenchC-Noarr-tuning: Autotuning of the Polybench/C benchmarks (reimplemented with Noarr Structures and Noarr Traversers) using Noarr Tuning.
  - PolybenchC-tbb: Parallelization of four algorithms from PolybenchC-4.2.1 using Intel TBB.
  - PolybenchC-Noarr-tbb: Implementation of the parallelized algorithms from PolybenchC-tbb using Noarr Traversers.
  - PolybenchC-omp: Parallelization of four algorithms from PolybenchC-4.2.1 using OpenMP.
  - PolybenchC-Noarr-omp: Implementation of the parallelized algorithms from PolybenchC-omp using Noarr Traversers.
- relating to the PolyBench/GPU-1.0 benchmark suite:
  - PolyBenchGPU: The PolyBench/GPU benchmark with minor bug fixes (mostly relating to wrong arguments) and modified dataset sizes for the convenience of measurement.
  - PolyBenchGPU-Noarr: Implementation of the PolyBench/GPU benchmark using Noarr structures and Noarr Traversers.
- transformations.md: Presents a detailed list of transformations provided by Noarr Traversers.
- scripts: Contains the scripts used for running the experiments, validating the implementations, generating the plots presented in the paper, and producing further data from the collected measurements.
The artifact considers the following software requirements for the experiments:
- C++ compiler with support for C++20 - preferably GCC 12 or newer
- Intel TBB 2021.3 or newer
- NVCC 12 or newer
- CMake 3.10 or newer
- GNU `time` (only for autotuning)
- Python 3.6 or newer (only for autotuning)
The following software is required for the analysis of the results:
- `awk`
- R with the `ggplot2` and `dplyr` packages
- `gcc`, `clang`, `clang-format`
- standard tools: namely `wc`, `gzip`, `tar`, GNU `time`
The following software is required for the running examples presented in the paper:
- C++ compiler with support for C++20 - preferably GCC 12 or newer
- Intel TBB 2021.3 or newer
- NVCC 12 or newer
- CMake 3.10 or newer
- Python 3.6 or newer
The experiments can be reproduced using the following steps:
```sh
# clone the repository and enter it
git clone "https://github.com/jiriklepl/ParCo2024-artifact.git"
cd ParCo2024-artifact

# for the CPU experiments:
scripts/run-measurements-CPU.sh

# for the GPU experiments:
scripts/run-measurements-GPU.sh

# generate the plots presented in the paper:
scripts/generate_plots.sh

# measure and visualize the compile time comparison and autotuning results:
scripts/autotuning-test.sh
scripts/visualize-autotuning.sh

# -- optionally --
# generate additional visualizations of the measured wall-clock times and the code comparison:
scripts/more_visualizations.sh
```
In our laboratory cluster, we use the Slurm workload manager. Setting the `USE_SLURM` environment variable to `1` configures the scripts to use Slurm for running the experiments in the configuration that we used for the paper.
This configuration can be modified in the scripts invoked by scripts/run-measurements-CPU.sh, scripts/run-measurements-GPU.sh, and scripts/autotuning-test.sh.
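Inside the driver scripts, the toggle amounts to branching on the variable; a minimal sketch (the function name and messages are hypothetical, the actual scripts implement their own Slurm submission logic):

```sh
# Hypothetical sketch of gating on USE_SLURM; not the artifact's exact code.
run_measurements() {
    if [ "${USE_SLURM:-0}" = "1" ]; then
        echo "submitting via Slurm"   # e.g. wrap the benchmark jobs in sbatch
    else
        echo "running locally"
    fi
}
```

With the repository cloned, the Slurm path is selected as, e.g., `USE_SLURM=1 scripts/run-measurements-CPU.sh`.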
After running the experiments, the plots presented in the paper can be generated using scripts/generate_plots.sh and scripts/visualize-autotuning.sh. The resulting plots are stored in the plots directory. The script scripts/visualize-autotuning.sh also generates the file results/autotuning-report.log with the summarized statistics of the autotuning results presented in the paper.
This process also generates the corresponding CSV files with the measured wall-clock times in subdirectories of the results directory.
Additional visualizations of the measured wall-clock times and the code comparison can be generated by running the scripts/more_visualizations.sh script. The resulting plots are stored in the plots directory.
The validation of the implementations can be performed using the following steps:
```sh
# clone the repository and enter it
git clone "https://github.com/jiriklepl/ParCo2024-artifact.git"
cd ParCo2024-artifact

# for the CPU experiments:
scripts/validate-CPU.sh

# for more strict validation of the CPU experiments (runs ASan, UBSan, and LeakSanitizer):
scripts/sanitize-CPU.sh

# for the GPU experiments:
scripts/validate-GPU.sh
```
These scripts run the implementations of the Polybench/C benchmark (including the tuned, TBB-parallelized, and OpenMP-parallelized versions) and the PolyBench/GPU baseline algorithms and compare their outputs with their Noarr counterparts. They report any mismatches found in the outputs.
The validation scripts check whether the outputs of the implementations are identical (there is a zero threshold for the difference) on multiple datasets: SMALL, MEDIUM, LARGE, EXTRALARGE. The sanitizing script (scripts/sanitize-CPU.sh) runs the implementations on the SMALL dataset (and also compares the outputs).
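A zero threshold means two output dumps match only if they are byte-for-byte identical; such a check can be sketched with standard tools (hypothetical helper and file names, the actual scripts use their own mechanism):

```sh
# Zero-threshold comparison: succeeds (exit status 0) only if the two
# dump files are byte-for-byte identical. Hypothetical helper.
outputs_match() {
    cmp -s "$1" "$2"
}
```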
The code comparison can be performed using the following steps:

```sh
# clone the repository and enter it
git clone "https://github.com/jiriklepl/ParCo2024-artifact.git"
cd ParCo2024-artifact

mkdir -p results

# for the Polybench/C benchmark and the Noarr implementation:
scripts/code_compare.sh > "results/code_overall.log"

# for the tuning transformations related to PolybenchC-Noarr-tuned and PolybenchC-tuned:
scripts/compare_transformations.sh > "results/compare_transformations.log"
```
These scripts are designed to provide more insights into where the Noarr approach is more complex and where it saves some coding effort.
The code comparison can be performed by running scripts/code_compare.sh. It compares the code of the original Polybench/C benchmark and the Noarr implementation and outputs the differences into the file results/statistics.csv and the summarized statistics to the standard output (as shown in results/code_overall.log).
The code for hand-tuned transformations can be compared by running scripts/compare_transformations.sh. The results are shown in results/compare_transformations.log. The log file dumps the code changes required by the transformations for each algorithm and suite implementation and prints the summarized statistics for each implementation.
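The comparison relies on simple textual metrics (the analysis tools listed above include `wc` and `gzip`); conceptually, a gzip-based size metric can be sketched as follows (hypothetical helper, not the artifact's exact computation):

```sh
# Hypothetical sketch of a gzip-based code-size metric: the compressed
# size in bytes is a rough proxy for code complexity that discounts
# repetitive boilerplate.
gzipped_size() {
    gzip -c "$1" | wc -c
}
```

Comparing this metric between an original benchmark source and its Noarr counterpart gives one indication of where the Noarr approach adds or saves coding effort.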