
Replication package containing code and experimental results for the paper "Abstractions for C++ Code Optimizations in Parallel High-performance Applications" submitted to the special issue of the Parallel Computing (ParCo) journal for the PMAM 2024 workshop.


Artifact for Abstractions for C++ Code Optimizations in Parallel High-performance Applications

This is the replication package containing code and experimental results for the paper "Abstractions for C++ Code Optimizations in Parallel High-performance Applications" submitted to the special issue of the Parallel Computing journal for The 15th International Workshop on Programming Models and Applications for Multicores and Manycores.

The paper presents a continuation of the work on the Noarr library, which can be found at https://github.com/ParaCoToUl/noarr-structures. The earlier work on this extension was presented in the PMAM 2024 workshop paper "Pure C++ Approach to Optimized Parallel Traversal of Regular Data Structures" (doi: 10.1145/3649169.3649247). The Noarr library itself was presented in the ICA3PP 2022 conference paper "Astute Approach to Handling Memory Layouts of Regular Data Structures" (doi: 10.1007/978-3-031-22677-9_27).

The abstractions extending the Noarr library are implemented in the following two repositories:

Table of contents

  • Overview - overview of the contents of the artifact
  • Requirements - software requirements for the experiments
  • Experiment reproduction - steps for reproducing the experiments
  • Validation - steps for validating the implementations
  • Code comparison - steps for comparing the code of the original Polybench/C benchmark with the Noarr implementation (the summarization presented in the paper), and for comparing the code changes required to perform the tuning transformations implemented in PolybenchC-tuned on the algorithms in PolybenchC-pretune that are adjusted for the transformations

Overview

The artifact contains experimental results on various modifications of the following benchmark suites:

The artifact uses the implementation of Noarr from https://github.com/jiriklepl/noarr-structures, which is continuously tested on various platforms using GitHub Actions. The tests cover the following compilers: GCC (10, 11, 12, 13), Clang (14, 15), NVCC (12), and MSVC (19) on Linux, macOS, and Windows.

The artifact structure:

  • running-examples: Contains the examples of Noarr code presented in the paper.
  • plots: Generated plots used in paper figures.
  • results: CSV files with the measured wall-clock times and the code comparison, archives of the raw measurements, and log files with the summarized statistics of the code comparison and the autotuning results.
  • relating to the Polybench/C-4.2.1 benchmark suite:
  • relating to the PolyBench/GPU-1.0 benchmark suite:
    • PolyBenchGPU: The PolyBench/GPU suite with minor bug fixes (mostly wrong arguments) and dataset sizes modified for more convenient measurement.
    • PolyBenchGPU-Noarr: Implementation of the PolyBench/GPU benchmark using Noarr structures and Noarr Traversers.
  • transformations.md: Presents a detailed list of transformations provided by Noarr Traversers.
  • scripts: Contains the scripts used for running the experiments, validating the implementations, generating the plots presented in the paper, and producing further data from the collected measurements.

Requirements

The artifact considers the following software requirements for the experiments:

  • C++ compiler with support for C++20 - preferably GCC 12 or newer
  • Intel TBB 2021.3 or newer
  • NVCC 12 or newer
  • CMake 3.10 or newer
  • GNU time (only for autotuning)
  • Python 3.6 or newer (only for autotuning)

The following software is required for the analysis of the results:

  • awk
  • R with ggplot2 and dplyr packages
  • gcc
  • clang, clang-format
  • standard tools: namely wc, gzip, tar, GNU time

The following software is required for the running examples presented in the paper:

  • C++ compiler with support for C++20 - preferably GCC 12 or newer
  • Intel TBB 2021.3 or newer
  • NVCC 12 or newer
  • CMake 3.10 or newer
  • Python 3.6 or newer

Experiment reproduction

The experiments can be reproduced using the following steps:

# clone the repository and enter it
git clone "https://github.com/jiriklepl/ParCo2024-artifact.git"
cd ParCo2024-artifact

# for the CPU experiments:
scripts/run-measurements-CPU.sh

# for the GPU experiments:
scripts/run-measurements-GPU.sh

# generate the plots presented in the paper:
scripts/generate_plots.sh

# measure and visualize the compile time comparison and autotuning results:
scripts/autotuning-test.sh
scripts/visualize-autotuning.sh

# -- optionally --

# generate additional visualizations of the measured wall-clock times and the code comparison:
scripts/more_visualizations.sh

In our laboratory cluster, we use the Slurm workload manager. Setting the USE_SLURM environment variable to 1 configures the scripts to use Slurm for running the experiments in the configuration that we used for the paper.

This configuration can be modified in the scripts invoked by scripts/run-measurements-CPU.sh, scripts/run-measurements-GPU.sh, and scripts/autotuning-test.sh.

After running the experiments, the plots presented in the paper can be generated using scripts/generate_plots.sh and scripts/visualize-autotuning.sh. The resulting plots are stored in the plots directory. The scripts/visualize-autotuning.sh script also generates the file results/autotuning-report.log with the summarized statistics of the autotuning results presented in the paper.

This process also generates the corresponding CSV files with the measured wall-clock times in subdirectories of the results directory.

More visualizations

Additional visualizations of the measured wall-clock times and the code comparison can be generated by running the scripts/more_visualizations.sh script. The resulting plots are stored in the plots directory.

Validation

The validation of the implementations can be performed using the following steps:

# clone the repository and enter it
git clone "https://github.com/jiriklepl/ParCo2024-artifact.git"
cd ParCo2024-artifact

# for the CPU experiments:
scripts/validate-CPU.sh

# for more strict validation of the CPU experiments (runs ASan, UBSan, and leak sanitizer):
scripts/sanitize-CPU.sh

# for the GPU experiments:
scripts/validate-GPU.sh

These scripts run the Polybench/C implementations and their tuned/TBB-parallelized/OpenMP-parallelized versions, as well as the PolyBench/GPU baseline algorithms, and compare their respective outputs with their Noarr counterparts. Any mismatches found in the outputs are reported.

The validation scripts check that the outputs of the implementations are identical (there is a zero threshold for differences) on multiple datasets: SMALL, MEDIUM, LARGE, and EXTRALARGE. The sanitizer script (scripts/sanitize-CPU.sh) runs the implementations on the SMALL dataset (and also compares the outputs).

Code comparison

# clone the repository and enter it
git clone "https://github.com/jiriklepl/ParCo2024-artifact.git"
cd ParCo2024-artifact

mkdir -p results

# for the Polybench/C benchmark and the Noarr implementation:
scripts/code_compare.sh > "results/code_overall.log"

# for the tuning transformations related to PolybenchC-Noarr-tuned and PolybenchC-tuned:
scripts/compare_transformations.sh > "results/compare_transformations.log"

These scripts are designed to provide more insights into where the Noarr approach is more complex and where it saves some coding effort.

The code comparison can be performed by running scripts/code_compare.sh. It compares the code of the original Polybench/C benchmark and the Noarr implementation and outputs the differences into the file results/statistics.csv and the summarized statistics to the standard output (as shown in results/code_overall.log).

The code of the hand-tuned transformations can be compared by running scripts/compare_transformations.sh; the results are shown in results/compare_transformations.log. The log file dumps the code changes required to perform the transformations for each algorithm and suite implementation and prints the summarized statistics for each implementation.
