Skip to content

aiaskarioris/OpenMP1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OpenMP Project

This is a project for the Parallel Systems course provided by the Department of Informatics and Computer Engineering, in University of West Attica. It is written in C and uses the OpenMP library.

Abstract

This exercise works on Diagonally Dominant Matrices (DDM).

The exercise is split into four parts where in each part the goal is achieved using parallel computations provided by the OpenMP library. The four parts are:

  • Part A: Determine if the input matrix is diagonally dominant
  • Part B: Find the largest diagonal element using reduction
  • Part C: Generate a new square matrix (see Part C: New Matrix)
  • Part D: Find the smallest element of the matrix generated above.

Part D is implemented in three different algorithms. See below for more details.

Development Environment

This project was developed and tested in Ubuntu 22.04 with the use of a virtual machine. Both the host and the virtual machine were of x86-64 architecture. The virtual machine utilized 4 CPU threads and 6GB RAM. As such, all tests of the parallel portions of the program were conducted with 2, 3 & 4 threads.


Results

The average time needed for the various program versions to complete is shown on the graphs below.

16M Numbers

16M Numbers

236M Numbers

236M

Execution time for input matrix size

Exec. Time to Input Size All tests executed with 4 threads.


Details

In the following paragraphs the way the program works is explained in detail, as well as the ideas behind the code. A PDF version of the following is also provided, albeit written in Greek.

Source Code

There are three different source files that can be found in the source directory. Each one implements one of the three different algorithms used in Part D but are otherwise identical. More details about these files can be found there. Also, in the tools-source directory, the source code of the tool that generates input files can be found.

Input Files

When running any version of the program the input matrix is provided by a binary file. These files are called 'Test Files' in the documentation and are generated by the gen tool (see tools-source). Test files must be placed in a directory called "test". The test file format is fairly simple, consisting of just a 16 byte header and then the raw data of the matrix.

Test file format

The header consists of the number N, the flag byte and a few padding zeros.

The program's outline

In this section of this page the individual parts are explained. During execution, some memory "book-keeping" procedures take place so that memory allocated no longer used can be freed.

Selecting an input file

The program is initiated via a command line interface with the test file number provided as an argument. The program will search for the specified input file in the test directory.

Part A: DDM Check

In order to check efficiently if a matrix is Diagonally Dominant all rows must be summed and compared to their diagonal element. Since each row check is completely independent from the other rows, each thread can check a number of rows.

If a thread comes across a row that doesn't fit a DDM it atomically writes to the result shared variable, which is initialized to 0. If, by the end of the first part, result is no longer 0 the program halts since the input matrix isn't Diagonally Dominant. By writing to result only on the bad-case scenario no time is lost by threads waiting to execute the atomic operation, thus speeding up the good-case scenario.

Part B: Max. Element in Diagonal

In order to find the greatest element of the diagonal the for reduction directive is used on the new array D. This array is first filled with the diagonal elements of the input matrix and then reduce is executed on it.

Part C: New Matrix

In this part a new matrix is generated using the elements of the input matrix and the largest number found in Part B, stored in the variable m.

  • Each non-diagonal element D[i,j] of the new matrix is the algebraic distance of the element at i,j in the input matrix from the maximum m.
  • All diagonal elements of the new matrix are equal to m.

Part D: Minimum in New Matrix

In this part, the minimum element of the new matrix is found. As mentioned above, there are 3 different versions of the program, each implementing the final part with a different algorithm. Below these three algorithms are described.

Reduction Clause

Using for reduce on the new matrix to find the minimum is very simple and essentially only a for-loop is required. In this case, all elements of the new matrix are checked, including the elements of the diagonal that all have the same value.

Critical Section

In this version a critical section is used, meaning that while one tread enters this part of the program no other thread can execute it until the first thread is done. Using this mechanism, however, will cause threads to idle until they can enter the critical section, thus causing execution to be slower. To find the minimum, each thread finds the minimum of a few row and writes this minimum in the S array. Once all lines have been processed, each thread enters the critical section and writes to the last position of the S array if the minimum it found is smaller than what is already in that location from the previous thread.

OpenMP allows parallel code to have critical sections with atomic operations, locks or critical directives. Since the critical section described above contains more than one operation, an atomic operation cannot be used. Locks would work but using them requires some additional memory-related operations so the critical directive has been used instead.

Binary Tree

This version uses a binary tree algorithm to find the minimum. This is done by having threads work in pairs during each step of the algorithm and gradually decrease the number of threads taking part in the computation.

The lines of the matrix are distributed to the threads and each thread computes the "local minimum" of the data it received. If the number of threads is a power of 2 then the next part of the algorithm can start. If the number of threads isn't appropriate (such as if there are 3 threads) the excess threads will compare local minimums with the others threads in this step so that they won't take part in the binary tree. The binary-tree portion of the algorithm consists of a few steps where all threads must take part so that they can execute the barrier directive. However, as step decreases with each loop, less and less threads execute comparisons.

See the source code for implementation details.

About

A simple OpenMP project

Topics

Resources

Stars

Watchers

Forks