DDT shows message queues with pml ob1 but not with ucx and yalla. #6464

Open (reported by @bartoldeman)

Description

Background information

We found that with the yalla and ucx pml, the message queue display in DDT no longer works, and we need to fall back to the ob1 pml. This seems to be an acceptable workaround for now, but with newer Open MPI versions removing the openib BTL, does that mean we will need to use TCP/IP for debugging, or perhaps the new, still somewhat experimental UCT BTL?
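
As a sketch of the workaround we use today (the process count and executable name are just examples; any MPI program behaves the same way):

# Workaround: force the ob1 pml so DDT's message queue display works
mpirun --mca pml ob1 -np 4 ./deadlock_ring

# Affected configurations: ucx (Open MPI 3.1.x) or yalla (MXM, Open MPI 2.1.1)
mpirun --mca pml ucx -np 4 ./deadlock_ring

# The same selection can be made via the environment
export OMPI_MCA_pml=ob1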

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

Tested with Open MPI 2.1.1 (patched to work with DDT in general) using the yalla and ob1 pml, and with 3.1.1 and 3.1.2 using the ucx and ob1 pml. Tested with DDT (Arm Forge) 7.1, 18.2, and 18.3.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

It was compiled from a source tarball.

Please describe the system on which you are running

  • Operating system/version: CentOS 7.4
  • Computer hardware: x86_64 Skylake SP and Broadwell.
  • Network type: InfiniBand (Mellanox ConnectX-5).

Details of the problem

We expect to see what is shown in the first screenshot (with ob1), but instead see the second with ucx and yalla.
The test case is a simple MPI deadlock program, compiled with mpicc -g deadlock_ring.c -o deadlock_ring. The choice of compiler does not matter (GCC 5.4.0, GCC 7.3.0, and Intel 2016 update 4 were all tried).

[Screenshot 1 (2019-03-05 15:05): DDT message queue view with the ob1 pml; the pending messages are shown.]
[Screenshot 2 (2019-03-05 15:00): DDT message queue view with ucx/yalla; the message queue display is empty.]
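
For reference, a sketch of how we run the reproducer under DDT (this assumes DDT's express launch mode, where mpirun is prefixed with ddt; the exact launch line may differ depending on how Forge is set up at a given site):

mpicc -g deadlock_ring.c -o deadlock_ring

# Message queues visible in DDT:
ddt mpirun --mca pml ob1 -np 4 ./deadlock_ring

# Message queues missing in DDT:
ddt mpirun --mca pml ucx -np 4 ./deadlock_ring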

/******************************************************************************
Complex deadlock bug (loop over all ranks).

Solutions:
  MPI_Sendrecv(Arr, N, MPI_INT, rank_next, tag, Arr, N, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &status);
  MPI_Sendrecv_replace(Arr, N, MPI_INT, rank_next, tag, rank_prev, tag, MPI_COMM_WORLD, &status);

******************************************************************************/
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

// Try 1 and 10000
#define N 10000

int main (int argc, char *argv[])
{
  int numtasks, rank, tag=0, rank_prev, rank_next;
  int Arr[N];
  MPI_Status status;

  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Task %d starting...\n",rank);

  // Neighboring tasks:
  rank_prev = rank - 1;
  rank_next = rank + 1;
  // Imposing periodic boundaries on ranks:
  if (rank_prev < 0)
    rank_prev = numtasks - 1;
  if (rank_next == numtasks)
    rank_next = 0;


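  // Every rank blocks in the synchronous send below before posting its
  // receive, so with more than one rank this deadlocks (intentionally).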
  MPI_Ssend(Arr, N, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
  MPI_Recv(Arr, N, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &status);


  printf ("Finished\n");

  MPI_Finalize();
}
