Bug description
We observed an issue in which the mpirun and cpuinfo commands from Intel MPI 2021.4, distributed with AWS ParallelCluster since version 3.1.1, hang indefinitely on Ubuntu 20.04 with the latest updates and kernel 5.15.0-1015.
The issue can be verified by running a simple mpirun --version: the process gets stuck, consuming 100% of a CPU core.
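Because the affected command never returns, it can be convenient to run the check under a time limit. The following is a minimal sketch, assuming GNU coreutils timeout is available; the helper name check_hang and the 30-second limit in the example are illustrative choices, not part of the original report.

```shell
#!/usr/bin/env bash
# check_hang is a hypothetical helper: it runs a command under a time limit
# and prints "ok", "hung", or "failed (exit N)".
check_hang() {
  local limit="$1"
  shift
  # -k 5: send SIGKILL 5 seconds after SIGTERM if the process ignores SIGTERM.
  timeout -k 5 "$limit" "$@" >/dev/null 2>&1
  local rc=$?
  case "$rc" in
    0) echo "ok" ;;
    # 124: the time limit was reached; 137: the process had to be killed.
    124|137) echo "hung" ;;
    *) echo "failed (exit $rc)" ;;
  esac
}

# Example (the 30-second limit is an arbitrary assumption):
#   check_hang 30 mpirun --version
```

A result of "hung" only shows that the command exceeded the limit; CPU usage can be confirmed separately with a tool such as top.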
The problem is likely to happen on EC2 instance types with more than one socket, such as c5.18xlarge.
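To see whether an instance has more than one socket, the socket count can be read from lscpu. This is a rough sketch that assumes the English "Socket(s):" label printed by lscpu from util-linux; the helper name socket_count is hypothetical.

```shell
#!/usr/bin/env bash
# socket_count is a hypothetical helper: it reads lscpu output on stdin
# and prints the value of the "Socket(s):" field.
socket_count() {
  awk -F: '/^Socket\(s\):/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Example (LC_ALL=C keeps the field label in English):
#   LC_ALL=C lscpu | socket_count
```

A value greater than 1 indicates a multi-socket instance, which is the configuration the report describes as likely to be affected.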
Workaround
To avoid the issue, we recommend either using the official AWS ParallelCluster AMIs and software stack (that is, avoiding a custom AMI that includes the latest software updates for Ubuntu 20.04 with kernel 5.15.0-1015) or using Open MPI.
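When evaluating a custom AMI, a quick check of the running kernel can tell whether it matches the version named in this report. The sketch below only matches 5.15.0-1015 (with or without a flavor suffix); whether other kernel versions are affected is not stated here, and the helper name is_reported_kernel is hypothetical.

```shell
#!/usr/bin/env bash
# is_reported_kernel is a hypothetical helper: it succeeds only if the given
# kernel release string is the 5.15.0-1015 kernel named in this report.
is_reported_kernel() {
  case "$1" in
    5.15.0-1015|5.15.0-1015-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   if is_reported_kernel "$(uname -r)"; then
#     echo "Kernel matches the version named in the report"
#   fi
```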