Possible performance degradation on ALinux2 when using ParallelCluster 2.11.0 and custom AMIs from 2.6.0 to 2.11.0
Enrico Usai edited this page Jul 8, 2021 · 11 revisions
The performance of tightly coupled / MPI workloads on clusters with Amazon Linux 2 operating system may be impacted by enabling CloudWatch logging.
Our preliminary analysis indicates that this is likely related to CloudWatch Agent version 1.247348.0b251302. You can check which version is installed by running the command: yum list amazon-cloudwatch-agent
This performance issue may affect workloads differently depending on cluster size and applications used.
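The version check mentioned above can also be scripted, for example to run it on several nodes. The following is a minimal sketch, not an official tool: the helper name classify_cw_agent_version is hypothetical, and it assumes that the RPM VERSION field of the package matches the version string quoted above.

```shell
#!/bin/bash
# Version string reported as likely affected in the text above.
AFFECTED_VERSION="1.247348.0b251302"

# Hypothetical helper: classify a version string.
# Prints "not-installed" for an empty string, "affected" when the string
# equals AFFECTED_VERSION, and "not-affected" otherwise.
classify_cw_agent_version() {
    local version="$1"
    if [ -z "$version" ]; then
        echo "not-installed"
    elif [ "$version" = "$AFFECTED_VERSION" ]; then
        echo "affected"
    else
        echo "not-affected"
    fi
}

# Query the installed version only where rpm is available (e.g. Amazon Linux 2).
# Assumption: the RPM VERSION field matches the version string quoted above.
if command -v rpm >/dev/null 2>&1; then
    installed="$(rpm -q --queryformat '%{VERSION}' amazon-cloudwatch-agent 2>/dev/null)" || installed=""
    echo "amazon-cloudwatch-agent: $(classify_cw_agent_version "$installed")"
fi
```

A result of "not-affected" only means the installed version differs from the one named above; it does not rule out other causes of slow MPI performance.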
There are multiple options to work around the issue.
Create a cluster with the following configuration to disable CloudWatch logging:
[cluster yourcluster]
cw_log_settings = custom-cw
...
[cw_log custom-cw]
enable = false
With this configuration, CloudWatch logging and the CloudWatch Agent service are disabled, which avoids the possible performance degradation.
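Before creating the cluster, it may help to confirm that the configuration file really contains the setting shown above. The following is a rough sketch under stated assumptions: the helper name cw_logging_disabled is hypothetical, the default path ~/.parallelcluster/config is assumed, and the check only looks for "enable = false" inside any [cw_log ...] section, without verifying that the cluster section references it through cw_log_settings.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if a [cw_log ...] section of the given
# ParallelCluster 2.x config file contains "enable = false".
cw_logging_disabled() {
    awk '
        # On every section header, remember whether it is a cw_log section.
        /^[ \t]*\[/ { in_cw_log = ($0 ~ /^[ \t]*\[cw_log[ \t]/) }
        # Inside a cw_log section, look for the disabling setting.
        in_cw_log && /^[ \t]*enable[ \t]*=[ \t]*false[ \t]*$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Assumed default config location for ParallelCluster 2.x; pass another path
# as the first argument if your file lives elsewhere.
config="${1:-$HOME/.parallelcluster/config}"
if [ -f "$config" ]; then
    if cw_logging_disabled "$config"; then
        echo "CloudWatch logging is disabled in $config"
    else
        echo "No [cw_log] section with enable = false found in $config"
    fi
fi
```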
- Follow the official documentation to modify an existing ParallelCluster AMI
- As part of the AMI customization step, connect to the instance and run the following command:
sudo yum downgrade amazon-cloudwatch-agent-1.247347.4-1.amzn2
- Complete the steps to create a custom AMI
- Create a cluster using the generated AMI with the custom_ami parameter.
Alternatively, downgrade the CloudWatch Agent on the compute nodes from within the job script. Example:
#!/bin/bash
#SBATCH --job-name=yourjob
# add your options
# downgrade the CloudWatch Agent on every node allocated to the job
for i in $(scontrol show hostnames "$SLURM_JOB_NODELIST")
do
    ssh "$i" "sudo systemctl stop amazon-cloudwatch-agent.service"
    ssh "$i" "sudo yum -y downgrade amazon-cloudwatch-agent-1.247347.4-1.amzn2"
    ssh "$i" "sudo systemctl start amazon-cloudwatch-agent.service"
done
# start your application
sleep 100
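Because compute nodes may be reused across jobs, the job script above can repeat the downgrade on nodes that already run the older agent. The sketch below shows one possible way to skip those nodes. It is an illustration, not part of the original workaround: the helper name needs_downgrade is hypothetical, and it assumes that rpm -q prints the package in the usual name-version-release.arch form.

```shell
#!/bin/bash
# Package name-version-release used by the downgrade command above.
TARGET_PACKAGE="amazon-cloudwatch-agent-1.247347.4-1.amzn2"

# Hypothetical helper: succeed (exit 0) when the given "rpm -q" output does
# not start with TARGET_PACKAGE, i.e. when the node still needs the downgrade.
needs_downgrade() {
    case "$1" in
        "$TARGET_PACKAGE"*) return 1 ;;
        *) return 0 ;;
    esac
}

# Only meaningful inside a Slurm job; skipped elsewhere.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
        # Assumption: rpm -q prints name-version-release.arch for the package.
        current="$(ssh "$node" "rpm -q amazon-cloudwatch-agent")"
        if needs_downgrade "$current"; then
            ssh "$node" "sudo systemctl stop amazon-cloudwatch-agent.service"
            ssh "$node" "sudo yum -y downgrade $TARGET_PACKAGE"
            ssh "$node" "sudo systemctl start amazon-cloudwatch-agent.service"
        fi
    done
fi
```

The comparison is a plain prefix match on the package string, so it treats any other installed version, newer or older, as needing the downgrade.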