
CloudWatch Logs

Sean Smith edited this page Apr 5, 2019 · 8 revisions

Background

Keeping track of log files from a running cluster can be a pain. Some logs, such as /var/log/nodewatcher, are stored on the compute instances and disappear when those compute nodes are removed. Other logs are stored on the master node but become inaccessible once the cluster has been deleted.

This guide sends the following logs to CloudWatch Logs, where they remain accessible even after the cluster has been deleted:

/var/log/sqswatcher
/var/log/jobwatcher
/var/log/nodewatcher # for each compute node
/opt/sge/default/spool/qmaster/messages

Note: CloudWatch Logs incurs a small additional cost, generally under $1; see https://aws.amazon.com/cloudwatch/pricing/ for details.

Steps

  1. Add the following statement to the IAM policy in the CloudFormation template (aws-parallelcluster.cfn.json, see #L1674) to grant the additional CloudWatch Logs permissions:
{
  "Sid": "CloudWatchLogs",
  "Action": [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:DescribeLogStreams"
  ],
  "Effect": "Allow",
  "Resource": [
      "arn:aws:logs:*:*:*"
  ]
}
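A missing comma is easy to introduce when pasting a statement into a large JSON template. The sketch below is one way to catch that locally before uploading. The function name `check_logs_permissions` is hypothetical, it assumes `python3` is available for its standard `json.tool` module, and it only checks that the four actions appear somewhere in the file, not that they sit inside the right policy.

```shell
# Hypothetical helper: sanity-check an edited CloudFormation template before
# uploading it. Assumes python3 (for json.tool) and grep are installed.
check_logs_permissions() {
    local template="$1"
    # Fail if the file is not valid JSON (e.g. a missing or trailing comma
    # around the inserted statement).
    if ! python3 -m json.tool "$template" > /dev/null 2>&1; then
        echo "invalid JSON: $template" >&2
        return 1
    fi
    # Fail if any of the four CloudWatch Logs actions is absent from the file.
    local action
    for action in logs:CreateLogGroup logs:CreateLogStream \
                  logs:PutLogEvents logs:DescribeLogStreams; do
        if ! grep -q "\"$action\"" "$template"; then
            echo "missing action: $action" >&2
            return 1
        fi
    done
    echo "ok: $template"
}
```

Run it as `check_logs_permissions aws-parallelcluster.cfn.json` before uploading the template.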
  2. Upload this new template to your S3 bucket, under the same template/ prefix that template_url points to:
$ aws s3 cp aws-parallelcluster.cfn.json s3://[your_bucket]/template/
  3. Create a file post_install.sh with the following contents:
#!/bin/bash
########
# NOTE #
########
#
# THIS FILE IS PROVIDED AS AN EXAMPLE AND NOT INTENDED TO BE USED BESIDES TESTING
# USE IT AS AN EXAMPLE BUT NOT AS IS FOR PRODUCTION
#

# Look up this instance's ID and region from the EC2 instance metadata service
ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
# Strip the trailing zone letter, e.g. us-east-1a -> us-east-1
REGION=$(echo "$AZ" | sed 's/[a-z]$//')

export AWS_DEFAULT_REGION=$REGION

# Install the CloudWatch Logs agent (awslogs) and configure it to push logs to the local region
sudo yum install awslogs -y
cat > /etc/awslogs/awscli.conf << EOF
[plugins]
cwlogs = cwlogs
[default]
region = $REGION
EOF

# Check whether this is the master instance (its Name tag is "Master")
MASTER=false
if [[ $(aws ec2 describe-instances \
            --instance-ids "$ID" \
            --query 'Reservations[].Instances[].Tags[?Key==`Name`].Value[]' \
            --output text) = "Master" ]]; then
    MASTER=true
fi

if $MASTER; then

# Set up CloudWatch Logs for the master node
cat >>/etc/awslogs/awslogs.conf << EOF
[/opt/sge/default/spool/qmaster/messages]
datetime_format = %b %d %H:%M:%S
file = /opt/sge/default/spool/qmaster/messages
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-master-sge-qmaster-messages

[/var/log/jobwatcher]
datetime_format = %b %d %H:%M:%S
file = /var/log/jobwatcher
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-master-job-watcher

[/var/log/sqswatcher]
datetime_format = %b %d %H:%M:%S
file = /var/log/sqswatcher
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-master-sqs-watcher
EOF


else

# Set up CloudWatch Logs for compute nodes
cat >>/etc/awslogs/awslogs.conf << EOF

[/var/log/nodewatcher]
datetime_format = %b %d %H:%M:%S
file = /var/log/nodewatcher
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-compute-node-watcher
EOF

fi

# Start awslogs now and enable it at boot
sudo service awslogs start
sudo chkconfig awslogs on
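The script above repeats the same stanza for every log file. A minimal sketch of how those repeated pieces could be factored into functions follows. `region_from_az` and `awslogs_stanza` are hypothetical names, and the stanza settings are simply copied from the script rather than being the only valid awslogs options.

```shell
# Hypothetical helpers; names are illustrative, not part of awslogs.

# Strip the trailing zone letter from an Availability Zone name,
# e.g. us-east-1a -> us-east-1 (same rule as the sed expression above).
region_from_az() {
    echo "${1%[a-z]}"
}

# Print one awslogs.conf stanza for a log file, using the same settings
# as the stanzas the script writes by hand.
awslogs_stanza() {
    local file="$1" group="$2"
    cat << EOF
[$file]
datetime_format = %b %d %H:%M:%S
file = $file
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = $group

EOF
}
```

For example, `awslogs_stanza /var/log/jobwatcher pcluster-master-job-watcher >> /etc/awslogs/awslogs.conf` appends the same jobwatcher stanza the script writes explicitly.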
  4. Upload this file to your S3 bucket:
$ aws s3 cp --acl public-read post_install.sh s3://[your_bucket]
  5. Create a cluster with your custom template and your post_install file:
[cluster default]
...
post_install = s3://[your_bucket]/post_install.sh
template_url = https://s3.amazonaws.com/[your_bucket]/template/aws-parallelcluster.cfn.json
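The post_install and template_url values have to match the keys the two files were uploaded to. The sketch below prints both lines from a bucket name, assuming the layout in the config above (template under a `template/` prefix, script at the bucket root); `cluster_config_lines` is a hypothetical helper.

```shell
# Hypothetical helper: print the two config lines for a bucket so they
# match the upload locations (script at the bucket root, template under
# the template/ prefix).
cluster_config_lines() {
    local bucket="$1"
    echo "post_install = s3://${bucket}/post_install.sh"
    echo "template_url = https://s3.amazonaws.com/${bucket}/template/aws-parallelcluster.cfn.json"
}
```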

Then create the cluster:

$ pcluster create mycluster
Status: CREATE_COMPLETE
MasterServer: RUNNING
MasterPublicIP: 18.214.13.107
ClusterUser: ec2-user
MasterPrivateIP: 172.31.18.7
  6. Open the CloudWatch console and go to the Logs section.

Each log file appears as its own log group, with one log stream per instance:

pcluster-compute-node-watcher        # is /var/log/nodewatcher
pcluster-master-job-watcher          # is /var/log/jobwatcher
pcluster-master-sge-qmaster-messages # is /opt/sge/default/spool/qmaster/messages
pcluster-master-sqs-watcher          # is /var/log/sqswatcher
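Because the awslogs configuration uses `log_stream_name = {instance_id}`, each log group holds one stream per instance. The sketch below shows one way to read the newest events from the AWS CLI instead of the console. `latest_log_events` is a hypothetical helper, and it assumes the AWS CLI is configured with credentials allowed to read CloudWatch Logs (for example `logs:DescribeLogStreams` and `logs:GetLogEvents`).

```shell
# Hypothetical helper: print recent events from the most recently active
# stream (i.e. instance) in a log group. Assumes a configured AWS CLI.
latest_log_events() {
    local group="$1" limit="${2:-20}" stream
    # Pick the stream with the newest event.
    stream=$(aws logs describe-log-streams \
                 --log-group-name "$group" \
                 --order-by LastEventTime --descending \
                 --max-items 1 \
                 --query 'logStreams[0].logStreamName' \
                 --output text) || return 1
    echo "stream: $stream" >&2
    # Without a token, get-log-events returns the newest events first page.
    aws logs get-log-events \
        --log-group-name "$group" \
        --log-stream-name "$stream" \
        --limit "$limit" \
        --query 'events[].[message]' \
        --output text
}
```

For example, `latest_log_events pcluster-compute-node-watcher 50` prints up to 50 recent nodewatcher lines from the compute node that logged most recently.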