CloudWatch Logs
Sean Smith edited this page Apr 5, 2019
Keeping track of log files from a running cluster can be a pain. Some logs, such as /var/log/nodewatcher, are stored on the compute instances and disappear when compute nodes are removed. Other logs are stored on the master node, but are inaccessible once the cluster has been deleted.
This guide pushes the following logs to CloudWatch Logs, where they remain accessible even after the cluster has been deleted:
/var/log/sqswatcher
/var/log/jobwatcher
/var/log/nodewatcher # for each compute node
/opt/sge/default/spool/qmaster/messages
Note: CloudWatch Logs incurs a small additional cost, generally < $1; see https://aws.amazon.com/cloudwatch/pricing/ for more information.
- Add the following additional permissions to the CloudFormation template (aws-parallelcluster.cfn.json, at #L1674):
{
"Sid": "CloudWatchLogs",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogStreams"
],
"Effect": "Allow",
"Resource": [
"arn:aws:logs:*:*:*"
]
}
- Upload this new template to your S3 bucket:
$ aws s3 cp aws-parallelcluster.cfn.json s3://[your_bucket]/template/aws-parallelcluster.cfn.json
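Since the template is edited by hand, it can be worth checking that it still parses once it is uploaded. The following is a minimal sketch, not part of the original instructions: `validate_template` is a hypothetical helper that wraps `aws cloudformation validate-template`, and it assumes your credentials are allowed to call that API and that the template URL is readable. The `AWS` variable is an assumption added so the command can be dry-run.

```shell
#!/bin/bash
# AWS is the CLI command to run; override it (e.g. AWS="echo aws") for a dry run.
AWS=${AWS:-aws}

# Hypothetical helper: ask CloudFormation to check that the uploaded template
# parses. This only checks the template's syntax, not the IAM permissions in it.
validate_template() {
  local url=$1
  $AWS cloudformation validate-template --template-url "$url"
}

# Usage (commented out so that sourcing this sketch has no side effects):
# validate_template "$TEMPLATE_URL"   # the same URL you will set as template_url
```

A failed validation here is cheaper to debug than a failed cluster creation later.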
- Create a file post_install.sh with the following contents:
#!/bin/bash
########
# NOTE #
########
#
# THIS FILE IS PROVIDED AS AN EXAMPLE AND NOT INTENDED TO BE USED BESIDES TESTING
# USE IT AS AN EXAMPLE BUT NOT AS IS FOR PRODUCTION
#
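# Look up the instance ID and region from the instance metadata service
ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
# The region is the availability zone minus its trailing letter (e.g. us-east-1a -> us-east-1)
REGION=$(echo "$AZ" | sed 's/[a-z]$//')
export AWS_DEFAULT_REGION=$REGION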
# Install awslogs and configure it to push logs to CloudWatch in the local region
sudo yum install awslogs -y
cat > /etc/awslogs/awscli.conf << EOF
[plugins]
cwlogs = cwlogs
[default]
region = $REGION
EOF
# Check whether this is the master instance (its Name tag is "Master")
MASTER=false
if [[ $(aws ec2 describe-instances \
    --instance-ids "$ID" \
    --query 'Reservations[].Instances[].Tags[?Key==`Name`].Value[]' \
    --output text) = "Master" ]]; then
  MASTER=true
fi
if [[ "$MASTER" = true ]]; then
# Set up CloudWatch Logs for the master node
cat >>/etc/awslogs/awslogs.conf << EOF
[/opt/sge/default/spool/qmaster/messages]
datetime_format = %b %d %H:%M:%S
file = /opt/sge/default/spool/qmaster/messages
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-master-sge-qmaster-messages
[/var/log/jobwatcher]
datetime_format = %b %d %H:%M:%S
file = /var/log/jobwatcher
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-master-job-watcher
[/var/log/sqswatcher]
datetime_format = %b %d %H:%M:%S
file = /var/log/sqswatcher
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-master-sqs-watcher
EOF
else
# Set up CloudWatch Logs for compute nodes
cat >>/etc/awslogs/awslogs.conf << EOF
[/var/log/nodewatcher]
datetime_format = %b %d %H:%M:%S
file = /var/log/nodewatcher
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-compute-node-watcher
EOF
fi
# Start awslogs now and enable it at boot
sudo service awslogs start
sudo chkconfig awslogs on
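The four awslogs.conf stanzas in the script differ only in the log file path and the log group name. As a minimal sketch of how that repetition could be factored out, the hypothetical helper `awslogs_stanza` below (not part of the original script) prints one stanza with the same settings used above.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print one awslogs.conf
# stanza for the given log file and CloudWatch log group, using the same
# settings as the stanzas in post_install.sh.
awslogs_stanza() {
  local file=$1 group=$2
  cat << EOF
[$file]
datetime_format = %b %d %H:%M:%S
file = $file
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = $group
EOF
}

# Usage inside post_install.sh (commented out so that sourcing this sketch
# has no side effects):
# awslogs_stanza /var/log/jobwatcher pcluster-master-job-watcher >> /etc/awslogs/awslogs.conf
```

Adding another log file then becomes a one-line change, at the cost of making the generated configuration slightly less obvious when reading the script.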
- Upload this file to S3:
$ aws s3 cp --acl public-read post_install.sh s3://[your_bucket]/post_install.sh
- Create a cluster with your custom template and your post_install file:
[cluster default]
...
post_install = s3://[your_bucket]/post_install.sh
template_url = https://s3.amazonaws.com/[your_bucket]/template/aws-parallelcluster.cfn.json
- Create the cluster:
$ pcluster create mycluster
Status: CREATE_COMPLETE
MasterServer: RUNNING
MasterPublicIP: 18.214.13.107
ClusterUser: ec2-user
MasterPrivateIP: 172.31.18.7
- Now go to the Logs section of the CloudWatch Console.
You'll see one log group for each log file:
pcluster-compute-node-watcher # is /var/log/nodewatcher
pcluster-master-job-watcher # is /var/log/jobwatcher
pcluster-master-sge-qmaster-messages # is /opt/sge/default/spool/qmaster/messages
pcluster-master-sqs-watcher # is /var/log/sqswatcher
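The same log groups can also be read from the command line, which is handy once the cluster is gone. This is a minimal sketch that assumes your local credentials allow `logs:FilterLogEvents` and that your default region matches the cluster's region. `print_log_group` is a hypothetical helper name, and the `AWS` variable is an assumption added so the command can be dry-run.

```shell
#!/bin/bash
# AWS is the CLI command to run; override it (e.g. AWS="echo aws") for a dry run.
AWS=${AWS:-aws}

# Hypothetical helper: print the messages stored in one CloudWatch log group,
# across all of its log streams (one stream per instance ID).
print_log_group() {
  local group=$1
  $AWS logs filter-log-events \
    --log-group-name "$group" \
    --query 'events[].message' \
    --output text
}

# Usage (commented out so that sourcing this sketch has no side effects):
# print_log_group pcluster-master-sqs-watcher
```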