I want to push Amazon EMR application logs to Amazon CloudWatch for Amazon EMR versions 5.30.0 and later.
Short description
Use the CloudWatch agent to collect metrics and logs from Amazon Elastic Compute Cloud (Amazon EC2) instances. Then, configure Amazon EMR cluster instances to publish application logs to CloudWatch.
Resolution
Prerequisite:
Create a CloudWatch agent configuration file.
Create the configuration file
To push specific application logs from your Amazon EMR instances, use one of the following examples to create a configuration file.
Push logs for YARN applications on all nodes
The following sample configuration file pushes container logs from each Amazon EMR instance:
{
  "agent": {
    "metrics_collection_interval": 300,
    "run_as_user": "yarn"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/hadoop-yarn/containers/application_*/container*/*",
            "log_group_name": "/emr/applications/",
            "log_stream_name": "{instance_id}-{ip_address}",
            "publish_multi_logs": true
          }
        ]
      }
    }
  }
}
Note: Replace /var/log/hadoop-yarn/containers/application_*/container*/* with the path to your log files.
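Before you upload the configuration file, it can help to confirm that the file is well-formed JSON, because a malformed file is likely to make the agent's fetch-config step fail. The following is a minimal sketch and not part of the original procedure. It assumes a local file named config.json and uses Python's json.tool module only if python3 is installed. The CONFIG_FILE and CONFIG_STATUS variables are names chosen for this example.

```shell
#!/bin/sh
# Assumed local file name; change it to match your environment.
CONFIG_FILE="${CONFIG_FILE:-config.json}"

# Write the sample YARN container-log configuration to the local file.
cat > "$CONFIG_FILE" <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 300,
    "run_as_user": "yarn"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/hadoop-yarn/containers/application_*/container*/*",
            "log_group_name": "/emr/applications/",
            "log_stream_name": "{instance_id}-{ip_address}",
            "publish_multi_logs": true
          }
        ]
      }
    }
  }
}
EOF

# Syntax check only: this does not validate the agent's own schema.
if command -v python3 >/dev/null 2>&1; then
  if python3 -m json.tool "$CONFIG_FILE" >/dev/null; then
    CONFIG_STATUS="valid"
  else
    CONFIG_STATUS="invalid"
  fi
else
  CONFIG_STATUS="unchecked"
fi
echo "JSON check for $CONFIG_FILE: $CONFIG_STATUS"
```

The check catches issues such as a missing comma or brace. It doesn't confirm that the log paths exist on the instance.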
Push logs for YARN and HiveServer2 applications on the primary node
The following sample configuration file pushes YARN ResourceManager logs and HiveServer2 logs from the Amazon EMR primary node:
{
  "agent": {
    "metrics_collection_interval": 300,
    "run_as_user": "hadoop"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/mnt/var/log/hadoop-yarn/hadoop-yarn-resourcemanager-*",
            "log_group_name": "/emr/master/logs",
            "log_stream_name": "{instance_id}-{ip_address}-resourcemanager.log"
          },
          {
            "file_path": "/mnt/var/log/hive/hive-server2.log",
            "log_group_name": "/emr/master/logs",
            "log_stream_name": "{instance_id}-{ip_address}-hive-server2.log"
          }
        ]
      }
    }
  }
}
Note: Replace /mnt/var/log/hadoop-yarn/hadoop-yarn-resourcemanager-* and /mnt/var/log/hive/hive-server2.log with the paths to your log files.
Upload the configuration file to an S3 bucket
Upload the configuration file to an Amazon Simple Storage Service (Amazon S3) bucket.
Note: The Amazon EMR cluster must have AWS Identity and Access Management (IAM) permissions to access the S3 bucket.
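The upload can be done with the AWS CLI. The following is a minimal sketch under stated assumptions: the bucket name amzn-s3-demo-bucket, the emr/cloudwatch prefix, and the local file name config.json are placeholders. The run helper and the DRY_RUN variable are conventions of this example, not AWS CLI features. By default, the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical names: replace the bucket, prefix, and file with your own.
BUCKET="${BUCKET:-amzn-s3-demo-bucket}"
PREFIX="${PREFIX:-emr/cloudwatch}"
CONFIG_FILE="${CONFIG_FILE:-config.json}"
S3_URI="s3://${BUCKET}/${PREFIX}/${CONFIG_FILE##*/}"

# Print each command; run it only when DRY_RUN=0.
# DRY_RUN is a convention of this sketch, not an AWS CLI option.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Copy the configuration file to Amazon S3, then list it to confirm the upload.
run aws s3 cp "$CONFIG_FILE" "$S3_URI"
run aws s3 ls "$S3_URI"
```

To run the commands instead of printing them, set DRY_RUN=0 before you start the script.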
Launch the Amazon EMR cluster
As part of the Amazon EMR bootstrap action, run the following script to configure the CloudWatch agent and start the CloudWatch agent process:
#!/bin/bash
# == Install the CloudWatch agent ==
echo "=================== BootstrapActions: Install CloudWatch Agent ==================="
sudo yum install amazon-cloudwatch-agent -y
sudo amazon-linux-extras install collectd -y
# Copy the configuration file from Amazon S3 to the instance
sudo aws s3 cp s3://<your-s3-path>/config.json /opt/aws/amazon-cloudwatch-agent/bin/config.json
# Load the configuration file and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
# Check the CloudWatch agent status
echo "CloudWatch agent status"
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
Note: Replace s3://<your-s3-path>/config.json with the path for your environment.
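To run the script as a bootstrap action, the script must be stored in Amazon S3 and referenced when you create the cluster. The following is a minimal sketch, assuming that you saved the script locally as install-cloudwatch-agent.sh. The file name, bucket name, cluster name, instance type, and instance count are all placeholders. The sketch also assumes that the default Amazon EMR roles exist and that the cluster's EC2 instance profile can read the S3 path and write to CloudWatch Logs. The run helper and DRY_RUN variable are conventions of this example, so the commands are only printed by default.

```shell
#!/bin/sh
# Hypothetical names and sizes: adjust all of these for your environment.
BUCKET="${BUCKET:-amzn-s3-demo-bucket}"
SCRIPT_FILE="${SCRIPT_FILE:-install-cloudwatch-agent.sh}"
SCRIPT_URI="s3://${BUCKET}/bootstrap/${SCRIPT_FILE##*/}"
BOOTSTRAP_ARG="Path=${SCRIPT_URI},Name=InstallCloudWatchAgent"

# Print each command; run it only when DRY_RUN=0.
# DRY_RUN is a convention of this sketch, not an AWS CLI option.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Upload the bootstrap script, then launch a cluster that runs it on every node.
run aws s3 cp "$SCRIPT_FILE" "$SCRIPT_URI"
run aws emr create-cluster \
  --name "emr-cloudwatch-logs-demo" \
  --release-label emr-5.30.0 \
  --applications Name=Spark Name=Hive \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --bootstrap-actions "$BOOTSTRAP_ARG"
```

Because bootstrap actions run on every node, the agent starts on the primary, core, and task instances with the same configuration file.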
(Optional) Submit a Spark application
To generate example application logs, run the following command from the Amazon EMR cluster primary node to start a Spark application:
spark-submit --executor-memory 1g --class org.apache.spark.examples.SparkPi /usr/lib/spark/examples/jars/spark-examples.jar 10
Use the CloudWatch console to monitor CloudWatch Logs
- Open the CloudWatch console.
- In the navigation pane, under Logs, choose Log groups.
- Select the log group that you want to view based on the configuration file.
- If you use the sample config.json file from the preceding example, then complete one of the following tasks:
To view application logs, choose /emr/applications.
To view primary node logs, choose /emr/master/logs.
Note: By default, CloudWatch Logs keeps log events indefinitely. Log events are automatically deleted only when you configure a retention policy on the log group. To reduce storage costs, set a retention period that matches your requirements. For more information, see Change log data retention in CloudWatch Logs.
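The retention setting mentioned in the preceding note can also be applied from the AWS CLI. The following is a minimal sketch, assuming the /emr/master/logs log group from the sample configuration and an example retention period of 30 days. The run helper and DRY_RUN variable are conventions of this example, so the commands are only printed by default.

```shell
#!/bin/sh
# Example values: the log group comes from the sample configuration,
# and the 30-day retention period is an assumption.
LOG_GROUP="${LOG_GROUP:-/emr/master/logs}"
RETENTION_DAYS="${RETENTION_DAYS:-30}"

# Print each command; run it only when DRY_RUN=0.
# DRY_RUN is a convention of this sketch, not an AWS CLI option.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Keep log events for a fixed number of days instead of indefinitely.
run aws logs put-retention-policy \
  --log-group-name "$LOG_GROUP" \
  --retention-in-days "$RETENTION_DAYS"

# List the most recently active log streams in the group.
run aws logs describe-log-streams \
  --log-group-name "$LOG_GROUP" \
  --order-by LastEventTime \
  --descending \
  --max-items 5
```

CloudWatch Logs accepts only specific retention values, so check the allowed values before you choose a period other than 30 days.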
(Optional) Use Systems Manager to install the CloudWatch agent on Amazon EMR
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.
Use Parameter Store, a capability of AWS Systems Manager, to store the agent configuration file content. Then, reference the Systems Manager parameter when you start the CloudWatch agent.
- Use the Systems Manager console or the AWS CLI to create a Systems Manager parameter that stores the agent configuration file content. The following example uses the AWS CLI to create the parameter:
aws ssm put-parameter \
  --name "AmazonCloudWatch-Config.json" \
  --value '{
  "agent": {
    "metrics_collection_interval": 300,
    "run_as_user": "yarn"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/hadoop-yarn/containers/application_*/container*/*",
            "log_group_name": "/emr/applications/",
            "log_stream_name": "{instance_id}",
            "publish_multi_logs": true
          }
        ]
      }
    }
  }
}' \
  --type String
Note: Replace /var/log/hadoop-yarn/containers/application_*/container*/* with the path to your log files. In the preceding example, the configuration file pushes container logs from each Amazon EMR instance. For more information about Systems Manager parameters, see Creating Systems Manager parameters.
- Update the bootstrap action script to refer to the parameter:
# Start the agent with the created config file
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -m ec2 -c ssm:AmazonCloudWatch-Config.json
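Embedding JSON inline in a shell command requires careful quoting. As an alternative, the AWS CLI can load the parameter value from a local file with the file:// prefix. The following is a minimal sketch, assuming that the configuration is saved locally as config.json and that the parameter name matches the one used in the preceding steps. The run helper and DRY_RUN variable are conventions of this example, so the commands are only printed by default.

```shell
#!/bin/sh
# Assumed local file name; the parameter name matches the preceding steps.
CONFIG_FILE="${CONFIG_FILE:-config.json}"
PARAM_NAME="${PARAM_NAME:-AmazonCloudWatch-Config.json}"

# Print each command; run it only when DRY_RUN=0.
# DRY_RUN is a convention of this sketch, not an AWS CLI option.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Create or update the parameter from the file content.
run aws ssm put-parameter \
  --name "$PARAM_NAME" \
  --type String \
  --value "file://${CONFIG_FILE}" \
  --overwrite

# Read the stored value back to confirm that it matches the file.
run aws ssm get-parameter \
  --name "$PARAM_NAME" \
  --query "Parameter.Value" \
  --output text
```

The --overwrite option lets you rerun the command after you edit the file, so the parameter stays in sync with the local copy.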
Related information
Conditionally run a bootstrap action
View log files on the primary node
Running the CloudWatch agent as a different user
View log data sent to CloudWatch Logs
How can I collect custom metrics from Amazon EMR cluster instances and monitor them in CloudWatch?