How can I push Amazon EMR application logs to CloudWatch?

I want to push Amazon EMR application logs to Amazon CloudWatch for Amazon EMR versions 5.30.0 and later.

Short description

Use the CloudWatch agent to collect metrics and logs from Amazon Elastic Compute Cloud (Amazon EC2) instances. Then, configure Amazon EMR cluster instances to publish application logs to CloudWatch.

Resolution

Prerequisite:

Create a CloudWatch agent configuration file.

Create the configuration file

To push specific application logs from your Amazon EMR instances, use one of the following examples to create a configuration file.

Push logs for YARN application on all nodes

The following sample configuration file pushes container logs from each Amazon EMR instance:

{
    "agent": {
        "metrics_collection_interval": 300,
        "run_as_user": "yarn"
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/hadoop-yarn/containers/application_*/container*/*",
                        "log_group_name": "/emr/applications/",
                        "log_stream_name": "{instance_id}-{ip_address}",
                        "publish_multi_logs": true
                    }
                ]
            }
        }
    }
}

Note: Replace /var/log/hadoop-yarn/containers/application_*/container*/* with the path to your log files.
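A syntax error in the configuration file only shows up when the agent starts on the cluster, so it can help to check the file locally before you upload it. The following is a minimal sketch, assuming python3 is available on your workstation. The validate_agent_config function is a hypothetical helper, not part of the CloudWatch agent, and it checks only that the file is well-formed JSON, not that the keys are valid agent settings.

```shell
#!/bin/bash
# Hypothetical helper (not part of the CloudWatch agent): succeeds only if
# the given file parses as JSON. It does not validate the agent schema.
validate_agent_config() {
    local config_file="$1"
    python3 -m json.tool "$config_file" > /dev/null 2>&1
}

# Write the sample YARN container configuration to a temporary file.
config_file="$(mktemp)"
cat > "$config_file" <<'EOF'
{
    "agent": {
        "metrics_collection_interval": 300,
        "run_as_user": "yarn"
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/hadoop-yarn/containers/application_*/container*/*",
                        "log_group_name": "/emr/applications/",
                        "log_stream_name": "{instance_id}-{ip_address}",
                        "publish_multi_logs": true
                    }
                ]
            }
        }
    }
}
EOF

# Report whether the file is well-formed JSON.
if validate_agent_config "$config_file"; then
    echo "valid JSON: $config_file"
else
    echo "invalid JSON: $config_file" >&2
fi
```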

Push logs for YARN and HiveServer2 applications on the primary node

The following sample configuration file pushes YARN ResourceManager logs and HiveServer2 logs from the Amazon EMR primary node:

{
    "agent": {
        "metrics_collection_interval": 300,
        "run_as_user": "hadoop"
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/mnt/var/log/hadoop-yarn/hadoop-yarn-resourcemanager-*",
                        "log_group_name": "/emr/master/logs",
                        "log_stream_name": "{instance_id}-{ip_address}-resourcemanager.log"
                    },
                    {
                        "file_path": "/mnt/var/log/hive/hive-server2.log",
                        "log_group_name": "/emr/master/logs",
                        "log_stream_name": "{instance_id}-{ip_address}-hive-server2.log"
                    }
                ]
            }
        }
    }
}

Note: Replace /mnt/var/log/hadoop-yarn/hadoop-yarn-resourcemanager-* and /mnt/var/log/hive/hive-server2.log with the paths to your log files.

Upload the configuration file to an S3 bucket

Upload the configuration file to an Amazon Simple Storage Service (Amazon S3) bucket.

Note: The Amazon EMR cluster must have AWS Identity and Access Management (IAM) permissions to access the S3 bucket.
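One way to do the upload is with the AWS CLI. The following sketch wraps aws s3 cp in a hypothetical helper named upload_agent_config that first confirms the local file exists and then lists the uploaded object. The function name is an illustration only, and you supply your own bucket path.

```shell
#!/bin/bash
# Hypothetical helper: upload a local agent configuration file to Amazon S3,
# then list the object to confirm that it arrived.
upload_agent_config() {
    local config_file="$1"   # local path, for example config.json
    local s3_uri="$2"        # destination, for example s3://<your-s3-path>/config.json

    if [ ! -f "$config_file" ]; then
        echo "Configuration file not found: $config_file" >&2
        return 1
    fi

    aws s3 cp "$config_file" "$s3_uri" && aws s3 ls "$s3_uri"
}
```

For example, run upload_agent_config config.json "s3://<your-s3-path>/config.json" after you replace the placeholder with your bucket path. Quote the URI so that the shell doesn't interpret the angle brackets of an unreplaced placeholder.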

Launch Amazon EMR

As part of the Amazon EMR bootstrap action, run the following script to configure the CloudWatch agent and start the CloudWatch agent process:

#!/bin/bash
# == Install the CloudWatch agent ==
echo "=================== BootstrapActions: Install CloudWatch Agent ==================="

sudo yum install amazon-cloudwatch-agent -y
sudo amazon-linux-extras install collectd -y

# Copy the configuration file from Amazon S3 to the instance
sudo aws s3 cp s3://<your-s3-path>/config.json /opt/aws/amazon-cloudwatch-agent/bin/config.json

# Load the configuration file and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json

# Check the CloudWatch agent status
echo "Status CW Agent"
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status

Note: Replace s3://<your-s3-path>/config.json with the path for your environment.
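To attach the script as a bootstrap action from the AWS CLI, you can pass its Amazon S3 location to aws emr create-cluster. The following is a sketch under several assumptions: the script was saved and uploaded to Amazon S3 under a name of your choice, the default Amazon EMR roles exist in your account, and the instance profile can read the bucket and write to CloudWatch Logs. The launch_cluster_with_agent function, the cluster name, the instance type, and the instance count are illustrative values, not recommendations.

```shell
#!/bin/bash
# Hypothetical helper: launch a cluster that runs the CloudWatch agent
# bootstrap script stored at the given Amazon S3 URI.
launch_cluster_with_agent() {
    local bootstrap_uri="$1"   # for example s3://<your-s3-path>/install-cw-agent.sh

    # Bootstrap actions must reference a script in Amazon S3.
    case "$bootstrap_uri" in
        s3://*) ;;
        *) echo "Bootstrap script must be an s3:// URI: $bootstrap_uri" >&2; return 1 ;;
    esac

    # Cluster name, instance type, and instance count are example values.
    aws emr create-cluster \
        --name "emr-cloudwatch-logs-demo" \
        --release-label emr-5.30.0 \
        --applications Name=Spark Name=Hive \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --bootstrap-actions Path="$bootstrap_uri",Name=InstallCloudWatchAgent
}
```

Bootstrap actions run on every node in the cluster, so the agent starts on the primary, core, and task nodes.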

(Optional) Submit a Spark application

To generate example application logs, run the following command from the Amazon EMR cluster primary node to start a Spark application:

spark-submit --executor-memory 1g --class org.apache.spark.examples.SparkPi /usr/lib/spark/examples/jars/spark-examples.jar 10

Use the CloudWatch console to monitor CloudWatch Logs

  1. Open the CloudWatch console.
  2. In the navigation pane, under Logs, choose Log groups.
  3. Select the log group that you want to view based on the configuration file.
  4. If you use the sample configuration files from the preceding examples, then complete one of the following tasks:
    To view application logs, choose /emr/applications.
    To view primary node logs, choose /emr/master/logs.
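If you prefer the AWS CLI to the console, you can confirm that the log groups exist by listing the groups that share a name prefix. The following sketch assumes the /emr/ prefix from the sample configuration files. The list_emr_log_groups function is a hypothetical wrapper around aws logs describe-log-groups.

```shell
#!/bin/bash
# Hypothetical helper: print the names of log groups that start with the
# given prefix (defaults to /emr/, as in the sample configuration files).
list_emr_log_groups() {
    local prefix="${1:-/emr/}"

    aws logs describe-log-groups \
        --log-group-name-prefix "$prefix" \
        --query 'logGroups[].logGroupName' \
        --output text
}
```

Call list_emr_log_groups with no arguments to use the default prefix, or pass your own prefix if you changed log_group_name in the configuration file.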

Note: By default, CloudWatch Logs keeps log events indefinitely. Log events are automatically deleted only when you configure a retention policy on the log group. To reduce storage costs, configure a retention period that matches your needs. For more information, see Change log data retention in CloudWatch Logs.
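As an illustration, a retention policy can be set from the AWS CLI with aws logs put-retention-policy. In the following sketch, set_log_retention is a hypothetical helper and the 30-day default is an example value only. Choose a period that CloudWatch Logs supports and that fits your requirements.

```shell
#!/bin/bash
# Hypothetical helper: set a retention period, in days, on a log group.
# The 30-day default is an example value, not a recommendation.
set_log_retention() {
    local log_group="$1"     # for example /emr/master/logs
    local days="${2:-30}"

    aws logs put-retention-policy \
        --log-group-name "$log_group" \
        --retention-in-days "$days"
}
```

For example, set_log_retention "/emr/master/logs" 30 keeps primary node log events for 30 days.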

(Optional) Use Systems Manager to install the CloudWatch agent on Amazon EMR

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.

Use AWS Systems Manager to store the agent file content. Then, refer to the Systems Manager file when you start the CloudWatch agent.

  1. Use the Systems Manager console or the AWS CLI to create a Systems Manager parameter. Then, store the agent file content. The following example uses the AWS CLI to create the parameter:

    aws ssm put-parameter \
         --name "AmazonCloudWatch-Config.json" \
         --value '{
             "agent": {
                 "metrics_collection_interval": 300,
                 "run_as_user": "yarn"
             },
             "logs": {
                 "logs_collected": {
                     "files": {
                         "collect_list": [
                             {
                                 "file_path": "/var/log/hadoop-yarn/containers/application_*/container*/*",
                                 "log_group_name": "/emr/applications/",
                                 "log_stream_name": "{instance_id}",
                                 "publish_multi_logs": true
                             }
                         ]
                     }
                 }
             }
         }' \
         --type String

    Note: Replace /var/log/hadoop-yarn/containers/application_*/container*/* with the path to your log files. In the preceding example, the configuration file pushes container logs from each Amazon EMR instance. For more information about Systems Manager parameters, see Creating Systems Manager parameters.

  2. Update the bootstrap action script to refer to the parameter:

    # Start the agent with the created config file
    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -m ec2 -c ssm:AmazonCloudWatch-Config.json
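Before you launch a cluster that depends on the parameter, it can be worth confirming that the stored value is still well-formed JSON. The following sketch reads the parameter with aws ssm get-parameter and pipes the value through python3 -m json.tool. The check_agent_parameter function is a hypothetical helper, python3 is assumed to be installed, and the check covers JSON syntax only.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the stored parameter value parses as
# JSON. The default name matches the parameter created in step 1.
check_agent_parameter() {
    local name="${1:-AmazonCloudWatch-Config.json}"

    aws ssm get-parameter \
        --name "$name" \
        --query 'Parameter.Value' \
        --output text | python3 -m json.tool > /dev/null
}
```

For example, run check_agent_parameter && echo "parameter is valid JSON" from a workstation that has permission to read the parameter.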

Related information

Conditionally run a bootstrap action

View log files on the primary node

Running the CloudWatch agent as a different user

View log data sent to CloudWatch Logs

How can I collect custom metrics from Amazon EMR cluster instances and monitor them in CloudWatch?

AWS OFFICIAL
Updated 5 months ago