How do I publish and monitor an Amazon EMR application status with CloudWatch integration?
I want to integrate Amazon CloudWatch with Amazon EMR to publish and monitor the statuses of applications that I installed on my cluster. I want CloudWatch to alert me when applications are down.
Short description
When you integrate CloudWatch with Amazon EMR, you can track critical statuses for applications that you installed, such as HiveServer2 and YARN ResourceManager. Then, you can publish the statuses to CloudWatch custom metrics and configure alerts for service unavailability. To track additional applications, you can modify the application list as needed.
Resolution
Prerequisites:
- Amazon EMR version 5.30.0 or later
- Amazon EMR instance profile role or an AWS Identity and Access Management (IAM) user role with cloudwatch:PutMetricData permissions
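As a sketch of the second prerequisite, the following snippet writes a minimal IAM policy document that allows only cloudwatch:PutMetricData and limits it to the EMR/ServiceStatus namespace that the example script in this article publishes to. The file name, the policy name, and the role name in the trailing comment (EMR_EC2_DefaultRole is the default instance profile role name) are assumptions. Replace them with your own values.

```shell
#!/bin/bash
# Hypothetical output file name; adjust as needed.
POLICY_FILE="${POLICY_FILE:-emr-service-monitor-policy.json}"

# PutMetricData doesn't support resource-level permissions, so Resource is "*".
# The cloudwatch:namespace condition key limits publishing to one namespace.
cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "cloudwatch:PutMetricData",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "cloudwatch:namespace": "EMR/ServiceStatus"
        }
      }
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"

# To attach the policy inline to the instance profile role
# (role name and policy name are assumptions):
# aws iam put-role-policy \
#     --role-name EMR_EC2_DefaultRole \
#     --policy-name emr-service-monitor \
#     --policy-document "file://$POLICY_FILE"
```

The namespace condition is optional. Without it, the role can publish custom metrics to any namespace.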
Create a script to monitor your Amazon EMR applications
You can create a script to monitor your Amazon EMR applications. The following example script, check_process.sh, monitors YARN ResourceManager and HiveServer2 on the primary node and YARN NodeManager on core and task nodes. To monitor additional applications, modify the services and service_names arrays under the # Monitor specific services section in the script.
To configure the following script to include additional applications, see Create bootstrap actions to install additional software with an Amazon EMR cluster.
Example script:
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

#!/bin/bash

# Set up logging
LOG_FILE="/var/log/hadoop/service-monitor-detailed.log"
LOG_STATUS_FILE="/var/log/hadoop/service-monitor-status.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
CLUSTERID=$(jq -r ".jobFlowId" < /emr/instance-controller/lib/info/extraInstanceData.json)
INSTANCEID=$(ec2-metadata -i | cut -d " " -f 2)
HOSTIP=$(hostname -i)
NODETYPE=$(cat /mnt/var/lib/instance-controller/extraInstanceData.json | jq -r '.instanceRole' | awk '{print toupper(substr($0,1,1)) tolower(substr($0,2))}')

# Function to log messages
log_message() {
    echo "$TIMESTAMP - $1" >> "$LOG_FILE"
    echo "$TIMESTAMP - $1"
}

log_status_message() {
    echo "$TIMESTAMP - $1" >> "$LOG_STATUS_FILE"
}

# Function to send metric to CloudWatch
send_to_cloudwatch() {
    local host_ip=$1
    local service_name=$2
    local status=$3

    aws cloudwatch put-metric-data \
        --namespace "EMR/ServiceStatus" \
        --metric-name "ServiceStatus" \
        --value "$status" \
        --unit "Count" \
        --dimensions ClusterId=$CLUSTERID,NodeServiceName=$service_name,InstanceId=$INSTANCEID,NodeType=$NODETYPE \
        --timestamp "$(date -u +"%Y-%m-%dT%H:%M:%SZ")" \
        --region "${AWS_REGION:-us-east-1}" || {
            log_message "ERROR: Failed to send metric for service $service_name"
            return 1
        }

    log_message "Successfully sent metric for service: $service_name (Status: $status)"
}

# Create log file if it doesn't exist
touch "$LOG_FILE"
touch "$LOG_STATUS_FILE"

log_message "Starting service monitoring..."

# Monitor specific services
services=(
    "hive-server2"
    "hadoop-yarn-resourcemanager"
    "hadoop-yarn-nodemanager"
)

service_names=(
    "HiveServer2"
    "YARN-ResourceManager"
    "YARN-NodeManager"
)

for i in "${!services[@]}"; do
    # Check if service is disabled as not all services are running on all nodes
    if systemctl is-enabled "${services[$i]}" 2>/dev/null | grep -q "disabled"; then
        log_message "$CLUSTERID $INSTANCEID $HOSTIP $NODETYPE ${service_names[$i]}-Status DISABLED (ignored)"
        continue
    fi

    # Get service status
    status_output=$(systemctl status "${services[$i]}" 2>/dev/null)

    # Extract the process status
    process_status=$(echo "$status_output" | grep "Active:" | sed -E 's/Active: ([^ ]+) .*/\1/' | xargs)

    # Log message
    log_message "$CLUSTERID $INSTANCEID $HOSTIP $NODETYPE ${service_names[$i]}-Status $process_status"
    log_status_message "$CLUSTERID $INSTANCEID $HOSTIP $NODETYPE ${service_names[$i]}-Status $process_status"

    # Convert status to numeric value for CloudWatch
    status_value=0
    if [ "$process_status" != "active" ]; then
        status_value=1
        # Send to CloudWatch
        send_to_cloudwatch "$HOSTIP" "${service_names[$i]}" "$status_value"
    fi
done

log_message "Service monitoring completed."
exit 0
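To see how the script turns systemctl output into a metric value, the following sketch isolates the same grep, sed, and xargs pipeline in a function and runs it on sample text. The function names and the sample input are illustrative assumptions, not part of check_process.sh, and the sample lines weren't captured from a real cluster.

```shell
#!/bin/bash
# Extract the state word from the "Active:" line of `systemctl status` output,
# using the same grep/sed/xargs pipeline as check_process.sh.
extract_active_state() {
    grep "Active:" | sed -E 's/Active: ([^ ]+) .*/\1/' | xargs
}

# Map a state to the metric value: 0 for "active", 1 for anything else.
state_to_metric_value() {
    if [ "$1" = "active" ]; then
        echo 0
    else
        echo 1
    fi
}

# Illustrative sample input (not captured from a real cluster).
sample_running='   Loaded: loaded
   Active: active (running) since Tue 2025-05-06 22:00:00 UTC; 1h ago'
sample_stopped='   Loaded: loaded
   Active: inactive (dead) since Tue 2025-05-06 23:00:00 UTC; 7min ago'

for sample in "$sample_running" "$sample_stopped"; do
    state=$(echo "$sample" | extract_active_state)
    echo "$state -> $(state_to_metric_value "$state")"
done
```

Any state other than active, such as inactive, failed, or activating, maps to 1. In check_process.sh, only that case results in a call to send_to_cloudwatch.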
Important: Before you run the script in a production environment, it's a best practice to test the script in a test environment.
The preceding script publishes custom metrics to CloudWatch. AWS prorates all custom metric charges by the hour and meters them only when the script sends the metrics to CloudWatch. For more information, see Amazon CloudWatch pricing.
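If you extend the script to monitor more applications, each added service can create additional custom metrics, because every unique combination of dimensions counts as a separate metric. The following sketch shows one way to extend the two arrays and check that they stay index-aligned. The last two unit names and display names are assumptions for illustration, and the check_alignment helper is hypothetical. Confirm the unit names on your nodes with systemctl list-units before you use them.

```shell
#!/bin/bash
# systemd unit names to check. The last two entries are assumptions added for
# illustration; confirm the unit names on your nodes with `systemctl list-units`.
services=(
    "hive-server2"
    "hadoop-yarn-resourcemanager"
    "hadoop-yarn-nodemanager"
    "hadoop-hdfs-namenode"
    "spark-history-server"
)

# Display names used for the NodeServiceName dimension, in the same order.
service_names=(
    "HiveServer2"
    "YARN-ResourceManager"
    "YARN-NodeManager"
    "HDFS-NameNode"
    "Spark-HistoryServer"
)

# The monitoring loop pairs the arrays by index, so they must be the same length.
check_alignment() {
    if [ "${#services[@]}" -ne "${#service_names[@]}" ]; then
        echo "ERROR: services has ${#services[@]} entries but service_names has ${#service_names[@]}" >&2
        return 1
    fi
    for i in "${!services[@]}"; do
        echo "${services[$i]} -> ${service_names[$i]}"
    done
}

check_alignment
```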
Configure service monitoring on your Amazon EMR cluster
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
To implement automated service monitoring, use a bootstrap action script.
Complete the following steps:
- To prepare the script, run the following cp AWS CLI command to upload the script to an Amazon Simple Storage Service (Amazon S3) bucket that your Amazon EMR cluster can access:
aws s3 cp check_process.sh s3://your-bucket/monitoring/check_process.sh
- To copy the script to each cluster node and use crontab to schedule the script, create a bootstrap action script that's similar to the following example:
#!/bin/bash

# Copy monitoring script from S3
aws s3 cp s3://your-bucket/monitoring/check_process.sh /home/hadoop/
chmod +x /home/hadoop/check_process.sh

# Add to crontab
(crontab -l 2>/dev/null; echo "*/5 * * * * /home/hadoop/check_process.sh") | crontab -
Note: The example cron expression */5 * * * * runs the script every 5 minutes. Modify the schedule to meet your requirements.
- Add the bootstrap action script to the Amazon EMR cluster configuration file.
Note: The Amazon EMR instance profile role must have the minimum required permissions to read the scripts from the S3 bucket.
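If you launch the cluster from the AWS CLI, one way to attach the bootstrap action from the last step is the --bootstrap-actions option of aws emr create-cluster. The following is a minimal sketch. The bootstrap script's S3 path (install_monitor.sh), the cluster name, release label, applications, and instance settings are all placeholder assumptions. The snippet prints the command by default and runs it only when RUN=1 is set.

```shell
#!/bin/bash
# Hypothetical S3 location of the bootstrap action script; upload it first.
BOOTSTRAP_S3_PATH="${BOOTSTRAP_S3_PATH:-s3://your-bucket/monitoring/install_monitor.sh}"

# Build the command as an array so that each argument stays intact.
# Name, release label, applications, and instance settings are placeholders.
create_cluster_cmd=(
    aws emr create-cluster
    --name "monitored-cluster"
    --release-label "emr-6.15.0"
    --applications Name=Hadoop Name=Hive
    --instance-type "m5.xlarge"
    --instance-count 3
    --use-default-roles
    --bootstrap-actions "Path=${BOOTSTRAP_S3_PATH},Name=InstallServiceMonitor"
)

# Print the command by default; set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
    "${create_cluster_cmd[@]}"
else
    printf '%s ' "${create_cluster_cmd[@]}"
    printf '\n'
fi
```

Bootstrap actions run on every node, including nodes that are added later when the cluster resizes, so new core and task nodes also receive the monitoring script.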
After you launch the cluster, run the following command on your cluster nodes to confirm that Amazon EMR correctly copied the script:
ls -l /home/hadoop/check_process.sh
To confirm that you correctly configured crontab, run the following command on the cluster nodes:
crontab -l
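If crontab -l shows the entry more than once, the bootstrap commands probably ran more than one time on that node, because the example appends the line unconditionally. The following sketch shows one possible guard that adds the entry only when it's missing. The ensure_cron_entry function is a hypothetical helper, not part of the original bootstrap script, and the demonstration runs on sample text instead of a real crontab.

```shell
#!/bin/bash
CRON_ENTRY="*/5 * * * * /home/hadoop/check_process.sh"

# Read an existing crontab on stdin and print it with CRON_ENTRY appended
# only if that exact line isn't already present.
ensure_cron_entry() {
    local current
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    if ! printf '%s\n' "$current" | grep -qxF -- "$CRON_ENTRY"; then
        printf '%s\n' "$CRON_ENTRY"
    fi
}

# Demonstration on sample text: applying the function twice yields one entry.
once=$(printf '' | ensure_cron_entry)
twice=$(printf '%s\n' "$once" | ensure_cron_entry)
echo "$twice"

# On a cluster node, the same function could be applied to the real crontab:
# crontab -l 2>/dev/null | ensure_cron_entry | crontab -
```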
Review the logs
The script generates detailed logs and status logs on cluster nodes. To verify that the script works correctly, review both logs.
Detailed logs
The /var/log/hadoop/service-monitor-detailed.log file provides comprehensive logs with timestamps, cluster ID, instance ID, host IP address, node type, and service status.
Example file:
2025-05-06 23:07:01 - Starting service monitoring...
2025-05-06 23:07:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master HiveServer2-Status inactive
2025-05-06 23:07:01 - Successfully sent metric for service: HiveServer2 (Status: 1)
2025-05-06 23:07:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master YARN-ResourceManager-Status active
2025-05-06 23:07:01 - Service monitoring completed.
Status logs
The /var/log/hadoop/service-monitor-status.log file contains only the service status records, without the start, completion, and metric delivery messages that appear in the detailed log.
Example file:
2025-05-06 23:07:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master HiveServer2-Status inactive
2025-05-06 23:07:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master YARN-ResourceManager-Status active
2025-05-06 23:08:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master HiveServer2-Status inactive
2025-05-06 23:08:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master YARN-ResourceManager-Status failed
2025-05-06 23:09:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master HiveServer2-Status inactive
2025-05-06 23:09:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master YARN-ResourceManager-Status failed
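Because every status record has the same space-separated layout, a short awk command can summarize the log. The following sketch counts the records per service where the state isn't active. The summarize_down_events function is a hypothetical helper, and the sample data is the example status log from this section.

```shell
#!/bin/bash
# Count non-active records per service in a status log.
# Fields: date, time, "-", cluster ID, instance ID, host IP, node type,
# "<Service>-Status", state.
summarize_down_events() {
    awk '$9 != "active" { count[$8]++ }
         END { for (s in count) print s, count[s] }' "$1" | sort
}

# Sample data copied from the example status log in this section.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2025-05-06 23:07:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master HiveServer2-Status inactive
2025-05-06 23:07:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master YARN-ResourceManager-Status active
2025-05-06 23:08:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master HiveServer2-Status inactive
2025-05-06 23:08:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master YARN-ResourceManager-Status failed
2025-05-06 23:09:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master HiveServer2-Status inactive
2025-05-06 23:09:01 - j-1O1234567890 i-0a6871234567890 111.xx.xx.92 Master YARN-ResourceManager-Status failed
EOF

summarize_down_events "$sample_log"
rm -f "$sample_log"
```

For the sample data, the command prints HiveServer2-Status 3 and YARN-ResourceManager-Status 2. On a cluster node, pass /var/log/hadoop/service-monitor-status.log as the argument instead.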
Use CloudWatch to monitor application metrics
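The script publishes the ServiceStatus metric to CloudWatch, with a value of 1, only when a monitored service isn't in the active state. The script doesn't publish data points for services that are active.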
To monitor the metrics, complete the following steps:
- Open the CloudWatch console.
- In the navigation pane, under Metrics, choose All metrics.
- Under Metrics, choose EMR/ServiceStatus, and then select the ServiceStatus metric.
- Filter the metrics by the available dimensions: ClusterId, InstanceId, NodeServiceName, and NodeType.
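To receive an alert when an application is down, you can create an alarm on the ServiceStatus metric. The following is a minimal sketch that uses aws cloudwatch put-metric-alarm. The cluster ID, instance ID, alarm name, and Amazon SNS topic ARN are placeholder assumptions. The alarm must specify all four dimensions with the exact values that the script publishes. Because the script publishes data only when a service is down, the sketch treats missing data as not breaching. The snippet prints the command by default and runs it only when RUN=1 is set.

```shell
#!/bin/bash
# Placeholder values (assumptions); replace with your cluster, instance,
# and SNS topic.
CLUSTER_ID="${CLUSTER_ID:-j-EXAMPLE}"
INSTANCE_ID="${INSTANCE_ID:-i-EXAMPLE}"
NODE_TYPE="${NODE_TYPE:-Master}"
SERVICE_NAME="${SERVICE_NAME:-HiveServer2}"
SNS_TOPIC_ARN="${SNS_TOPIC_ARN:-arn:aws:sns:us-east-1:111122223333:emr-alerts}"

alarm_cmd=(
    aws cloudwatch put-metric-alarm
    --alarm-name "emr-${CLUSTER_ID}-${SERVICE_NAME}-down"
    --namespace "EMR/ServiceStatus"
    --metric-name "ServiceStatus"
    --dimensions
        "Name=ClusterId,Value=${CLUSTER_ID}"
        "Name=NodeServiceName,Value=${SERVICE_NAME}"
        "Name=InstanceId,Value=${INSTANCE_ID}"
        "Name=NodeType,Value=${NODE_TYPE}"
    --statistic Maximum
    --period 300                       # matches the 5-minute cron schedule
    --evaluation-periods 1
    --threshold 1
    --comparison-operator GreaterThanOrEqualToThreshold
    --treat-missing-data notBreaching  # no data point means the service was active
    --alarm-actions "$SNS_TOPIC_ARN"
)

# Print the command by default; set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
    "${alarm_cmd[@]}"
else
    printf '%s ' "${alarm_cmd[@]}"
    printf '\n'
fi
```

With this design, the alarm also stays in the OK state if the monitoring script itself stops running, because no data points arrive in either case. If that matters in your environment, consider also publishing a value of 0 for active services.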
Related information
Create a CloudWatch alarm based on a static threshold
View and restart Amazon EMR and application processes (daemons)