Why does my Amazon EMR cluster terminate with an "application provisioning failed" error?
My Amazon EMR cluster terminates with an "application provisioning failed" error.
Resolution
When Amazon EMR can't install, configure, or start specified software while it launches a cluster, the cluster might terminate with the "application provisioning failed" error.
Review the Amazon EMR provisioning logs
Amazon EMR stores provisioning logs in an Amazon Simple Storage Service (Amazon S3) bucket that you specify when you launch the cluster.
Complete the following steps:
- Open the Amazon EMR console.
- In the navigation pane, choose Clusters. Then, choose the failed Amazon EMR cluster to see the cluster details.
- In the Summary section, choose Terminated with errors and note the primary node ID included in the error message.
- In the Cluster logs section, choose the Amazon S3 location URL.
- Navigate to your UUID folder at the following path: node/example-primary-node-ID/provision-node/apps-phase/0/example-UUID/.
Note: Replace example-primary-node-ID with your primary node ID and example-UUID with your UUID.
- In the resulting list, select puppet.log.gz, and then choose Open to view the provisioning log in a new browser tab.
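If you prefer the AWS CLI over the console, you can fetch the same log directly from Amazon S3. The following is a minimal sketch; the bucket name, cluster ID, node ID, and UUID are placeholders that you must replace with your own values.

```shell
#!/usr/bin/env bash
# Sketch: fetch the Puppet provisioning log with the AWS CLI.
# All identifiers below are placeholders for your own values.

build_log_key() {
  # Build the S3 key for puppet.log.gz under the cluster's log folder.
  local cluster_id="$1" node_id="$2" uuid="$3"
  echo "${cluster_id}/node/${node_id}/provision-node/apps-phase/0/${uuid}/puppet.log.gz"
}

LOG_BUCKET="example-log-bucket"   # the S3 log bucket you chose at launch
KEY="$(build_log_key j-EXAMPLE123 i-0abc123def456 11111111-2222-3333-4444-555555555555)"

# Uncomment to download and read the log (requires AWS credentials):
# aws s3 cp "s3://${LOG_BUCKET}/${KEY}" - | gunzip | less
echo "s3://${LOG_BUCKET}/${KEY}"
```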
Identify the reasons for failures in provisioning logs
Unsupported configuration parameters, incorrect hostnames, incorrect passwords, and general operating system issues can all cause provisioning errors. Search the logs for related keywords, such as "error", "err", and "fail".
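After you download a provisioning log, the keyword search can be done in one command. This is a sketch that assumes you already fetched puppet.log.gz from the cluster's S3 log location.

```shell
# Sketch: scan a downloaded, gzip-compressed provisioning log for failure keywords.

scan_log() {
  # zgrep reads gzip-compressed files directly; -i ignores case and -E
  # enables the error|err|fail alternation.
  zgrep -iE 'error|err|fail' "$1"
}

# Usage: scan_log puppet.log.gz | head -n 20
```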
Issues when you connect to an external metastore with an Amazon RDS instance
You can configure some Amazon EMR applications, such as Apache Hive, Hue, or Apache Oozie, to store data in an external database, such as Amazon Relational Database Service (Amazon RDS). When Amazon EMR can't connect to the external database, you receive an error message.
Example error message from Hive:
2022-11-26 02:59:36 +0000 /Stage[main]/Hadoop_hive::Init_metastore_schema/Exec[init hive-metastore schema]/returns (notice): org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
2022-11-26 02:59:36 +0000 /Stage[main]/Hadoop_hive::Init_metastore_schema/Exec[init hive-metastore schema]/returns (notice): Underlying cause: java.sql.SQLNonTransientConnectionException : Could not connect to address=(host=hostname)(port=3306)(type=master) : Socket fail to connect to host:hostname, port:3306. hostname
2022-11-26 02:59:36 +0000 /Stage[main]/Hadoop_hive::Init_metastore_schema/Exec[init hive-metastore schema]/returns (notice): SQL Error code: -1
To resolve this type of error, take the following actions:
- Verify that the Amazon RDS instance hostname, user, password, and database are correct.
- Verify that the Amazon RDS instance security group inbound rules allow connections from the Amazon EMR primary node security group.
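To verify both checks quickly, you can test TCP reachability from the EMR primary node to the metastore endpoint. This is a sketch; the RDS hostname below is a placeholder, and the check uses bash's built-in /dev/tcp redirection rather than a database client.

```shell
# Sketch: test whether the EMR primary node can reach the RDS endpoint on 3306.
# The hostname is a placeholder for your own Amazon RDS endpoint.

check_port() {
  # Returns 0 if a TCP connection to host:port succeeds within 5 seconds.
  local host="$1" port="$2"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

if check_port example-db.abc123.us-east-1.rds.amazonaws.com 3306; then
  echo "metastore port reachable"
else
  echo "connection failed: check the hostname and the RDS security group inbound rules"
fi
```

A successful TCP connection doesn't prove the credentials are correct, but it separates network problems (security groups, hostname) from authentication problems (user, password, database name).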
Issues when you connect to an external KDC
Amazon EMR lets you configure an external KDC to add an additional layer of security. You can also create a trust relationship with an Active Directory server. If there's an issue when you contact the KDC or try to join a domain, you receive an error message.
Example error message from Puppet:
2022-11-26 03:02:01 +0000 Puppet (err): 'echo "${AD_DOMAIN_JOIN_PASSWORD}" | realm join -v -U "${AD_DOMAIN_JOIN_USER}"@"${CROSS_REALM_TRUST_REALM}" "${CROSS_REALM_TRUST_DOMAIN}"' returned 1 instead of one of [0]
2022-11-26 03:02:01 +0000 /Stage[main]/Kerberos::Ad_joiner/Exec[realm_join]/returns (err): change from 'notrun' to ['0'] failed: 'echo "${AD_DOMAIN_JOIN_PASSWORD}" | realm join -v -U "${AD_DOMAIN_JOIN_USER}"@"${CROSS_REALM_TRUST_REALM}" "${CROSS_REALM_TRUST_DOMAIN}"' returned 1 instead of one of [0]
To resolve this type of error, take the following actions:
- Verify that you spelled the Kerberos realm correctly.
- Verify that you entered the KDC administrative password correctly.
- Verify that you spelled the Active Directory join user and password correctly.
- Verify that Active Directory contains the join user and that the user has the correct permissions.
- For KDC and Active Directory hosted on Amazon Elastic Compute Cloud (Amazon EC2), verify that the KDC and Active Directory security group inbound rules allow connections from the Amazon EMR primary node security group.
- For KDC and Active Directory hosted outside of Amazon EC2, verify that KDC and Active Directory allow connections from the Amazon EMR cluster virtual private cloud (VPC) and subnet.
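One quick spelling check is to compare the realm configured on the primary node with the realm you intended. This is an illustrative sketch; the file path and realm value are assumptions, not output from a specific cluster.

```shell
# Sketch: read the configured Kerberos realm from a krb5.conf-style file
# so you can compare it against the realm you specified at launch.

get_default_realm() {
  # Print the default_realm value, stripping surrounding whitespace.
  awk -F'=' '/^[[:space:]]*default_realm/ {gsub(/[[:space:]]/, "", $2); print $2}' "$1"
}

# Usage on an EMR primary node:
#   get_default_realm /etc/krb5.conf    # compare against your intended realm
```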
Issues when you start services, such as YARN ResourceManager, Hadoop NameNode, or Spark History Server
Amazon EMR lets you provide custom configurations for all applications when you launch a cluster. However, these configurations can sometimes prevent services from starting. When a service can't start because of a configuration issue, you receive an error message.
Example error message from Apache Spark History Server:
2022-11-26 03:34:13 +0000 Puppet (err): Systemd start for spark-history-server failed!
journalctl log for spark-history-server:
-- Logs begin at Sat 2022-11-26 03:27:57 UTC, end at Sat 2022-11-26 03:34:13 UTC. --
Nov 26 03:34:10 ip-192-168-1-32 systemd[1]: Starting Spark history-server...
Nov 26 03:34:10 ip-192-168-1-32 spark-history-server[1076]: Starting Spark history-server (spark-history-server): [OK]
Nov 26 03:34:10 ip-192-168-1-32 su[1112]: (to spark) root on none
Nov 26 03:34:13 ip-192-168-1-32 systemd[1]: spark-history-server.service: control process exited, code=exited status=1
Nov 26 03:34:13 ip-192-168-1-32 systemd[1]: Failed to start Spark history-server.
Nov 26 03:34:13 ip-192-168-1-32 systemd[1]: Unit spark-history-server.service entered failed state.
Nov 26 03:34:13 ip-192-168-1-32 systemd[1]: spark-history-server.service failed.
2022-11-26 03:34:13 +0000 /Stage[main]/Spark::History_server/Service[spark-history-server]/ensure (err): change from 'stopped' to 'running' failed: Systemd start for spark-history-server failed! journalctl log for spark-history-server:
To resolve this type of error, take the following actions:
- Check which service failed to start.
- Review your configuration settings for spelling errors.
- Check the Amazon S3 log at the specified location to find the cause of the failure. For example, s3://example-log-location/example-cluster-ID/node/example-primary-node-ID/applications/example-failed-application/example-failed-service.gz.
Issues when you download or install applications
When Amazon EMR can't install or download an application, the Amazon EMR cluster fails and the provisioning logs don't complete. Review the stderr.gz log to identify what caused the error.
Example error message:
Error Summary
-------------
Disk Requirements:
  At least 2176MB more space needed on the / filesystem.
2022-11-26 03:18:44,662 ERROR Program: Encountered a problem while provisioning
java.lang.RuntimeException: Amazon-linux-extras topics enabling or yum packages installation failed.
To resolve this type of error, increase the size of the root Amazon Elastic Block Store (Amazon EBS) volume when you launch your Amazon EMR cluster.
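You can set the root volume size at launch with the AWS CLI. The following sketch echoes the command instead of running it; the cluster name, release label, applications, and instance settings are placeholders to adjust for your setup.

```shell
# Sketch: launch a cluster with a larger root EBS volume (size is in GiB).
# All values below are placeholders; review before running.

CMD="aws emr create-cluster \
  --name example-cluster \
  --release-label emr-6.9.0 \
  --applications Name=Spark Name=Hive \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ebs-root-volume-size 50"

# Echoed for review; run it with: eval "$CMD"
echo "$CMD"
```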
Amazon S3 logs aren't available
When Amazon EMR fails to provision applications, and there aren't any logs generated in Amazon S3, you receive an error message. A network error might have caused Amazon S3 logging to fail.
To resolve this type of error, take the following actions:
- Check whether you turned on the Logging option when you launched the Amazon EMR cluster. For more information, see Configure Amazon EMR cluster logging and debugging.
- If you use a custom AMI, then check for firewall rules that might interfere with the required Amazon EMR network settings. For more information, see Working with Amazon EMR-managed security groups.
- If you use a custom AMI, then check for failed primary nodes. Open the Amazon EMR console, and in the navigation pane, choose Hardware to see whether the cluster launched any primary nodes.
- If you use a custom AMI, then make sure that you follow best practices. For more information, see Using a custom AMI to provide more flexibility for Amazon EMR cluster configuration.
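You can also confirm the cluster's log destination from the AWS CLI. This sketch builds the command for review rather than running it; the cluster ID is a placeholder, and executing the call requires the AWS CLI and credentials.

```shell
# Sketch: build the describe-cluster call that prints the cluster's S3 log
# destination (Cluster.LogUri), or "None" when logging was not turned on.

log_uri_cmd() {
  echo "aws emr describe-cluster --cluster-id $1 --query Cluster.LogUri --output text"
}

# Review, then run with: eval "$(log_uri_cmd j-EXAMPLE123)"
log_uri_cmd j-EXAMPLE123
```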