EMR terminated because "All slaves in the job flow were terminated", but core and task nodes were On-Demand

The EMR cluster had 1 primary, 1 core, and 5 task nodes. All three node groups were On-Demand, including the task group; I did not use Spot purchasing for the task group, to avoid unexpected termination. The cluster still terminated with the error: All slaves in the job flow were terminated. What could be the reason for the termination, and where can I find more logs about this failure?

asked 4 months ago, 345 views
1 Answer
Hello,

One possibility is that termination protection was turned off and auto-termination was enabled for the cluster. When a core or task node becomes unhealthy, for example because disk utilization exceeds the YARN disk health-check threshold (90% by default), YARN marks the node as unhealthy and stops scheduling tasks on it. If the node stays unhealthy for about 45 minutes and termination protection is disabled, EMR decommissions the node and terminates the instance. Once every core node has been terminated this way, no resources remain to run jobs, and the cluster itself terminates with the "All slaves in the job flow were terminated" error.
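To check whether termination protection was off, you can look at the output of the EMR `DescribeCluster` API. Below is a minimal sketch, not an official tool: `summarize_termination` is a hypothetical helper, and the sample response is made up for illustration. It only relies on the `TerminationProtected`, `AutoTerminate`, and `Status.StateChangeReason` fields of the response.

```python
def summarize_termination(describe_cluster_response):
    """Extract the fields most relevant to an unexpected cluster termination.

    Expects a dict shaped like the EMR DescribeCluster response.
    Missing fields come back as None instead of raising.
    """
    cluster = describe_cluster_response.get("Cluster", {})
    reason = cluster.get("Status", {}).get("StateChangeReason", {})
    return {
        "termination_protected": cluster.get("TerminationProtected"),
        "auto_terminate": cluster.get("AutoTerminate"),
        "reason_code": reason.get("Code"),
        "reason_message": reason.get("Message"),
    }


# Illustrative, made-up response; a real one would come from, for example:
#   import boto3
#   response = boto3.client("emr").describe_cluster(ClusterId="<cluster-id>")
sample_response = {
    "Cluster": {
        "TerminationProtected": False,
        "AutoTerminate": True,
        "Status": {
            "State": "TERMINATED_WITH_ERRORS",
            "StateChangeReason": {
                "Code": "INSTANCE_FAILURE",
                "Message": "All slaves in the job flow were terminated",
            },
        },
    }
}

summary = summarize_termination(sample_response)
print(summary)
```

If `termination_protected` is `False`, unhealthy nodes can be terminated by EMR as described above.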

If logging is enabled for the cluster, you can confirm this from the EMR instance-state logs at s3://logging bucket/<cluster-id>/node/<ec2-instance>/daemons/instance-state/instance-state-log*. Each log captures that instance's OS statistics, such as the output of df -h and the top CPU and memory consumers, recorded every 15 minutes. Let me know if you have any questions.
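Once you have downloaded an instance-state log, you can scan its `df -h` section for nearly full disks. The sketch below is only an illustration: `full_filesystems` is a hypothetical helper, the sample text is made up, and it assumes the section uses the standard `df -h` column layout. The `threshold_percent` default of 90 is also an assumption; set it to whatever your yarn-site configuration uses.

```python
def full_filesystems(df_output, threshold_percent=90):
    """Return (mount_point, used_percent) pairs at or above the threshold.

    Assumes standard `df -h` rows:
    Filesystem  Size  Used  Avail  Use%  Mounted on
    Lines that do not match that layout (including the header) are skipped.
    """
    flagged = []
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        try:
            used_percent = int(fields[4].rstrip("%"))
        except ValueError:
            # Header row ("Use%") or any other non-numeric value.
            continue
        if used_percent >= threshold_percent:
            flagged.append((fields[5], used_percent))
    return flagged


# Made-up df -h output for illustration only.
sample_df = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       10G  4.2G  5.8G  42% /
/dev/xvdb1      100G   93G  7.0G  93% /mnt
/dev/xvdb2      100G   50G   50G  50% /mnt1
"""

flagged = full_filesystems(sample_df)
print(flagged)
```

Any mount point returned here is a candidate for the disk that pushed the node into the unhealthy state.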

AWS
SUPPORT ENGINEER
answered 4 months ago
