EMR cluster terminated with "All slaves in the job flow were terminated", even though the core and task nodes were On-Demand


My EMR cluster had 1 primary node, 1 core node, and 5 task nodes. All three node groups were On-Demand, including the task group; I deliberately avoided Spot purchasing for the task group to prevent unexpected termination. The cluster still terminated with the error: All slaves in the job flow were terminated. What could have caused the termination, and where can I find more logs about this failure?

asked 4 months ago, 395 views
1 answer


One possibility is that termination protection was turned off and auto-termination was enabled on the cluster. When a core or task node becomes unhealthy, for example because disk utilization exceeds the YARN NodeManager threshold (90% by default), YARN marks the node as unhealthy and stops scheduling tasks on it. If the node stays in that state for an extended period (about 45 minutes, per the EMR documentation), it is decommissioned and the instance is terminated, provided termination protection is disabled. Once every core and task node has been terminated this way, the cluster itself terminates with the error you saw.
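To check whether these settings applied to your cluster, you could inspect the output of the EMR `describe_cluster` API. The snippet below is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names `summarize_cluster` and `fetch_cluster_summary` are hypothetical and only illustrate which response fields are relevant.

```python
def summarize_cluster(response):
    """Extract the fields of a describe_cluster response that relate to termination."""
    cluster = response["Cluster"]
    reason = cluster.get("Status", {}).get("StateChangeReason", {})
    return {
        "termination_protected": cluster.get("TerminationProtected"),
        "auto_terminate": cluster.get("AutoTerminate"),
        "reason_code": reason.get("Code"),
        "reason_message": reason.get("Message"),
    }


def fetch_cluster_summary(cluster_id, region_name=None):
    """Call the EMR API for one cluster (hypothetical helper, needs boto3 and credentials)."""
    # Imported lazily so summarize_cluster can be used without boto3 installed.
    import boto3

    emr = boto3.client("emr", region_name=region_name)
    return summarize_cluster(emr.describe_cluster(ClusterId=cluster_id))
```

Calling `fetch_cluster_summary("<cluster-id>")` with your own cluster ID returns a small dictionary; a `termination_protected` value of `False` together with the state change message would be consistent with the scenario described above.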

To confirm this, check the EMR instance-state logs, provided logging was enabled for the cluster. They are stored at s3://<logging-bucket>/<cluster-id>/node/<ec2-instance>/daemons/instance-state/instance-state-log*. These logs contain OS-level statistics for each instance, such as the output of df -h and the top CPU and memory consumers, captured every 15 minutes. Let me know if you have any questions.
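Once you have downloaded and decompressed an instance-state log, a small script can flag the filesystems that were close to full. The following is a rough sketch, assuming the log contains standard `df -h` rows with six columns; the function name `filesystems_over_threshold` and the default threshold of 90 are illustrative assumptions, not part of any EMR tooling.

```python
def filesystems_over_threshold(df_output, threshold_percent=90):
    """Return (mount_point, used_percent) for df rows above the threshold."""
    flagged = []
    for line in df_output.splitlines():
        fields = line.split()
        # A df data row has at least 6 columns:
        # Filesystem, Size, Used, Avail, Use%, Mounted on
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        try:
            used = int(fields[4].rstrip("%"))
        except ValueError:
            # The header row ("Use%") and other non-numeric lines end up here.
            continue
        if used > threshold_percent:
            # Join the remaining fields in case the mount point contains spaces.
            flagged.append((" ".join(fields[5:]), used))
    return flagged
```

Passing the text of the df section to this function returns the mount points whose usage is above the threshold, which you can compare against the time the node was marked unhealthy.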

answered 4 months ago
