EMR terminated because all slaves in the job flow were terminated, but core and task nodes were On-Demand


My EMR cluster had 1 primary, 1 core, and 5 task nodes. All three node groups, including the task group, were On-Demand. I did not use Spot purchasing for the task group, to avoid unexpected termination. The cluster still terminated with the error: "All slaves in the job flow were terminated". What could be the reason for the termination? And where can I find more logs about this failure?

asked 4 months ago, 380 views
1 Answer

Hello,

One possibility is that termination protection was turned off and auto-termination was enabled on the cluster. When a core or task node becomes unhealthy, for example because disk utilization exceeds 80%, YARN marks the node as unhealthy and excludes it from task scheduling. If the node stays in that state for about an hour, it is decommissioned, and the instance is terminated if termination protection is disabled.
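The settings mentioned above can be read back from the cluster description. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; `fetch_cluster_description` and `summarize_cluster` are hypothetical helper names, and the sample dictionary in the usage note is made-up illustrative data, not output from the cluster in the question.

```python
def fetch_cluster_description(cluster_id, region_name=None):
    """Call the EMR DescribeCluster API (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the summarizer below works without it

    client = boto3.client("emr", region_name=region_name)
    return client.describe_cluster(ClusterId=cluster_id)


def summarize_cluster(description):
    """Pull out the fields relevant to an unexpected termination."""
    cluster = description["Cluster"]
    reason = cluster.get("Status", {}).get("StateChangeReason", {})
    return {
        "termination_protected": cluster.get("TerminationProtected"),
        "auto_terminate": cluster.get("AutoTerminate"),
        "reason_code": reason.get("Code"),
        "reason_message": reason.get("Message"),
    }
```

For example, `summarize_cluster(fetch_cluster_description("j-XXXXXXXXXXXXX"))` (placeholder cluster ID) would show whether termination protection was off at the time and what state-change reason EMR recorded.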

To confirm this, check the EMR instance-state logs, provided logging was enabled for the cluster. They are stored under s3://<logging-bucket>/<cluster-id>/node/<ec2-instance>/daemons/instance-state/instance-state-log*. Each log contains the instance's OS statistics, such as the output of df -h and the top CPU and memory consumers, captured every 15 minutes. Let me know if you have any questions about this.
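As a rough illustration of how such a log could be inspected, the sketch below builds the S3 prefix described above and scans a `df -h` section for filesystems above a usage threshold. It assumes the log has already been downloaded and decompressed, and that the `df -h` rows use the usual six-column layout; the helper names and the 80% default (taken from the figure quoted in this answer) are illustrative assumptions.

```python
def instance_state_prefix(log_uri, cluster_id, instance_id):
    """Build the S3 prefix that holds one instance's instance-state logs."""
    base = log_uri.rstrip("/")
    return f"{base}/{cluster_id}/node/{instance_id}/daemons/instance-state/"


def disks_over_threshold(df_output, threshold_pct=80):
    """Return (mount_point, used_pct) for df -h rows above threshold_pct."""
    flagged = []
    for line in df_output.splitlines():
        fields = line.split()
        # Expected row: Filesystem Size Used Avail Use% Mounted-on
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        try:
            used_pct = int(fields[4].rstrip("%"))
        except ValueError:
            continue  # header row ("Use%") or any non-numeric value
        if used_pct > threshold_pct:
            flagged.append((fields[5], used_pct))
    return flagged
```

A node whose mounts show up in `disks_over_threshold` in several consecutive 15-minute snapshots before the termination time would be consistent with the unhealthy-node explanation.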

AWS
SUPPORT ENGINEER
answered 4 months ago
