EMR cluster terminated with "All slaves in the job flow were terminated", but the core and task nodes were On-Demand


My EMR cluster had 1 primary node, 1 core node, and 5 task nodes. All three instance groups were On-Demand, including the task group. I deliberately avoided Spot purchasing for the task group to prevent unexpected terminations. Even so, the cluster terminated with the error: "All slaves in the job flow were terminated." What could cause this termination, and where can I find more logs about the failure?

Asked 4 months ago, 383 views
1 Answer

Hello,

One possible cause is that termination protection was turned off and auto-termination was enabled. When a core or task node becomes unhealthy, for example because disk utilization exceeds 80%, YARN marks the node as unhealthy and stops scheduling tasks on it. If the node stays unhealthy for about an hour, it is decommissioned, and if termination protection is disabled, the underlying instance is terminated.
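
To check which of these settings applied to your cluster, you can inspect the "Cluster" object returned by the EMR DescribeCluster API, which includes the TerminationProtected and AutoTerminate flags and the status message. The sketch below is a minimal helper, assuming you already fetched that dict (for example with boto3's describe_cluster); the function name termination_settings_findings and the sample dict are hypothetical and only illustrate the shape of the data.

```python
from typing import Any


def termination_settings_findings(cluster: dict[str, Any]) -> list[str]:
    """Flag settings in a DescribeCluster "Cluster" dict that allow EMR to
    terminate nodes (or the whole cluster) without manual intervention."""
    findings: list[str] = []
    # TerminationProtected=False: instances can be terminated, for example
    # after an unhealthy node is decommissioned. Missing key is treated as False.
    if not cluster.get("TerminationProtected", False):
        findings.append("termination protection is disabled")
    # AutoTerminate=True: the cluster is configured to shut itself down
    # automatically (for example, after its steps finish).
    if cluster.get("AutoTerminate", False):
        findings.append("auto-termination is enabled")
    # The state-change message carries the terminal error text, if any.
    message = cluster.get("Status", {}).get("StateChangeReason", {}).get("Message")
    if message:
        findings.append(f"last state change reason: {message}")
    return findings


# Hypothetical, hand-written sample shaped like describe_cluster(...)["Cluster"].
# In practice: boto3.client("emr").describe_cluster(ClusterId="<cluster-id>")["Cluster"]
sample_cluster = {
    "TerminationProtected": False,
    "AutoTerminate": True,
    "Status": {
        "StateChangeReason": {
            "Message": "All slaves in the job flow were terminated",
        },
    },
}

for finding in termination_settings_findings(sample_cluster):
    print("-", finding)
```

If both flags come back as flagged, the decommission-then-terminate path described above is possible; if termination protection was on, look for another cause.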

If logging is enabled for the cluster, you can confirm this in the EMR instance-state logs at s3://<logging-bucket>/<cluster-id>/node/<ec2-instance>/daemons/instance-state/instance-state-log*. These logs capture the instance's OS statistics every 15 minutes, such as the output of df -h and the top CPU- and memory-consuming processes. Let me know if you have any questions.
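
Once you have downloaded (and, if needed, decompressed) an instance-state log, scanning its df -h section for full disks is quick to automate. The sketch below assumes the log contains standard df -h output with one row per filesystem; the function name filesystems_over_threshold and the sample rows are hypothetical, and the 80% default simply mirrors the figure mentioned above.

```python
def filesystems_over_threshold(
    df_output: str, threshold_pct: int = 80
) -> list[tuple[str, str, int]]:
    """Return (filesystem, mount point, use %) for each `df -h` row whose
    Use% is strictly above threshold_pct."""
    flagged: list[tuple[str, str, int]] = []
    for line in df_output.splitlines():
        fields = line.split()
        # Expect: Filesystem Size Used Avail Use% Mounted-on. This skips the
        # header ("Use%" is not numeric) and rows that df wrapped onto two lines.
        if len(fields) < 6 or not fields[4].endswith("%") or not fields[4][:-1].isdigit():
            continue
        use_pct = int(fields[4][:-1])
        if use_pct > threshold_pct:
            # Rejoin the mount point in case it contains spaces.
            flagged.append((fields[0], " ".join(fields[5:]), use_pct))
    return flagged


# Hypothetical df -h excerpt, for illustration only.
sample_df = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1   10G  4.1G  6.0G  41% /
/dev/nvme1n1p1  5.0G  4.6G  400M  92% /emr
/dev/nvme1n1p2   27G   23G  4.0G  85% /mnt
"""

for fs, mount, pct in filesystems_over_threshold(sample_df):
    print(f"{fs} mounted on {mount} is {pct}% full")
```

Running this across the log files for the core and task instances shows whether disk pressure lined up with the time the nodes were terminated.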

AWS
SUPPORT ENGINEER
answered 4 months ago
