EMR terminated because all slaves in the job flow were terminated, but core and task nodes were On-Demand


My EMR cluster had 1 primary, 1 core, and 5 task nodes. All three node groups were On-Demand, including the task group; I avoided Spot purchasing for the task group to prevent unexpected termination. The cluster still terminated with the error: All slaves in the job flow were terminated. What could be the reason for the termination? And where can I find more logs about this failure?

Asked 4 months ago, 381 views
1 Answer

Hello,

It is possible that termination protection was turned off on the cluster and auto-termination was enabled. When a core or task node becomes unhealthy, for example because disk utilization exceeds the YARN threshold (90% by default), YARN marks the node as unhealthy and excludes it from task scheduling. If the node stays unhealthy for an extended period (about 45 minutes, per the EMR documentation), it is decommissioned and, if termination protection is disabled, the instance is terminated. With only one core node, losing that node leaves the cluster with no core nodes at all, which is what produces this error even when every instance is On-Demand.
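To check the settings mentioned above on your own cluster, something like the following sketch may help. It assumes boto3 is installed and credentials are configured; `summarize_cluster` and `fetch_cluster` are hypothetical helper names, and the fields read here (`TerminationProtected`, `AutoTerminate`, `Status.StateChangeReason`) come from the `describe_cluster` response.

```python
from typing import Any, Dict


def summarize_cluster(cluster: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the termination-related fields from a describe_cluster 'Cluster' dict."""
    reason = cluster.get("Status", {}).get("StateChangeReason", {})
    return {
        "termination_protected": cluster.get("TerminationProtected"),
        "auto_terminate": cluster.get("AutoTerminate"),
        "reason_code": reason.get("Code"),
        "reason_message": reason.get("Message"),
    }


def fetch_cluster(cluster_id: str) -> Dict[str, Any]:
    """Fetch the cluster description from EMR (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so summarize_cluster works without boto3

    emr = boto3.client("emr")
    return emr.describe_cluster(ClusterId=cluster_id)["Cluster"]


# Usage (the cluster ID is a placeholder):
# print(summarize_cluster(fetch_cluster("j-XXXXXXXXXXXXX")))
```

If `termination_protected` is False, an unhealthy core node can be terminated as described above.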

You can confirm this from the EMR instance-state logs, provided logging was enabled for the cluster. Look under s3://<logging-bucket>/<cluster-id>/node/<ec2-instance>/daemons/instance-state/instance-state-log*. These logs record OS statistics for that instance, such as the output of df -h and the top CPU and memory consumers, captured every 15 minutes. Let me know if you have any questions.
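As a rough sketch of how you might locate those logs and scan them for full disks: the helper names below are hypothetical, the key layout follows the path given above, and the parser assumes the log contains plain `df -h` output lines. Adjust `base_prefix` if your log URI includes a prefix before the cluster ID.

```python
import re
from typing import Iterator, List, Tuple


def instance_state_prefix(cluster_id: str, instance_id: str, base_prefix: str = "") -> str:
    """Build the S3 key prefix for one instance's instance-state logs."""
    parts = [base_prefix.strip("/"), cluster_id, "node", instance_id,
             "daemons", "instance-state"]
    return "/".join(p for p in parts if p) + "/"


def list_instance_state_logs(bucket: str, cluster_id: str, instance_id: str,
                             base_prefix: str = "") -> Iterator[str]:
    """Yield the S3 keys under that prefix (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helpers work without boto3

    s3 = boto3.client("s3")
    prefix = instance_state_prefix(cluster_id, instance_id, base_prefix)
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Matches a df row: filesystem, sizes, "NN%", mount point.
_DF_ROW = re.compile(r"^(\S+)\s+.*?\s(\d+)%\s+(\S+)$")


def disks_over_threshold(df_text: str, threshold: int = 90) -> List[Tuple[str, int]]:
    """Return (mount point, use %) for df rows whose usage exceeds the threshold."""
    hits = []
    for line in df_text.splitlines():
        match = _DF_ROW.match(line.strip())
        if match and int(match.group(2)) > threshold:
            hits.append((match.group(3), int(match.group(2))))
    return hits
```

Running `disks_over_threshold` on the `df -h` section of a downloaded log shows which mount points were above the threshold at each 15-minute snapshot.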

AWS
Support Engineer
Answered 4 months ago
