EMR Serverless change deploy mode


I'm new to EMR Serverless. I found out that EMR Serverless uses client mode as its default deploy mode, and there is no information on how to use cluster mode in EMR Serverless.

Is there any way to use 'cluster mode' in EMR Serverless? If not, why can't cluster mode be used in EMR Serverless?

  • Is there a reason you want to change the deploy mode? As I understand it, this is useful when you want to run your driver on the worker nodes in YARN mode, but EMR Serverless has its own resource manager, so I'm not sure this is relevant.

Asked 9 months ago, viewed 319 times
1 Answer
Accepted Answer

Hello

To answer your question: per my understanding there is no way to change the deploy mode from 'cluster' in EMR Serverless. The reason is that there isn't really a difference between the two modes on EMR Serverless. In a regular EMR cluster, there is a master node from the end user's point of view, but on EMR Serverless there is no real 'master node'. To explain what I mean, let me put it this way:

Consider the example of running a Spark job on an EMR on EC2 cluster by running the 'spark-submit' command as a step (which is practically identical to running it directly on the master node). In this scenario, the deploy mode of the Spark job has a direct impact on how the job runs, since we are aware of the underlying architecture of the cluster as well as the daemons running on each specific node. Using 'client' mode causes the Spark driver to run on the master node, so the driver consumes that node's resources (memory and CPU). If enough jobs run in client mode, the Spark drivers can take resources away from crucial daemons on the master node, such as the YARN ResourceManager and the HDFS NameNode, which in turn can cause cluster issues (e.g. a YARN ResourceManager restart). For this reason, it is considered better practice in most cases to run these jobs in cluster mode.
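To make the EMR on EC2 case concrete, here is a minimal sketch of a step definition that runs 'spark-submit' through command-runner.jar with an explicit deploy mode. The helper name build_spark_step, the step name, and the S3 path are hypothetical placeholders, not anything from the question. The block only builds the dictionary and does not call AWS.

```python
def build_spark_step(name, script_uri, deploy_mode="cluster"):
    """Build an EMR on EC2 step that runs spark-submit via command-runner.jar.

    Hypothetical helper for illustration; it only returns a dict.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # 'cluster' places the driver in a YARN container on a worker node;
            # 'client' keeps the driver on the master node.
            "Args": ["spark-submit", "--deploy-mode", deploy_mode, script_uri],
        },
    }


# Placeholder bucket and script path, assumed for the example.
step = build_spark_step("example-job", "s3://example-bucket/job.py")
print(step["HadoopJarStep"]["Args"])
```

A dictionary of this shape is what you would normally pass in the Steps list of the boto3 EMR client's add_job_flow_steps call, together with your own cluster ID.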

By contrast, EMR Serverless has no 'master' node. Everything is abstracted away, and from the end user's point of view each node can be considered identical. As such, where the driver runs makes no difference: there are no other daemons to take into account as there are on EMR on EC2, and the job will run the same with either configuration.
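For comparison, here is a rough sketch of what a job submission to EMR Serverless tends to look like. There is no deploy mode to choose, so in practice you size the driver through Spark properties instead. The helper name build_serverless_job_driver, the default sizes, and the S3 path are assumptions for illustration only. The block builds the jobDriver dictionary and does not call AWS.

```python
def build_serverless_job_driver(script_uri, driver_cores=2, driver_memory="4g"):
    """Build a jobDriver dict for an EMR Serverless Spark job.

    Hypothetical helper for illustration; the sizes are arbitrary defaults.
    """
    # No --deploy-mode flag here: the driver is sized with Spark properties.
    params = (
        f"--conf spark.driver.cores={driver_cores} "
        f"--conf spark.driver.memory={driver_memory}"
    )
    return {
        "sparkSubmit": {
            "entryPoint": script_uri,
            "sparkSubmitParameters": params,
        }
    }


# Placeholder bucket and script path, assumed for the example.
job_driver = build_serverless_job_driver("s3://example-bucket/job.py")
print(job_driver["sparkSubmit"]["sparkSubmitParameters"])
```

A dictionary like this would be passed as the jobDriver argument of the boto3 emr-serverless client's start_job_run call, along with your own application ID and execution role ARN.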

Please let me know if this explanation helps or if you have any further questions.

Tim-AWS, AWS Support Engineer
Answered 8 months ago
Reviewed by an Expert 7 months ago
