EMR Serverless change deploy mode


I'm new to EMR Serverless. I found out that EMR Serverless uses client mode as the default deploy mode, and I can't find any information on how to use cluster mode in EMR Serverless.

Is there any way to use 'cluster mode' in EMR Serverless? If not, why can't cluster mode be used in EMR Serverless?

  • Is there a reason you want to change the deploy mode? As I understand it, this is useful when you want to run your driver on the worker nodes in YARN mode, but EMR Serverless has its own resource manager, so I'm not sure it is relevant here.

Asked 9 months ago, 319 views
1 answer

Accepted answer

Hello

To answer your question: as far as I understand, there is no way to change the deploy mode in EMR Serverless. The reason is that there isn't really a difference between the two modes on EMR Serverless. In a regular EMR cluster, there is a master node from the end user's point of view, but on EMR Serverless there is no real 'master node'. To explain what I mean, let me put it this way:

Consider running a Spark job on an EMR on EC2 cluster by submitting the 'spark-submit' command as a step (which is practically identical to running it directly on the master node). In this scenario, the deploy mode has a direct impact on how the job runs, because we are aware of the cluster's underlying architecture and of the daemons running on each specific node. Using 'client' mode causes the Spark driver to run on the master node, so it consumes that node's memory and CPU. If enough jobs run in client mode, their drivers can take resources away from crucial master-node daemons such as the YARN ResourceManager and the HDFS NameNode, which in turn can cause cluster issues (e.g. a YARN ResourceManager restart). For this reason, it is generally considered better practice to run these jobs in cluster mode, where YARN launches the driver in a container on one of the worker (core or task) nodes instead.
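To make the EMR on EC2 case concrete, here is a minimal sketch of a step that runs `spark-submit` through `command-runner.jar`, with the deploy mode passed explicitly. The function name `build_spark_step`, the step name and the S3 script path are hypothetical placeholders; the returned dict is assumed to be one element of the `Steps` list accepted by boto3's EMR `add_job_flow_steps` call. The sketch only builds the request and does not contact AWS.

```python
def build_spark_step(name, entry_point, args=None, deploy_mode="cluster"):
    """Build one EMR on EC2 step that runs spark-submit via command-runner.jar.

    Hypothetical helper: the dict shape matches one entry of the `Steps`
    list passed to boto3's emr client `add_job_flow_steps`.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                # "client": the driver lives in the spark-submit process on the
                # master node. "cluster": YARN starts the driver in a container
                # on a worker node.
                "--deploy-mode", deploy_mode,
                entry_point,
                *(args or []),
            ],
        },
    }


# Hypothetical script location and arguments, for illustration only.
step = build_spark_step("my-spark-job", "s3://my-bucket/scripts/job.py", ["--date", "2024-01-01"])
print(step["HadoopJarStep"]["Args"])
```

On a real cluster, the list `[step]` would be passed as `Steps=[step]` together with the cluster's `JobFlowId`. Switching `deploy_mode` to `"client"` is exactly the change that moves the driver onto the master node.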

Compare this with EMR Serverless, where there is no 'master' node. Everything is abstracted away, and from the end user's point of view each node can be considered identical. As a result, where the driver runs makes no difference: there are no other daemons to take into account as there are in EMR on EC2, and the job will run the same with either configuration.
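For contrast, a job submission to EMR Serverless describes only the entry point and Spark properties; the `sparkSubmit` job driver accepted by boto3's `emr-serverless` `start_job_run` has no dedicated deploy-mode field. The minimal sketch below builds that `jobDriver` argument and sizes the driver with ordinary Spark properties (`spark.driver.cores`, `spark.driver.memory`) instead of choosing where it runs. The helper name and the default sizes are assumptions for illustration, and the sketch does not call AWS.

```python
def build_serverless_job_driver(entry_point, args=None, driver_cores=1,
                                driver_memory="4g", executor_cores=1,
                                executor_memory="4g"):
    """Build the `jobDriver` argument for emr-serverless `start_job_run`.

    Hypothetical helper: there is no deploy-mode switch here; the driver is
    sized through Spark driver properties like any other worker.
    """
    confs = {
        "spark.driver.cores": str(driver_cores),
        "spark.driver.memory": driver_memory,
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": executor_memory,
    }
    # sparkSubmitParameters is a single string of spark-submit style options.
    submit_params = " ".join(f"--conf {key}={value}" for key, value in confs.items())
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "entryPointArguments": list(args or []),
            "sparkSubmitParameters": submit_params,
        }
    }


# Hypothetical script location, for illustration only.
job_driver = build_serverless_job_driver("s3://my-bucket/scripts/job.py", ["--date", "2024-01-01"])
print(job_driver["sparkSubmit"]["sparkSubmitParameters"])
```

The resulting dict would be passed as `jobDriver=job_driver` alongside the application ID and execution role ARN. The design point is that the only driver-related knobs are its resources, which matches the answer: with no master node to protect, there is nothing for a deploy mode to decide.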

Please let me know if this explanation helps or if you have any further questions.

AWS Support Engineer
Tim-AWS
Answered 8 months ago

Reviewed by an expert 7 months ago
