EMR Serverless change deploy mode

I'm new to EMR Serverless. I found out that EMR Serverless uses client mode as its default deploy mode, and there is no information on how to use cluster mode in EMR Serverless.

Is there any way to use 'cluster mode' in EMR Serverless? If not, why can't cluster mode be used in EMR Serverless?

  • Is there a reason you want to change the deploy mode? As I understand it, this is useful when you want to run your driver on the worker nodes in YARN mode, but EMR Serverless has its own resource manager, so I'm not sure this is relevant.

Asked 9 months ago, 319 views
1 answer
Accepted answer

Hello

To answer your question: per my understanding there is no way to change the deploy mode from 'cluster' in EMR Serverless. The reason is that there isn't really a difference between the two modes on EMR Serverless. In a regular EMR cluster, there is a master node from the end user's point of view, but on EMR Serverless there is no real 'master node'. To explain what I mean, let me put it this way:

Consider running a Spark job on an EMR on EC2 cluster by submitting the 'spark-submit' command as a step (which is practically identical to running it directly on the master node). In this scenario, the deploy mode has a direct impact on how the job runs, since we are aware of the underlying architecture of the cluster as well as the daemons running on each node. Using client mode causes the Spark driver to run on the master node, so that node's resources (memory and CPU) are consumed by the driver. If enough jobs are running in client mode, the drivers can take resources away from crucial daemons on the master node, such as the YARN ResourceManager and the HDFS NameNode, which in turn can cause cluster issues (e.g. a YARN ResourceManager restart). For this reason it is considered better practice to run these jobs in cluster mode (in most cases).
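To make the EMR on EC2 case concrete, here is a minimal sketch of how such a step could be described in Python. It only builds the step definition; the helper name `build_spark_step`, the step name, and the S3 script path are hypothetical placeholders, and the dict shape is my understanding of what the EMR `AddJobFlowSteps` API accepts.

```python
# Sketch: build an EMR on EC2 step that runs spark-submit with an explicit deploy mode.
# build_spark_step, the step name, and the script URI are illustrative assumptions.

def build_spark_step(script_uri: str, deploy_mode: str = "cluster") -> dict:
    """Return a step definition that runs spark-submit through command-runner.jar."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    return {
        "Name": f"spark-job-{deploy_mode}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # With "cluster", the driver runs inside a YARN container on a worker
            # node; with "client", it runs on the master node itself.
            "Args": ["spark-submit", "--deploy-mode", deploy_mode, script_uri],
        },
    }


step = build_spark_step("s3://my-bucket/scripts/job.py")
print(step["HadoopJarStep"]["Args"])
```

A step built this way could then be passed to something like `boto3.client("emr").add_job_flow_steps(JobFlowId=..., Steps=[step])`, with the cluster ID filled in for your environment.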

Comparing this with EMR Serverless, there is no 'master' node. Everything is abstracted away, and from the end user's point of view each node can be considered identical. As such, where the driver runs makes no difference: we don't need to account for other daemons as we did on EMR on EC2, and the job will run the same with either configuration.
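For comparison, a job submission to EMR Serverless might be sketched as below. This is an illustration under assumptions rather than a definitive recipe: the helper name `build_serverless_job_driver`, the script path, and the driver sizes are hypothetical, and the dict follows my understanding of the `jobDriver` structure of the EMR Serverless `StartJobRun` API. Note that the sketch passes no deploy mode at all; the driver is described only through ordinary Spark properties.

```python
# Sketch: build the jobDriver payload for an EMR Serverless Spark job.
# build_serverless_job_driver, the entry point, and the sizes are illustrative assumptions.

def build_serverless_job_driver(entry_point: str,
                                driver_memory: str = "4g",
                                driver_cores: int = 2) -> dict:
    """Return a jobDriver dict; the driver is sized via Spark conf, not placed via deploy mode."""
    params = (
        f"--conf spark.driver.memory={driver_memory} "
        f"--conf spark.driver.cores={driver_cores}"
    )
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "sparkSubmitParameters": params,
        }
    }


job_driver = build_serverless_job_driver("s3://my-bucket/scripts/job.py")
print(job_driver["sparkSubmit"]["sparkSubmitParameters"])
```

The resulting dict could be passed as the `jobDriver` argument of `boto3.client("emr-serverless").start_job_run(...)`, together with your own `applicationId` and `executionRoleArn`.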

Please let me know if this explanation helps or if you have any further questions.

Tim-AWS, AWS Support Engineer
Answered 8 months ago
Reviewed by an expert 7 months ago