EMR Serverless change deploy mode


I'm new to EMR Serverless. I found out that EMR Serverless uses client mode as its default deploy mode, and there is no information on how to use cluster mode in EMR Serverless.

Is there any way to use cluster mode in EMR Serverless? If not, why can't cluster mode be used in EMR Serverless?

  • Is there a reason you want to change the deploy mode? As I understand it, this is useful when you want to run your driver on the worker nodes in YARN mode, but EMR Serverless has its own resource manager, so I'm not sure this is relevant.

asked 8 months ago, 297 views
1 Answer
Accepted Answer

Hello

To answer your question: as far as I understand, there is no way to change the deploy mode from its default setting in EMR Serverless. The reason is that there isn't really a difference between the two modes on EMR Serverless. In a regular EMR cluster there is a master node from the end user's point of view, but on EMR Serverless there is no real 'master node'. To explain what I mean, let me put it this way:

Consider the example of running a Spark job on an EMR on EC2 cluster by running the 'spark-submit' command as a step (which is practically identical to running it directly on the master node). In this scenario, the deploy mode of the Spark job has a direct impact on how the job runs, since we are aware of the underlying architecture of the cluster as well as the daemons running on each specific node. Using client mode causes the Spark driver to run on the master node, so it consumes that node's resources (memory and CPU). If enough jobs run in client mode, the Spark drivers can starve crucial daemons on the master node, such as the YARN ResourceManager and the HDFS NameNode, which in turn can cause cluster issues (e.g. a YARN ResourceManager restart). For this reason it is considered better practice, in most cases, to run these jobs in cluster mode.
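To make the EMR on EC2 case concrete, here is a minimal sketch of where the deploy mode appears when a Spark job is submitted as a step. The helper name `spark_step`, the step name, and the S3 path are made up for illustration; the returned dict is intended to follow the step shape that boto3's `add_job_flow_steps` accepts for a `command-runner.jar` step.

```python
def spark_step(script_uri, deploy_mode="cluster"):
    """Build an EMR on EC2 step that runs spark-submit via command-runner.jar.

    spark_step is a hypothetical helper, not part of any AWS SDK.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return {
        "Name": f"spark-{deploy_mode}-mode",  # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # client mode: the driver runs on the master node
            # cluster mode: the driver runs in a YARN container on a worker node
            "Args": ["spark-submit", "--deploy-mode", deploy_mode, script_uri],
        },
    }


# Placeholder script location, for illustration only.
step = spark_step("s3://my-bucket/jobs/etl.py")
print(" ".join(step["HadoopJarStep"]["Args"]))

# To submit it (assumes boto3 is installed and a cluster is running;
# the cluster id below is a placeholder):
#   boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXX", Steps=[step])
```

Defaulting to cluster mode here simply mirrors the recommendation above; passing `deploy_mode="client"` produces the same step with the driver placed on the master node.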

Compare this with EMR Serverless, where there is no 'master' node. Everything is abstracted away, and from the end user's point of view each node can be considered identical. As such, where the driver runs makes no difference: we don't need to take other daemons into account as we did on EMR on EC2, and the job will run the same with either configuration.
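If the goal behind asking for cluster mode is really to control the resources the driver gets, a rough sketch of the EMR Serverless equivalent is to size the driver through standard Spark properties rather than through the deploy mode. The helper name `serverless_job_driver`, the default sizes, and the S3 path are assumptions for illustration; the dict is intended to match the `jobDriver` argument of boto3's `start_job_run` for the `emr-serverless` client.

```python
def serverless_job_driver(entry_point, driver_cores=2, driver_memory_gb=4):
    """Build a jobDriver dict for an EMR Serverless Spark job.

    serverless_job_driver is a hypothetical helper; the default sizes are
    arbitrary examples, not recommendations.
    """
    params = (
        f"--conf spark.driver.cores={driver_cores} "
        f"--conf spark.driver.memory={driver_memory_gb}g"
    )
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            # No --deploy-mode flag here: driver placement is handled by the service.
            "sparkSubmitParameters": params,
        }
    }


# Placeholder script location, for illustration only.
job_driver = serverless_job_driver("s3://my-bucket/jobs/etl.py", driver_cores=4, driver_memory_gb=8)
print(job_driver["sparkSubmit"]["sparkSubmitParameters"])

# To submit it (assumes boto3 is installed; the application id and role ARN
# below are placeholders):
#   boto3.client("emr-serverless").start_job_run(
#       applicationId="<application-id>",
#       executionRoleArn="<execution-role-arn>",
#       jobDriver=job_driver,
#   )
```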

Please let me know if this explanation helps or if you have any further questions.

Tim-AWS, AWS Support Engineer
answered 8 months ago
EXPERT reviewed 6 months ago
