Calculate number of executors ,executor memory and task nodes for spark job on AWS EMR

2

Hi, Lets say we have 1 TB file in s3 to process it using aws emr . As per documentation it says we need to select hardware 3 times the size of input file . So for processing 1 TB file we need ec2 machines capable of storing 3 TB of data due to HDFS replication factor of 3. lets say we took 12 - m5.4xlarge (64 GiB memory,256 GiB storage,16 vcores). Then we end up accomodating 3 TB data (256 GiB * 12) = 3072 GB . Now with 12 - m5.4xlarge we have total memory pool of (64 GiB * 12) = 768 GiB memory and (12*16 vcores) = 192 vcores. Now how will we further decide num-executors, executor-max-cores and how can we further leverage Task nodes for processing . Help me in understanding further calculation please . Shubham Satyam(contact no 91-7978864339)

Shubham
asked 8 months ago1429 views
2 Answers
3

how will we further decide num-executors, executor-max-cores

64 GiB memory. (64 GiB * 12) = 768 GiB memory 16 vcores. (12 *16 vcores) = 192 vcores.

Let’s assign 5 core per executors => --executor-cores = 5 (for good HDFS throughput and general recommendation(subject to vary depends on specific compute intensive workload and tune trail & error basis if need to change)

Leave 1 core per node for Hadoop/Yarn daemons => Num cores available per node = 16-1 = 15

So, Total available of cores in cluster = 15 x 12 = 180

Number of available executors = (total cores/num-cores-per-executor) = 180/5 = 36 Considering 1 executor for ApplicationManager => --num-executors = 35

Number of executors per node = 30/10 = 3 (To verify individual node's No.of executors)

To calculate executor memory, Memory per executor = Memory per node/(Total executors/Total nodes) Memory per executor = 64GB/(36/12) = 21GB

Counting off heap overhead = 7% of 21GB = 1.5GB. So, round off --executor-memory = 21 - 2 = 19GB

how can we further leverage Task nodes for processing

If you not enabled auto scaling and always keep the same number of nodes, you can consider storing data only in core node and use task nodes only for computations and make sure enough disk space assigned for storing the shuffle data. On the other hand, If you use autoscaling, then I recommend to keep the core node static and scale only task nodes for better performance and throughput as scaling core nodes experience additional overhead of rebalancing HDFS data and replication factor concerns. More details here

AWS
SUPPORT ENGINEER
answered 8 months ago
  • Hi Yokesh,

    First of all thanks for your detailed explanation for EMR cluster capacity estimation.One small question here . How would 1 Tb file be broken down into partitions and how do we decide optimum size of each partiton .

    Thanks

3

Hello Shubham,

How would 1 Tb file be broken down into partitions and how do we decide optimum size of each partition .

There are 3 ways of defining the partitions. 1. Input, 2. Shuffle , 3. Output. Input & output partitions are controllable easily. 1Tb of total size of dataset would be broken into partitions depends on either spark.default.parallelism(default value is 128) or spark.sql.files.maxPartitionBytes where you can define the individual file size. Based on that, it will parallelize the tasks.

Shuffle partition requires bit of experiments to define the right number which plays the important key role of shuffle exchange. (spark.sql.shuffle.partitions) - default 200 but configure the number of partitions to use based shuffling data for joins or aggregations and try to set the number that makes the partition file size around 100 to 200mb(talking about general case).

AWS
SUPPORT ENGINEER
answered 8 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions