Unable to Create Spark Cluster of Specific Size on 5.32.0


When I start spark-shell with the following configuration, the Spark cluster has just 2 bigger executors than it was specified (see the attached screen shot):

spark-shell --master yarn --driver-memory 4G --driver-cores 2 --executor-cores 1 --executor-memory 2G --num-executors 8 --conf spark.dynamicAllocation.enabled=false --conf spark.scheduler.minRegisteredResourcesRatio=1

EMR 5.32.0
master node: m3.xlarge
worker nodes: 4x m3.xlarge
default configuration

The command works as expected on EMR 6.2.0, EMR 5.31.0, EMR 5.30.1 with the same spec, so I'm wondering whether I could consider this problem as a bug?

asked 3 years ago638 views
2 Answers

This is a performance optimization feature of emr-5.32.0, where Spark/YARN on EMR will now consolidate container requests into a fewer number of larger containers. Executor memory/cores will be a multiple of spark.executor.memory/cores. Generally, using a smaller number of larger executors will be more performant than a larger number of smaller executors, so this is now the behavior that is performed by default.

If for some reason you need to disable this behavior, you may do so by setting spark.yarn.heterogeneousExecutors.enabled=false. Alternatively, you may set spark.executor.maxMemory/maxCores to values lower than Int.MaxValue, if you want to cap the memory/cores that will be used for each executor. (That is, you can set specific maxMemory/maxCores values without fully disabling the feature.) Note that these are EMR-specific properties and will not be found in Apache Spark documentation.

answered 3 years ago

@Jonathan@AWS This... seems questionable? Certainly as a default configuration with only an undocumented configuration to turn it off? If I specify that I want c cores per executor and e executors and say explicitly "no, I don't want any dynamic allocation", the system should listen and do c cores and e executors! Presumably if I'm specifying all of those configurations, I have worked enough with this particular job flow to know exactly how I want it to behave.

Obviously the calculation is different if dynamic allocation is enabled. Also of course Spark (with 3.x) and new tweaks on the AWS side tuned configurations can go stale. So if you want to, for example, warn on stderr that "hey, we think this might work better with 2c cores/executor and e/2 executors, consider it!" I'd understand. But just overriding the explicit tuning many of us have done on our production Spark jobs over the years without warning isn't good.

answered 3 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions