This is a performance optimization feature of emr-5.32.0: Spark on YARN in EMR will now consolidate container requests into fewer, larger containers. Each executor's memory and cores will be a multiple of spark.executor.memory and spark.executor.cores. Generally, a smaller number of larger executors performs better than a larger number of smaller executors, so this is now the default behavior.
If you need to disable this behavior, you can set spark.yarn.heterogeneousExecutors.enabled=false. Alternatively, if you want to cap the memory/cores used for each executor without fully disabling the feature, set spark.executor.maxMemory and/or spark.executor.maxCores to values lower than Int.MaxValue. Note that these are EMR-specific properties and will not be found in the Apache Spark documentation.
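As a minimal sketch, one way to apply this at cluster creation is through the standard spark-defaults configuration classification (the property name is taken from the answer above; the surrounding JSON shape is the usual EMR configurations format):

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.yarn.heterogeneousExecutors.enabled": "false"
    }
  }
]
```

This JSON can be passed when launching the cluster (for example via the console's software settings or the CLI's configurations option), or the property can be set per-job with --conf on spark-submit.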
@Jonathan@AWS This... seems questionable? Certainly as a default configuration with only an undocumented configuration to turn it off? If I specify that I want c cores per executor and e executors and say explicitly "no, I don't want any dynamic allocation", the system should listen and do c cores and e executors! Presumably if I'm specifying all of those configurations, I have worked enough with this particular job flow to know exactly how I want it to behave.
Obviously the calculation is different if dynamic allocation is enabled. And of course, as Spark changes (with 3.x) and AWS adds tweaks on its side, tuned configurations can go stale. So if you want to, for example, warn on stderr that "hey, we think this might work better with 2c cores/executor and e/2 executors, consider it!" I'd understand. But silently overriding the explicit tuning many of us have done on our production Spark jobs over the years isn't good.
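For concreteness, a sketch of the kind of explicit submission the comment is describing, with executor count and sizing pinned and dynamic allocation turned off (the specific values and the job script name my_job.py are illustrative, not from the thread):

```shell
# Illustrative explicit tuning: e = 5 executors, c = 4 cores each,
# with dynamic allocation disabled so the counts are fixed.
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.executor.instances=5 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=8g \
  my_job.py
```

Under the emr-5.32.0 default described above, EMR may still consolidate these into fewer, larger executors (sized at multiples of the requested memory/cores) unless spark.yarn.heterogeneousExecutors.enabled=false is also set.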