Difference between vCore and vCPU | EC2 and EMR


I am new to AWS and EMR, and I want to understand the difference between a vCPU and a vCore.
The EC2 instance types page, https://aws.amazon.com/ec2/instance-types/, lists m4.large with 2 vCPUs and 8 GiB of memory, and I confirmed this by SSHing into the master node: the OS sees 2 CPUs.
However, when I spin up an EMR cluster with m4.large as the instance type, the cluster's Hardware section describes the instance type as "4 vCore, 8GiB; EBS storage 32 GiB".

I am confused by these two concepts and would appreciate it if someone could clarify them.
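A minimal sketch for seeing both numbers on the same node: the CPU count the OS reports (the EC2 vCPUs) and the vCore count YARN has been configured with. It assumes the YARN config lives at /etc/hadoop/conf/yarn-site.xml and that you run it on a core or task node, where the NodeManager runs; both are assumptions to verify on your release.

```python
import os
import xml.etree.ElementTree as ET

# ASSUMPTION: location of the YARN config on an EMR node; adjust if yours differs.
YARN_SITE = "/etc/hadoop/conf/yarn-site.xml"


def yarn_vcores(xml_text: str):
    """Return yarn.nodemanager.resource.cpu-vcores from Hadoop-style XML, or None."""
    root = ET.fromstring(xml_text)
    # Hadoop config files are <configuration><property><name/><value/></property>...
    for prop in root.iter("property"):
        if prop.findtext("name") == "yarn.nodemanager.resource.cpu-vcores":
            return int(prop.findtext("value", default="").strip())
    return None  # property not set explicitly in this file


if __name__ == "__main__":
    print("CPUs seen by the OS (EC2 vCPUs):", os.cpu_count())
    if os.path.exists(YARN_SITE):
        with open(YARN_SITE) as f:
            print("YARN vCores:", yarn_vcores(f.read()))
```

On an m4.large node you would expect the first line to show 2; the second shows whatever vCore value EMR wrote for that instance type.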

rawlani
asked 5 years ago · 2452 views
2 Answers

This has been an issue for some time now, but as far as I know AWS has never given a clear reason for it.

[1] https://forums.aws.amazon.com/message.jspa?messageID=833300
[2] https://forums.aws.amazon.com/thread.jspa?threadID=266092

From my observations, this was "corrected" in the m5 generation. However, that makes it difficult to configure Spark executor resources for a cluster whose instance fleet mixes several instance generations, which is what AWS recommends: https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-spark-applications-using-amazon-ec2-spot-instances-with-amazon-emr/

My only guess is that hardware threads (virtual cores) are being counted incorrectly when the instance's CPUs are presented to YARN, and that because AWS had already made this generally available, it could not roll the change back once customers had built workarounds around it.
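To make the mixed-fleet problem concrete, here is a rough sketch. It assumes a hypothetical fleet of m4.xlarge and m5.xlarge nodes (4 EC2 vCPUs each), assumes EMR doubles the vCore count for m4 but not for m5, and assumes YARN packs executors by vCores alone, ignoring memory (whether vCores are used for scheduling at all depends on your scheduler's resource calculator).

```python
# Hypothetical fleet. EC2 vCPU counts are from the EC2 instance-types page;
# the YARN vCore counts ASSUME EMR doubles m4 but leaves m5 unchanged.
FLEET = {
    "m4.xlarge": {"ec2_vcpus": 4, "yarn_vcores": 8},
    "m5.xlarge": {"ec2_vcpus": 4, "yarn_vcores": 4},
}


def executors_per_node(yarn_vcores: int, executor_cores: int) -> int:
    """Executors that fit on one node if YARN schedules by vCores only (memory ignored)."""
    return yarn_vcores // executor_cores


def task_slots_per_vcpu(node: dict, executor_cores: int) -> float:
    """Concurrent Spark task slots per real EC2 vCPU on that node."""
    slots = executors_per_node(node["yarn_vcores"], executor_cores) * executor_cores
    return slots / node["ec2_vcpus"]


if __name__ == "__main__":
    executor_cores = 4  # one spark.executor.cores value shared by the whole fleet
    for name, node in FLEET.items():
        print(name,
              executors_per_node(node["yarn_vcores"], executor_cores),
              task_slots_per_vcpu(node, executor_cores))
```

Under these assumptions the same executor setting gives two executors (2.0 task slots per vCPU) on the m4 node but one executor (1.0 per vCPU) on the m5 node, so no single value suits both generations.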

answered 5 years ago

When you choose an instance type in the Amazon EMR Management Console, the number of vCPUs shown for each instance type is the number of YARN vCores for that instance type, not the number of EC2 vCPUs.
http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-purchasing-options.html
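A sketch of the kind of fixed per-family mapping this answer describes. The multiplier table is hypothetical and covers only the two families discussed in this thread (M4 doubled, M5 unchanged); EMR's actual mapping is maintained by the service and is not reproduced here.

```python
# HYPOTHETICAL multipliers, limited to the families mentioned in this thread.
VCORE_MULTIPLIER = {"m4": 2, "m5": 1}


def emr_vcores(instance_type: str, ec2_vcpus: int) -> int:
    """Estimate the YARN vCores EMR would report from the EC2 vCPU count."""
    family = instance_type.split(".")[0]  # "m4.large" -> "m4"
    # Raises KeyError for families this illustrative table does not cover.
    return ec2_vcpus * VCORE_MULTIPLIER[family]


if __name__ == "__main__":
    print(emr_vcores("m4.large", 2))      # the question's case
    print(emr_vcores("m4.10xlarge", 40))
    print(emr_vcores("m5.xlarge", 4))
```

With this table, m4.large maps 2 vCPUs to 4 vCores (matching the console in the question) and m4.10xlarge maps 40 to 80.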

  1. The EMR console shows the yarn.nodemanager.resource.cpu-vcores value for each instance type. That value comes from a predefined, fixed mapping that the EMR service maintains for every instance type and family. For some families, such as M5, EMR sets the vCore count equal to the EC2 vCPU count. For others, such as the M4 family, the setting is double the actual EC2 vCPU count.
    For example, EMR uses 80 vCores for m4.10xlarge, whereas EC2 reports 40 vCPUs.

  2. So the intent appears to be to report vCore usage at the YARN level rather than at the EC2 instance level.

  3. The discrepancy in the EMR console exists because it represents a cluster's compute capacity from YARN's perspective. Since EMR clusters run applications according to their YARN settings, this was presumably judged a better representation of the available compute resources than the EC2 vCPU count.

  4. This is done to ensure that YARN runs enough containers to keep the CPUs as busy as possible. When it introduced some instance families, EMR determined that for most use cases the instances' CPUs would be underutilized without doubling this value, because applications are I/O bound most of the time. In other words, if the vCore count were set to the actual number of vCPUs for these instance types, you would get about one YARN container per vCPU, but those containers would spend most of their time blocked on I/O, so you could probably run more containers and keep the CPU fully utilized.

  5. Amazon EMR tries to provide optimal default configuration settings for each instance type across the broadest range of use cases (types of applications, and engines such as MapReduce and Spark). However, you may need to adjust these settings manually for your application. You can change the value through the configuration API by setting yarn.nodemanager.resource.cpu-vcores in the "yarn-site" classification for your different applications and workloads.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
https://hadoop.apache.org/docs/r2.8.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
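A minimal sketch of the "yarn-site" override from point 5, built as the JSON configuration list EMR accepts (Classification plus Properties). The helper name is hypothetical, and the value 2 is just an example that would make an m4.large advertise its 2 EC2 vCPUs instead of 4 vCores.

```python
import json


def yarn_vcores_config(vcores: int) -> list:
    """Hypothetical helper: EMR configuration overriding the NodeManager vCore count."""
    return [
        {
            "Classification": "yarn-site",
            "Properties": {
                # EMR configuration property values are passed as strings.
                "yarn.nodemanager.resource.cpu-vcores": str(vcores),
            },
        }
    ]


if __name__ == "__main__":
    print(json.dumps(yarn_vcores_config(2), indent=2))
```

The printed JSON can be saved to a file and supplied when creating the cluster (for example with the AWS CLI's `aws emr create-cluster --configurations file://...`). Because the value is per node, a fleet mixing instance types may need different overrides per instance group or fleet.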

answered 5 years ago
