EMR - Number of core nodes that a single primary node can orchestrate


We are aware that AWS recommends using 3 primary nodes, but in our lower environments we want to use 1 primary node.

We are planning to use only one primary node for our AWS EMR cluster, which runs Flink jobs. We will start with 3 core nodes and use managed scaling for autoscaling. We would like clarification on the following:

  1. How many core nodes can a single primary node orchestrate? Is there any limit on the number of core nodes when we have only 1 primary node, or is it unlimited (1 primary node can orchestrate N core nodes)?
  2. If we have only one primary node, what is the availability commitment from AWS (assuming termination protection is enabled)?
  3. If we have a single primary node and it goes down, will AWS bring it back up automatically, or is human intervention required?
asked 9 months ago | 617 views
1 Answer
Accepted Answer

How many core nodes can a single primary node orchestrate?

The primary node in an EMR cluster coordinates the cluster and distributes work among the core and task nodes. Having a single primary node does not, by itself, limit the number of core nodes you can have. The practical limits come from network overhead, the capacity of the primary node to manage tasks, and other operational factors. AWS doesn't publish a specific maximum for the number of nodes, and clusters with thousands of nodes are not uncommon. A single primary node can become a bottleneck for extremely large clusters, but for most use cases it is sufficient.
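As a rough illustration of keeping cluster growth within a range you are comfortable with, the sketch below builds a managed scaling policy and passes it to boto3's `put_managed_scaling_policy`. The limits (3 to 10 instances, at most 6 core nodes), the region, and the helper names are assumptions chosen for illustration, not values from the question or recommendations from AWS.

```python
def build_managed_scaling_policy(min_units=3, max_units=10, max_core_units=None):
    """Build a ManagedScalingPolicy dict; all limits here are illustrative."""
    if min_units > max_units:
        raise ValueError("min_units must not exceed max_units")
    limits = {
        "UnitType": "Instances",  # count capacity in instances
        "MinimumCapacityUnits": min_units,
        "MaximumCapacityUnits": max_units,
    }
    if max_core_units is not None:
        if max_core_units > max_units:
            raise ValueError("max_core_units must not exceed max_units")
        # Cap how many of the units may be core nodes; the rest can be task nodes.
        limits["MaximumCoreCapacityUnits"] = max_core_units
    return {"ComputeLimits": limits}


def apply_policy(cluster_id, policy, region="us-east-1"):
    """Attach the policy to a running cluster (requires AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    emr = boto3.client("emr", region_name=region)
    emr.put_managed_scaling_policy(ClusterId=cluster_id, ManagedScalingPolicy=policy)


if __name__ == "__main__":
    print(build_managed_scaling_policy(3, 10, max_core_units=6))
```

Starting with conservative limits and raising them while watching the primary node's CPU and memory is one way to find out where a single primary node starts to struggle for your Flink workload.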

Availability commitment from AWS with a single primary node:

EMR does not provide a Service Level Agreement (SLA) for individual clusters. The primary node is a single point of failure in an EMR cluster. If it fails, the entire cluster becomes non-functional. Using 3 primary nodes increases the fault tolerance of the cluster as it can handle the failure of a primary node without losing the cluster. If you decide to go with a single primary node in your environment, you are sacrificing some fault tolerance.
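To make the trade-off concrete, here is a minimal sketch of the instance group definitions you might pass as `Instances["InstanceGroups"]` to boto3's `run_job_flow`, where the only difference between the two setups is the primary group's `InstanceCount`. The instance type `m5.xlarge`, the group names, and the helper function are assumptions for illustration. Note that the API still uses the role name `MASTER` for the primary node.

```python
def build_instance_groups(primary_count=1, core_count=3, instance_type="m5.xlarge"):
    """Build instance group definitions for run_job_flow (illustrative values)."""
    # EMR clusters run with either 1 primary node or 3 primary nodes.
    if primary_count not in (1, 3):
        raise ValueError("primary_count must be 1 or 3")
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return [
        {
            "Name": "Primary",
            "Market": "ON_DEMAND",
            "InstanceRole": "MASTER",
            "InstanceType": instance_type,
            "InstanceCount": primary_count,
        },
        {
            "Name": "Core",
            "Market": "ON_DEMAND",
            "InstanceRole": "CORE",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]


if __name__ == "__main__":
    lower_env = build_instance_groups(primary_count=1)  # single point of failure
    production = build_instance_groups(primary_count=3)  # tolerates a primary failure
    print(lower_env[0]["InstanceCount"], production[0]["InstanceCount"])
```

Keeping the primary count as a single parameter makes it straightforward to run the cheaper single-primary layout in lower environments and the three-primary layout where fault tolerance matters.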

Behavior when the primary node goes down:

If the primary node fails, the EMR cluster becomes non-functional, and EMR will not automatically recover a failed primary node in a single-primary cluster. Termination protection only prevents accidental termination through the AWS Management Console, the AWS SDKs, or the CLI; it does not protect against node failures. If the primary node goes down, you would typically need to terminate the cluster and start a new one, so human intervention (or your own automation) is required. Data stored in HDFS will be lost, but if you're using EMRFS with data in S3, that data remains intact.
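Because recovery is manual, it can help to detect a lost cluster early. The following is a minimal sketch, assuming you poll boto3's `describe_cluster` and treat the terminating and terminated states as a signal to launch a replacement. The helper names and region are hypothetical, and the reported state is only one signal: a cluster with a failed primary node and termination protection enabled may not report one of these states right away, so you would likely pair this with your own health checks.

```python
# Cluster states in which EMR reports the cluster as gone or going away.
TERMINAL_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def needs_replacement(describe_cluster_response):
    """Return True if the describe_cluster response shows a terminal state."""
    status = describe_cluster_response["Cluster"]["Status"]
    return status["State"] in TERMINAL_STATES


def check_cluster(cluster_id, region="us-east-1"):
    """Query EMR for the cluster state (requires AWS credentials)."""
    import boto3  # imported here so needs_replacement works without boto3

    emr = boto3.client("emr", region_name=region)
    return needs_replacement(emr.describe_cluster(ClusterId=cluster_id))


if __name__ == "__main__":
    # Hand-written responses shaped like describe_cluster output.
    healthy = {"Cluster": {"Status": {"State": "RUNNING"}}}
    failed = {"Cluster": {"Status": {"State": "TERMINATED_WITH_ERRORS"}}}
    print(needs_replacement(healthy), needs_replacement(failed))
```

If the replacement cluster is created from the same launch configuration and the Flink jobs keep their checkpoints or savepoints in S3 rather than HDFS, the jobs can be resumed on the new cluster from that state.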

To summarize, while you can use a single primary node for your EMR cluster, be aware of the potential risks associated with it. If fault tolerance is crucial for your lower environments, consider using multiple primary nodes.

answered 9 months ago
reviewed 7 months ago