
Is Mixing Graviton and non-Graviton Instances Supported in the Same EMR Cluster?


I would like to confirm whether it is possible to configure an Amazon EMR cluster with mixed instance types, combining both Graviton-based and non-Graviton instances within the same cluster. I plan to run Spark jobs on it.

For example, a configuration like the following:

Primary Node: m6g.4xlarge (Graviton)

Core Nodes: r6g.2xlarge (Graviton)

Task Nodes (Spot Instances): r6a.xlarge (non-Graviton)

I have reviewed the official Amazon EMR documentation, but I could not find any specific mention or guideline about mixing Graviton and non-Graviton instances within the same cluster.

asked a year ago · 395 views
2 Answers

Yes, it is possible to configure an Amazon EMR cluster with mixed instance types, combining both Graviton-based and non-Graviton instances within the same cluster. This configuration is supported when using multiple custom AMIs for different EC2 instance types in the cluster.

To set up a cluster with mixed Graviton and non-Graviton instances, you need to use the multiple custom AMIs feature. This allows you to specify different custom AMIs for each instance type in your cluster. By doing so, you can have Graviton-based instances (like m6g.4xlarge and r6g.2xlarge) and non-Graviton instances (like r6a.xlarge) in the same cluster.

When creating such a cluster, you'll need to specify the appropriate custom AMI for each instance type. For Graviton instances, you'll use an ARM64 architecture AMI, while for non-Graviton instances, you'll use an AMD64 architecture AMI.
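As a rough sketch of what "one custom AMI per instance type" could look like, the snippet below builds an instance group list in the shape that boto3's `run_job_flow` accepts under `Instances.InstanceGroups`, setting `CustomAmiId` on each group. The AMI IDs, group names, and instance counts are placeholders and not values from this thread, and the snippet only builds the dictionary; it does not call AWS.

```python
# Hypothetical AMI IDs: replace with your own custom AMIs for each architecture.
ARM64_AMI = "ami-0123456789abcdef0"   # placeholder, built for arm64 (Graviton)
X86_64_AMI = "ami-0fedcba9876543210"  # placeholder, built for x86_64


def build_instance_groups(arm64_ami, x86_64_ami):
    """Return an InstanceGroups list with a custom AMI set per group."""
    return [
        {
            "Name": "Primary",
            "InstanceRole": "MASTER",
            "InstanceType": "m6g.4xlarge",
            "InstanceCount": 1,
            "Market": "ON_DEMAND",
            "CustomAmiId": arm64_ami,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",
            "InstanceType": "r6g.2xlarge",
            "InstanceCount": 2,  # assumed count, for illustration only
            "Market": "ON_DEMAND",
            "CustomAmiId": arm64_ami,
        },
        {
            "Name": "Task",
            "InstanceRole": "TASK",
            "InstanceType": "r6a.xlarge",
            "InstanceCount": 4,  # assumed count, for illustration only
            "Market": "SPOT",
            "CustomAmiId": x86_64_ami,
        },
    ]


instance_groups = build_instance_groups(ARM64_AMI, X86_64_AMI)
for group in instance_groups:
    print(group["InstanceRole"], group["InstanceType"], group["CustomAmiId"])
```

The resulting list would be passed as `Instances={"InstanceGroups": instance_groups, ...}` alongside the other required cluster settings. Check the current EMR API reference before relying on this shape.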

It's important to note that when running Spark jobs on a cluster with mixed architectures, Amazon EMR supports multi-architecture container images. This means that the correct architecture image (ARM64 or AMD64) will be automatically selected based on the instance type when running your Spark jobs.

Keep in mind that while this configuration is possible, you cannot use a single custom AMI for all instances when mixing Graviton and non-Graviton types. You must use the multiple custom AMIs approach to achieve this mixed architecture setup.

When setting up your cluster with this mixed configuration, make sure to carefully consider your workload requirements and the performance characteristics of each instance type to optimize your cluster's performance and cost-efficiency.
Sources
Work with multi-architecture images - Amazon EMR
Using a custom AMI to provide more flexibility for Amazon EMR cluster configuration - Amazon EMR

answered a year ago
EXPERT
reviewed a year ago
  • But given the difference in architectures, wouldn't there be a conflict between those instances when assigning and running Spark jobs? Do they run with the same dependencies (Kafka, Iceberg, Spark, Hadoop)?

  • While it is technically possible to mix Graviton and non-Graviton instances in the same EMR cluster, it introduces complexity due to differences in architecture. To avoid conflicts:

    • Ensure all dependencies (Kafka, Iceberg, Spark, Hadoop) are compatible with both ARM64 and x86_64.
    • Test your Spark jobs thoroughly on both architectures.
    • Monitor performance and task distribution to identify and resolve any issues.

    If compatibility or performance issues arise, consider using a homogeneous cluster with either all Graviton or all non-Graviton instances.


Yes, Amazon EMR technically supports mixing Graviton-based and non-Graviton instance types within the same EMR cluster, especially for Task nodes, but there are some important caveats and considerations to keep in mind.

Understanding EMR Node Roles and Mixing Types:

  • Primary (Master) Node: Runs cluster management services and the Spark driver (when running in client mode).
  • Core Nodes: Store HDFS data and run executors.
  • Task Nodes: Run only Spark executors, can be scaled dynamically, and can optionally be Spot instances.

You can use a different instance family for the Task node group than for the Core and Primary nodes, but mixing instance architectures (x86_64 vs. arm64) is more nuanced.

Key Considerations When Mixing Graviton and x86 Instances

  1. CPU Architecture Compatibility

    • Graviton = arm64
    • Intel/AMD = x86_64

    Spark executors on both architectures must run the same Spark version, with dependencies that are either architecture-independent or built for both arm64 and x86_64.

  2. Amazon EMR AMIs and Runtime Compatibility

    • EMR releases 6.7 and later offer better support for Graviton2 instances.
    • EMR runtime for Spark can run across both architectures if all required libraries, jars, and dependencies are architecture-independent or properly compiled for each architecture.
  3. Java/Scala/PySpark Compatibility

    • PySpark and Java apps generally work well on both, as long as you don't use native x86 libraries.
    • Avoid using libraries that depend on native code compiled for a specific architecture unless you provide both versions.
  4. Containerization (EMR on EKS or EMR Serverless)

    • If you're using EMR on EKS with containers, you must provide a multi-arch container image (arm64 and amd64), or separate workloads per architecture.
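One way to act on the native-library point above is to scan your job's JARs for bundled native binaries before deploying to a mixed cluster. The following is a minimal sketch using only the Python standard library. The suffix list and the path-based architecture hints are assumptions about how JARs commonly lay out native code, not an EMR feature, and the demo JAR is built in memory purely for illustration.

```python
import io
import zipfile

# Assumed file suffixes that usually indicate native code inside a JAR.
NATIVE_SUFFIXES = (".so", ".dylib", ".dll", ".jnilib")

# Assumed path fragments that hint at the target architecture.
ARCH_HINTS = {
    "arm64": ("aarch64", "arm64"),
    "x86_64": ("x86_64", "amd64"),
}


def native_entries(jar_file):
    """Return the sorted JAR entries that look like native libraries."""
    with zipfile.ZipFile(jar_file) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.lower().endswith(NATIVE_SUFFIXES)
        )


def architectures_found(entries):
    """Return the set of architectures hinted at by the entry paths."""
    found = set()
    for name in entries:
        lowered = name.lower()
        for arch, hints in ARCH_HINTS.items():
            if any(hint in lowered for hint in hints):
                found.add(arch)
    return found


# Demo: an in-memory JAR that bundles a native library for x86_64 only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr("com/example/App.class", b"")
    jar.writestr("native/linux/x86_64/libdemo.so", b"")
buffer.seek(0)

entries = native_entries(buffer)
print(entries, architectures_found(entries))
```

A JAR that reports native entries for only one architecture is a candidate for failures on the other node type. A JAR with no native entries is not proof of safety, since native code can also be loaded from the host.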

Your Configuration Example

Primary Node: m6g.4xlarge (Graviton)
Core Nodes: r6g.2xlarge (Graviton)
Task Nodes (Spot): r6a.xlarge (non-Graviton)

This is technically valid, if:

  • You're using a Spark runtime and dependencies that are compatible across both arm64 and amd64.
  • You're not relying on native libraries compiled specifically for one architecture.
  • You test your application end-to-end to confirm there are no runtime issues (e.g., serialization, library incompatibilities).
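To make the architecture split in this configuration explicit, here is a small sketch that classifies each node role by CPU architecture. It relies on a naming heuristic (a `g` in the suffix after the generation number, plus the `a1` family, indicates Graviton), which is an assumption based on current EC2 naming and not an official rule. The `ProcessorInfo.SupportedArchitectures` field returned by the EC2 `DescribeInstanceTypes` API is the authoritative source.

```python
import re


def instance_architecture(instance_type):
    """Guess the CPU architecture from an EC2 instance type name (heuristic)."""
    family = instance_type.split(".")[0].lower()
    if family == "a1":  # first-generation Graviton family
        return "arm64"
    # Family letters, generation digits, then attribute suffix (e.g. "r", "6", "g").
    match = re.fullmatch(r"[a-z]+\d+([a-z-]*)", family)
    if match is None:
        raise ValueError(f"unrecognized instance type: {instance_type}")
    return "arm64" if "g" in match.group(1) else "x86_64"


def architectures_by_role(cluster):
    """Map each node role to the guessed architecture of its instance type."""
    return {role: instance_architecture(itype) for role, itype in cluster.items()}


# The configuration from the question.
cluster = {
    "primary": "m6g.4xlarge",
    "core": "r6g.2xlarge",
    "task": "r6a.xlarge",
}

by_role = architectures_by_role(cluster)
is_mixed = len(set(by_role.values())) > 1
print(by_role)
print("mixed architectures:", is_mixed)
```

For the configuration in the question this flags the cluster as mixed, because only the task nodes are x86_64. Any executor scheduled there needs x86_64-compatible dependencies.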

However, mixing Graviton (arm64) and non-Graviton (x86_64) instances is not a recommended or documented AWS best practice for EMR on EC2. Most production workloads stick to one architecture for consistency and supportability.

Best Practices

  • Prefer homogeneous architectures unless you explicitly need to use Spot pools that only offer x86.
  • If using mixed architectures, ensure your Spark jobs and libraries are architecture-neutral.
  • Consider EMR on EKS with multi-arch containers if you need to support mixed architectures in a controlled manner.

Recommendation

If cost savings via Spot are the goal, consider:

  • Using Graviton Spot instances (m6g, r6g, etc.) in Task nodes.
  • Keeping all nodes on Graviton to avoid potential architecture conflicts unless you've rigorously tested compatibility.
AWS
answered a year ago
