HBase on EMR - multi-AZ availability

A customer is asking how to achieve the best uptime with HBase on EMR. They are looking for a managed solution, but they are not happy that the master nodes are deployed in a single AZ (the data can live in S3, for example, so it would survive an AZ outage). I checked the docs [1], and it seems that even with multiple master nodes, all of them are deployed into a single subnet, hence a single AZ. The customer wants a solution that automatically fails over to another AZ.

Do we have any offering on that front?

[1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-ha-launch.html

AWS
asked 3 years ago · 913 views
1 Answer
Accepted Answer

Amazon EMR launches all nodes for a given cluster in the same Amazon EC2 Availability Zone. Running all nodes in the same zone improves job flow performance because it provides a higher data access rate. By default, Amazon EMR chooses the Availability Zone with the most available resources in which to run your cluster. However, you can specify another Availability Zone if required.

When you configure instance fleets, select the VPC and one or more subnets where you would like to deploy your Amazon EMR cluster. We recommend choosing subnets in more than one Availability Zone. Your cluster will still be deployed in a single Availability Zone, but selecting multiple Availability Zones allows Amazon EMR to look across all of them and deploy your cluster in the one with the most EC2 Spot capacity.
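As a rough illustration of the instance fleets setup described above, the following sketch builds the keyword arguments for boto3's `run_job_flow` call with several candidate subnets. The cluster name, release label, instance types, capacities, and the helper name `build_emr_hbase_request` are assumptions made for illustration, not values from this thread; the role names are the EMR defaults and may differ in your account.

```python
def build_emr_hbase_request(subnet_ids, hbase_root_dir):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    subnet_ids: subnets in different Availability Zones; EMR picks ONE of
                them at launch, so the cluster still runs in a single AZ.
    hbase_root_dir: S3 location for HBase data, e.g. "s3://my-bucket/hbase".
    """
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    return {
        "Name": "hbase-on-s3",            # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",     # assumption: use the latest release available
        "Applications": [{"Name": "HBase"}],
        "Configurations": [
            # Store HBase data on S3 so it outlives the cluster.
            {"Classification": "hbase",
             "Properties": {"hbase.emr.storageMode": "s3"}},
            {"Classification": "hbase-site",
             "Properties": {"hbase.rootdir": hbase_root_dir}},
        ],
        "Instances": {
            "Ec2SubnetIds": list(subnet_ids),
            "KeepJobFlowAliveWhenNoSteps": True,
            "InstanceFleets": [
                {"Name": "master", "InstanceFleetType": "MASTER",
                 "TargetOnDemandCapacity": 1,
                 "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}]},  # assumed type
                {"Name": "core", "InstanceFleetType": "CORE",
                 "TargetOnDemandCapacity": 2,
                 "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge",
                                          "WeightedCapacity": 1}]},
            ],
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   request = build_emr_hbase_request(["subnet-aaa", "subnet-bbb"], "s3://my-bucket/hbase")
#   boto3.client("emr").run_job_flow(**request)
```

Listing several subnets only widens the choice of Availability Zone at launch time; it does not make the running cluster span zones or fail over between them.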

To answer your question on "uptime with HBase on EMR":

Apache HBase on Amazon S3 can be recommended if your application does not require high availability of writes and can tolerate failures during writes and updates. To mitigate the impact of Amazon EMR master node failures, of Availability Zone failures that can terminate the Apache HBase on Amazon S3 cluster, or of temporary service degradation caused by operational or transient Apache HBase RegionServer issues, we recommend that your pipeline architecture rely on a stream or messaging platform upstream of the cluster. We also recommend always using the latest Amazon EMR release so that you benefit from the changes and features continuously added to Apache HBase.
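To make the upstream-buffer idea concrete, here is a minimal sketch of a writer that applies records from a durable log and resumes where it left off once the cluster is reachable again. Everything in it is hypothetical: `ReplayingWriter` is an illustrative name, a Python list stands in for the stream or messaging platform, and `sink` stands in for whatever client writes to HBase. A real pipeline would also checkpoint the offset durably and keep writes idempotent, since a record may be retried.

```python
class ReplayingWriter:
    """Applies records from an upstream log to a sink, resuming after failures."""

    def __init__(self, log, sink):
        self.log = log          # list standing in for a durable stream (assumption)
        self.sink = sink        # callable that writes one record; raises ConnectionError when down
        self.next_offset = 0    # offset of the first record not yet applied

    def drain(self):
        """Apply pending records in order; stop at the first failure.

        Returns the number of records applied in this call.
        """
        applied = 0
        while self.next_offset < len(self.log):
            try:
                self.sink(self.log[self.next_offset])
            except ConnectionError:
                break  # cluster unavailable: keep the offset and retry later
            self.next_offset += 1
            applied += 1
        return applied
```

Because the records stay in the upstream log while the cluster is down, a replacement cluster launched in another Availability Zone against the same S3 root directory can catch up by calling `drain()` again.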

https://d1.awsstatic.com/whitepapers/Migrating_to_Apache_Hbase_on_Amazon_S3_on_Amazon_EMR.pdf

answered 3 years ago
