Hadoop across multiple regions

I want to set up a Hadoop cluster across multiple regions. However, as others may have already noticed, an instance cannot bind to its Elastic IP. As a result, when I run start-dfs.sh, communication between the NameNode in zone1 and the DataNode in zone2 cannot be established, and a socket binding error is reported.
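
The binding error can be reproduced outside Hadoop. On EC2, an Elastic IP is mapped to the instance's private address by network address translation, so the public address is not assigned to any local network interface and a process cannot bind a socket to it. The following is a minimal sketch of that check, not part of the original setup: `can_bind` is a hypothetical helper, and 203.0.113.10 is a documentation-range placeholder standing in for an Elastic IP.

```python
import errno
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to `address` on this host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 asks the OS to pick any free port.
            sock.bind((address, port))
        except OSError as exc:
            # EADDRNOTAVAIL is the usual error when the address
            # does not belong to a local interface.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            print(f"bind to {address} failed: {name}")
            return False
        return True


if __name__ == "__main__":
    # Loopback and the wildcard address are local; the placeholder
    # public address (assumed, stands in for an Elastic IP) is not.
    for addr in ("127.0.0.1", "0.0.0.0", "203.0.113.10"):
        print(addr, can_bind(addr))
```

On a typical host the loopback and wildcard addresses bind successfully, while an address that is not configured on any interface is rejected, which matches the behaviour described above.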

In my deployment, I set "fs.defaultFS" in core-site.xml to "hdfs://ec2xxxx.amazonaws.com(public DNS):8020" and allowed all inbound and outbound traffic, but the cluster still fails to start. Could anyone please give me some advice? Thanks a lot!

sym0990
asked 5 years ago, 382 views
1 Answer

Hi, a Hadoop cluster spanning multiple regions is NOT supported by AWS. In fact, AWS recommends keeping all resources in a cluster in the same Availability Zone to optimize performance.
Link: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-region.html

Also, here are some additional statements from Hortonworks and Cloudera on running a Hadoop cluster across multiple regions.
Link: https://community.hortonworks.com/questions/224369/do-you-support-spanning-a-hadoop-cluster-over-mult.html
Hortonworks: While it is possible to span a cluster across multiple Availability Zones, we don’t generally recommend spanning availability zones for the following reasons:

  1. In AWS, there is inherent latency when data moves across multiple AZs, which leads to performance problems or other issues in the cluster. The performance issues become more pronounced as cluster size and workload increase.

  2. Double billing: Data transfer is a natural part of cluster operation. According to the AWS FAQs, each instance is charged for its data in and data out at the corresponding Data Transfer rates. Therefore, if data is transferred between two instances in different regions, it is charged at "Data Transfer Out from EC2 to Another AWS Region" for the first instance and at "Data Transfer In from Another AWS Region" for the second instance.
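
The double-billing point is simple arithmetic: the same gigabytes are charged once on the sending side and once on the receiving side. A rough sketch follows; the function name and the per-GB rates are hypothetical placeholders, not actual AWS prices, so substitute the rates from the current AWS pricing page.

```python
def cross_region_transfer_cost(
    gb_transferred: float,
    out_rate_per_gb: float,
    in_rate_per_gb: float,
) -> float:
    """Estimate the charge for moving data between two instances in different regions.

    The sender pays the "data out" rate and the receiver pays the
    "data in" rate for the same volume of data.
    """
    if gb_transferred < 0:
        raise ValueError("gb_transferred must be non-negative")
    out_charge = gb_transferred * out_rate_per_gb  # billed to the first instance
    in_charge = gb_transferred * in_rate_per_gb    # billed to the second instance
    return out_charge + in_charge


if __name__ == "__main__":
    # Placeholder rates (assumed for illustration only).
    cost = cross_region_transfer_cost(
        gb_transferred=1000, out_rate_per_gb=0.02, in_rate_per_gb=0.01
    )
    print(f"estimated charge: {cost:.2f}")
```

Because HDFS replication and shuffle traffic move data between nodes continuously, this per-GB charge would apply to routine cluster operation, not only to occasional bulk copies.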

Link: https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_aws.pdf
Cloudera EDH deployments are restricted to single regions. Single clusters spanning regions are not supported.

Hope this helps!
-randy

answered 5 years ago
