Hadoop across multiple regions


I want to set up a Hadoop cluster across multiple regions. However, as others may have noticed, an EC2 instance cannot bind to its Elastic IP. As a result, start-dfs.sh cannot establish communication between the namenode in zone1 and the datanode in zone2, and it reports a socket binding error.

In my deployment, I set "fs.defaultFS" in core-site.xml to "hdfs://ec2xxxx.amazonaws.com(public DNS):8020" and allowed all inbound and outbound traffic, but the cluster still fails to start. Could anyone please give me some advice? Thanks a lot!
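
For reference, the binding error can be reproduced outside Hadoop. The following is only a minimal sketch: it assumes that the Elastic IP is mapped to the instance through NAT and is therefore not assigned to any local network interface, so a bind to it fails while a bind to the wildcard address succeeds. The helper `can_bind` is a hypothetical name, and `203.0.113.10` is a documentation-only address standing in for the Elastic IP.

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to `host` on this machine.

    Port 0 asks the OS for any free port, so only the address matters.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            # Typically EADDRNOTAVAIL when the address is not assigned
            # to any local interface (assumed to be the Elastic IP case).
            return False


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder (TEST-NET-3), not a real Elastic IP.
    for host in ("0.0.0.0", "203.0.113.10"):
        print(host, can_bind(host))
```

On an instance, replacing the placeholder with the real Elastic IP and with the private IP should show which addresses a daemon can actually listen on.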

sym0990
Asked 5 years ago · 403 views
1 Answer

Hi, running a single Hadoop cluster across multiple regions is NOT supported by AWS. In fact, AWS recommends keeping all resources in a cluster in the same Availability Zone to optimize performance.
Link: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-region.html

Also, here are some additional statements from Hortonworks and Cloudera on running a Hadoop cluster across multiple regions.
Link: https://community.hortonworks.com/questions/224369/do-you-support-spanning-a-hadoop-cluster-over-mult.html
Hortonworks: While it is possible to span a cluster across multiple Availability Zones, we don’t generally recommend spanning availability zones for the following reasons:

  1. In AWS, there is a natural latency involved with data moving across multiple AZs, which will lead to performance problems or other issues in the cluster. The performance issues become more pronounced as the cluster size and workload increase.

  2. Double billing: As a natural part of cluster functioning, there will be data transfer. According to the AWS FAQs, each instance is charged for its data in and data out at the corresponding Data Transfer rates. Therefore, if data is transferred between two instances, it is charged at "Data Transfer Out from EC2 to Another AWS Region" for the first instance and at "Data Transfer In from Another AWS Region" for the second instance.
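
To make the double-billing point above concrete, here is a rough sketch of the arithmetic. The function name and the per-GB rates are purely hypothetical placeholders, not actual AWS prices; check the current Data Transfer pricing for your regions before estimating anything.

```python
def cross_region_transfer_cost(gb: float,
                               out_rate_per_gb: float,
                               in_rate_per_gb: float) -> float:
    """Estimate the charge for moving `gb` gigabytes between two instances.

    The sending instance pays the "out" rate and the receiving instance
    pays the "in" rate, so the same bytes are counted on both sides.
    """
    if gb < 0:
        raise ValueError("gb must be non-negative")
    return gb * (out_rate_per_gb + in_rate_per_gb)


if __name__ == "__main__":
    # Hypothetical rates in USD per GB, chosen only for illustration.
    HYPOTHETICAL_OUT_RATE = 0.02
    HYPOTHETICAL_IN_RATE = 0.01
    daily_gb = 1000.0  # assumed daily block replication and shuffle traffic
    cost = cross_region_transfer_cost(
        daily_gb, HYPOTHETICAL_OUT_RATE, HYPOTHETICAL_IN_RATE
    )
    print(f"Estimated daily transfer cost: ${cost:.2f}")
```

Because HDFS replication and shuffle traffic scale with the workload, this cost grows along with the cluster.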

Link: https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_aws.pdf
Cloudera EDH deployments are restricted to single regions. Single clusters spanning regions are not supported.

Hope this helps!
-randy

Answered 5 years ago
