Hadoop across multiple regions


I want to set up a Hadoop cluster across multiple regions. However, as others may have already noticed, an EC2 instance cannot bind to its Elastic IP address. As a result, start-dfs.sh cannot establish communication between the NameNode in zone1 and the DataNode in zone2, and it reports a socket binding error.

In my deployment, I set "fs.defaultFS" in core-site.xml to "hdfs://ec2xxxx.amazonaws.com(public DNS):8020" and allowed all inbound and outbound traffic, but the cluster still fails to start. Could anyone please give me some advice? Thanks a lot!
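
For readers unfamiliar with this kind of error: on EC2, an Elastic IP is mapped to the instance through network address translation, so the address does not appear on the instance's own network interface, and a process can only bind to addresses that do. The Python sketch below illustrates that general socket behavior; it is not Hadoop code and may not match the exact failure above. The helper name `can_bind` and the address 203.0.113.10 (a reserved documentation address standing in for an Elastic IP) are assumptions made for illustration.

```python
import errno
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to `address` on this host.

    Port 0 asks the operating system to pick any free port, so the
    result depends only on whether the address is locally assigned.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError as exc:
            # EADDRNOTAVAIL: the address is not assigned to any local interface.
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise
        return True


if __name__ == "__main__":
    # Loopback is always a local address.
    print("127.0.0.1:", can_bind("127.0.0.1"))
    # Hypothetical stand-in for an Elastic IP (reserved documentation range).
    print("203.0.113.10:", can_bind("203.0.113.10"))
```

Running a check like this on the instance, with the real Elastic IP substituted in, would show whether the operating system considers that address local.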

sym0990
Asked 5 years ago · Viewed 403 times
1 Answer

Hi, a Hadoop cluster spanning multiple regions is NOT supported by AWS. In fact, AWS recommends keeping all resources in a cluster in the same Availability Zone to optimize performance.
Link: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-region.html

Also, here are some additional statements from Hortonworks and Cloudera on running a Hadoop cluster across multiple regions.
Link: https://community.hortonworks.com/questions/224369/do-you-support-spanning-a-hadoop-cluster-over-mult.html
Hortonworks: While it is possible to span a cluster across multiple Availability Zones, we don’t generally recommend spanning availability zones for the following reasons:

  1. In AWS, there is a natural latency involved with data moving across multiple AZs, which will lead to performance problems or other issues in the cluster. The performance issues become more pronounced as the cluster size and workload increase.

  2. Double billing: As a natural part of cluster functioning, there will be data transfer. According to the AWS FAQs, each instance is charged for its data in and data out at the corresponding Data Transfer rates. Therefore, if data is transferred between these two instances, it is charged at "Data Transfer Out from EC2 to Another AWS Region" for the first instance and at "Data Transfer In from Another AWS Region" for the second instance.
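
As a rough illustration of the double-billing point, the charge for a transfer can be sketched as the volume multiplied by the sum of the outbound and inbound rates. The function name `transfer_cost` and the rates used below are placeholders chosen for the example, not actual AWS prices; check the current AWS pricing pages for real figures.

```python
def transfer_cost(gigabytes: float, out_rate_per_gb: float, in_rate_per_gb: float) -> float:
    """Estimate the total charge when both ends of a transfer are billed.

    The sending instance pays the outbound rate and the receiving
    instance pays the inbound rate, so the two rates add up.
    """
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    return gigabytes * (out_rate_per_gb + in_rate_per_gb)


if __name__ == "__main__":
    # Placeholder rates in dollars per GB, NOT real AWS prices.
    example_out_rate = 0.5
    example_in_rate = 0.25
    print(transfer_cost(8, example_out_rate, example_in_rate))
```

Because HDFS replication and shuffle traffic move data between nodes continuously, even a small per-GB rate applies to a large volume over time.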

Link: https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_aws.pdf
Cloudera EDH deployments are restricted to single regions. Single clusters spanning regions are not supported.

Hope this helps!
-randy

Answered 5 years ago
