Hadoop across multiple regions


I want to set up a Hadoop cluster that spans multiple regions. However, as others may have noticed, an EC2 instance cannot bind to its Elastic IP address. As a result, start-dfs.sh cannot establish communication between the NameNode in zone1 and the DataNode in zone2, and it fails with a socket binding error.

In my deployment, I set "fs.defaultFS" in core-site.xml to "hdfs://ec2xxxx.amazonaws.com(public DNS):8020" and allowed all inbound and outbound traffic, but the cluster still fails to start. Could anyone please give me some advice? Thanks a lot!
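The binding error described above can be reproduced outside Hadoop. The following is a minimal sketch, assuming the usual EC2 setup in which the Elastic IP is mapped to the instance by NAT and never appears on the instance's own network interface. The helper `can_bind` is a hypothetical name, and `192.0.2.1` is a documentation-only address (RFC 5737) standing in for an Elastic IP.

```python
import errno
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to `address` on this host.

    Port 0 asks the operating system to pick any free port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
            return True
        except OSError as exc:
            # EADDRNOTAVAIL means the address is not assigned to any
            # local interface, which is the situation for a NAT-mapped
            # public address.
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise


# The wildcard address is always bindable.
print("0.0.0.0   ->", can_bind("0.0.0.0"))
# Stand-in for an Elastic IP that is not configured on the interface.
print("192.0.2.1 ->", can_bind("192.0.2.1"))
```

If the second call reports that the address is not bindable, a daemon configured to listen on that address would fail in the same way, regardless of security group rules.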

sym0990
asked 5 years ago · 403 views
1 Answer

Hi, a Hadoop cluster spanning multiple regions is NOT supported by AWS. In fact, AWS recommends keeping all resources in the cluster in the same Availability Zone to optimize performance.
Link: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-region.html

Also, here are some additional statements from Hortonworks and Cloudera on Hadoop clusters spanning multiple regions.
Link: https://community.hortonworks.com/questions/224369/do-you-support-spanning-a-hadoop-cluster-over-mult.html
Hortonworks: While it is possible to span a cluster across multiple Availability Zones, we don’t generally recommend spanning availability zones for the following reasons:

  1. In AWS, there is a natural latency involved with data moving across multiple AZs, which will lead to performance problems or other issues in the cluster. Especially as the cluster size and workload increase, the performance issues become more pronounced.

  2. Double billing: As a natural part of cluster functioning, there will be data transfer. According to the AWS FAQs, each instance is charged for its data in and data out at corresponding Data Transfer rates. Therefore, if data is transferred between these two instances, it is charged at "Data Transfer Out from EC2 to Another AWS Region" for the first instance and at "Data Transfer In from Another AWS Region" for the second instance.
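To make the double-billing point concrete, here is a rough sketch of the arithmetic. The function name and the per-GB rates are hypothetical placeholders, not actual AWS prices; check the current AWS pricing pages for real figures.

```python
def cross_region_transfer_cost(
    gigabytes: float,
    out_rate_per_gb: float,
    in_rate_per_gb: float,
) -> float:
    """Estimate the cost of moving `gigabytes` between two instances.

    Both ends are billed: the sender at its "data out" rate and the
    receiver at its "data in" rate, so the two rates add up.
    """
    return gigabytes * (out_rate_per_gb + in_rate_per_gb)


# Hypothetical rates in USD per GB, chosen only for illustration.
OUT_RATE = 0.02
IN_RATE = 0.01

# Example: 1000 GB of block replication traffic between the two sites.
cost = cross_region_transfer_cost(1000, OUT_RATE, IN_RATE)
print(f"Estimated transfer cost: ${cost:.2f}")
```

Because HDFS replicates every block and shuffles intermediate data between nodes, this traffic recurs during normal operation and is not a one-time cost.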

Link: https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_aws.pdf
Cloudera EDH deployments are restricted to single regions. Single clusters spanning regions are not supported.

Hope this helps!
-randy

answered 5 years ago
