Datasync HDFS to S3 on Azure VM


I have HDFS set up on an Azure VM and the DataSync agent set up on a different Azure VM, with communication enabled between the two VMs. My ultimate goal is to transfer data from HDFS to AWS S3. I have configured and activated the DataSync agent and connected it using AWS public endpoints. I have tested network connectivity to the public endpoints and to the self-managed storage (HDFS here), and the connectivity check showed PASSED for both. But when I create a task using the activated agent, with HDFS as the source and S3 as the destination, it throws the error "input/output error cannot read source file". Can you please let me know how I can fix this issue?

asked 2 years ago · 344 views
1 Answer

Input/output errors almost always indicate a network connection problem between the DataSync agent and the HDFS NameNode or DataNodes. You can check this as follows:

You need to validate the NameNode metadata service port (IPC port) set in core-site.xml under the property fs.defaultFS or fs.default.name, depending on your Hadoop distribution.

$ grep -A 2 -B 1 -i fs.default core-site.xml 
  <property>    
    <name>fs.defaultFS</name>
    <value>hdfs://ip-xx-xx-xx-xx.ec2.internal:8020</value>
  </property>
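Once you know the host and port from fs.defaultFS, confirm the agent VM can actually reach that NameNode IPC port. A minimal sketch, using the placeholder value from the snippet above (substitute your own host; the variable names are just for illustration):

```shell
# Placeholder value copied from core-site.xml above; replace with yours.
FS_DEFAULT="hdfs://ip-xx-xx-xx-xx.ec2.internal:8020"

# Split the URI into host and IPC port.
HOSTPORT="${FS_DEFAULT#hdfs://}"
NAMENODE_HOST="${HOSTPORT%%:*}"
NAMENODE_PORT="${HOSTPORT##*:}"
echo "NameNode: $NAMENODE_HOST port $NAMENODE_PORT"

# Run this from the DataSync agent VM to verify the TCP port is open:
# nc -vz "$NAMENODE_HOST" "$NAMENODE_PORT"
```

If `nc` reports the connection refused or timing out, the agent cannot reach the NameNode and the task will fail regardless of the location configuration.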

You must also validate the DataNode port set under the dfs.datanode.address property in hdfs-site.xml.

$ grep -A 2 -B 1 -i dfs.datanode.address hdfs-site.xml
  <property>
    <name>dfs.datanode.address</name>
    <value>xxx.xxx.xxx.xxx:1004</value>
  </property>
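The agent must be able to reach every DataNode on that transfer port, not just the NameNode. One way to enumerate the DataNode addresses is `hdfs dfsadmin -report` and then probe each one. A sketch, with illustrative sample output standing in for the real command (in practice, pipe `hdfs dfsadmin -report` directly):

```shell
# Illustrative sample of 'hdfs dfsadmin -report' output; on the cluster,
# replace this here-string with the real command's output.
REPORT='Name: 10.0.0.5:1004 (dn1)
Name: 10.0.0.6:1004 (dn2)'

# Extract each host:port pair and probe it from the agent VM.
echo "$REPORT" | awk '/^Name:/ {print $2}' | while IFS=: read -r DN_HOST DN_PORT; do
  echo "checking $DN_HOST:$DN_PORT"
  # nc -vz "$DN_HOST" "$DN_PORT"   # uncomment when running on the agent VM
done
```

A common failure mode in cloud setups is that the NameNode advertises DataNode addresses (private IPs or internal hostnames) that the agent VM cannot resolve or route to, which produces exactly this kind of read error even though the NameNode check passes.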

Are you using Simple Authentication or Kerberos Authentication?
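While you check, it is also worth confirming that the identity configured on the DataSync HDFS location can read the source path at all. A hedged sketch of command fragments to run on the cluster (the username, keytab path, and `/data` path are placeholders, not values from your setup):

```shell
# Simple Authentication: impersonate the user configured in the DataSync
# location and list the source path (placeholder user and path).
HADOOP_USER_NAME=datasync-user hdfs dfs -ls /data

# Kerberos: confirm the principal/keytab pair works before blaming the network.
# kinit -kt /path/to/datasync.keytab principal@REALM
# hdfs dfs -ls /data
```

If the listing fails here, the problem is authentication or HDFS permissions rather than connectivity.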

AWS
Jeff_B
answered 2 years ago
