DataSync HDFS to S3 on Azure VM

I have HDFS set up on an Azure VM and the DataSync agent set up on a different Azure VM, with communication enabled between the two VMs. My ultimate goal is to transfer data from HDFS to AWS S3. I have configured and activated the DataSync agent and connected it using AWS public endpoints. I have tested network connectivity to the public endpoints and to the self-managed storage (HDFS here); the connectivity check showed PASSED for both. But when I create a task using the activated agent, with HDFS as the source and S3 as the destination, it throws the error "input/output error cannot read source file". Can you please let me know how I can fix this issue?

asked 2 years ago · 334 views
1 Answer

Input/output errors almost always indicate a network connection problem between the DataSync agent and the HDFS NameNode or HDFS DataNodes. Here is what to check:

You need to validate the NameNode metadata service port (IPC port) set in core-site.xml under the fs.defaultFS property (or fs.default.name, depending on your Hadoop distribution):

$ grep -A 2 -B 1 -i fs.default core-site.xml
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ip-xx-xx-xx-xx.ec2.internal:8020</value>
  </property>
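If the property looks correct, also confirm that the IPC port is actually reachable over the network. A minimal sketch, assuming netcat is available and run from a host on the same network path as the agent (the hostname and port below are the placeholders from the example above):

$ nc -zv ip-xx-xx-xx-xx.ec2.internal 8020

A "succeeded" or "open" message means the port is reachable from that host; a timeout or "refused" points to a firewall, NSG rule, or routing problem.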

You must also validate the DataNode transfer port set in hdfs-site.xml under the dfs.datanode.address property:

$ grep -A 2 -B 1 -i dfs.datanode.address hdfs-site.xml
  <property>
    <name>dfs.datanode.address</name>
    <value>xxx.xxx.xxx.xxx:1004</value>
  </property>
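Note that the agent reads file blocks directly from each DataNode, not through the NameNode, so every DataNode must be reachable from the agent on this port as well. A quick reachability sketch, again assuming netcat and substituting a real DataNode address for the placeholder:

$ nc -zv xxx.xxx.xxx.xxx 1004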

Are you using Simple Authentication or Kerberos Authentication?
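If you are unsure, one way to check is to inspect the standard hadoop.security.authentication property in core-site.xml: a value of simple means Simple Authentication, and kerberos means Kerberos. For example:

$ grep -A 2 -B 1 -i hadoop.security.authentication core-site.xml
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>

If the value is kerberos, the DataSync HDFS location also needs a keytab file and krb5.conf configured, which is another common source of read failures.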

AWS
Jeff_B
answered 2 years ago
