DataSync HDFS to S3 on Azure VM

I have HDFS set up on an Azure VM and the DataSync agent set up on a different Azure VM, with communication enabled between the two VMs. My ultimate goal is to transfer data from HDFS to AWS S3. I have configured and activated the DataSync agent and connected it using AWS public endpoints. I have tested network connectivity to the public endpoints and to the self-managed storage (HDFS here), and the test showed PASSED for both. But when I create a task using the activated agent, with HDFS as the source and S3 as the destination, it throws the error "input/output error cannot read source file". Can you please let me know how I can fix this issue?

Asked 2 years ago · 345 views
1 Answer

Input/output errors almost always indicate a network connectivity problem between the DataSync agent and the HDFS NameNode or the HDFS DataNodes. You can check the following:

First, validate the NameNode metadata service port (the IPC port) set in core-site.xml under the fs.defaultFS or fs.default.name property, depending on your Hadoop distribution:

$ grep -A 2 -B 1 -i fs.default core-site.xml 
  <property>    
    <name>fs.defaultFS</name>
    <value>hdfs://ip-xx-xx-xx-xx.ec2.internal:8020</value>
  </property>
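
From the DataSync agent's network, confirm that this NameNode IPC port is actually reachable. A minimal sketch, assuming the example host/port above and that nc (netcat) is available on a VM in the agent's subnet; substitute your real NameNode address:

$ # Test TCP connectivity to the NameNode IPC port from the agent's network
$ nc -vz ip-xx-xx-xx-xx.ec2.internal 8020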

You must also validate the DataNode port setting under the dfs.datanode.address property in hdfs-site.xml:

$ grep -A 2 -B 1 -i dfs.datanode.address hdfs-site.xml
  <property>
    <name>dfs.datanode.address</name>
    <value>xxx.xxx.xxx.xxx:1004</value>
  </property>
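
The NameNode only serves metadata; DataSync reads the actual file blocks directly from the DataNodes, so the agent must be able to reach every DataNode on this port as well. A minimal sketch, assuming nc is available and using the placeholder address above; the dfsadmin report (run from an HDFS client node) lists the registered DataNode addresses you need to test:

$ # List the DataNode addresses registered with the NameNode
$ hdfs dfsadmin -report | grep -E 'Name|Hostname'
$ # Then test each DataNode transfer port from the agent's network
$ nc -vz xxx.xxx.xxx.xxx 1004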

Are you using Simple Authentication or Kerberos Authentication?
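
The DataSync HDFS location must be configured with the same authentication type the cluster uses. You can confirm the cluster's mode by checking the standard hadoop.security.authentication property in core-site.xml; if the property is absent, Hadoop defaults to simple authentication:

$ grep -A 2 -B 1 -i hadoop.security.authentication core-site.xml
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>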

AWS
Jeff_B
Answered 2 years ago
