Neptune Loader throws LOAD_FAILED error

Hello,

I have been trying to bulk load .csv files containing node and edge data into Neptune from S3, following the steps at https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-data.html. When I do this (running the curl command from AWS CloudShell), I get the error:

curl: (28) Failed to connect to database-1-instance-1.cx0ll0stx64q.ap-south-1.neptune.amazonaws.com port 8182 after 130342 ms: Couldn't connect to server

after a couple of minutes.
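
For context, the load request I am sending looks roughly like this (the IAM role ARN below is a placeholder; the bucket, region, and endpoint are the ones shown in the output further down):

curl -X POST \
    -H 'Content-Type: application/json' \
    https://database-1-instance-1.cx0ll0stx64q.ap-south-1.neptune.amazonaws.com:8182/loader -d '
    {
      "source": "s3://neptutest",
      "format": "csv",
      "iamRoleArn": "arn:aws:iam::<account-id>:role/NeptuneLoadFromS3",
      "region": "ap-south-1",
      "failOnError": "FALSE",
      "parallelism": "MEDIUM"
    }'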

When I created a SageMaker notebook and used the %load command to try to load the data, it gave the error LOAD_FAILED. Upon running the command %load_status <load_id> --details --errors (entering the actual load ID in place of <load_id>), I get the following result:

{
  "status": "200 OK",
  "payload": {
    "feedCount": [
      {
        "LOAD_FAILED": 2
      }
    ],
    "overallStatus": {
      "fullUri": "s3://neptutest",
      "runNumber": 1,
      "retryNumber": 4,
      "status": "LOAD_FAILED",
      "totalTimeSpent": 6,
      "startTime": 1677234331,
      "totalRecords": 0,
      "totalDuplicates": 0,
      "parsingErrors": 0,
      "datatypeMismatchErrors": 0,
      "insertErrors": 0
    },
    "failedFeeds": [
      {
        "fullUri": "s3://neptutest/edges.csv",
        "runNumber": 1,
        "retryNumber": 4,
        "status": "LOAD_FAILED",
        "totalTimeSpent": 3,
        "startTime": 1677234334,
        "totalRecords": 0,
        "totalDuplicates": 0,
        "parsingErrors": 0,
        "datatypeMismatchErrors": 0,
        "insertErrors": 0
      },
      {
        "fullUri": "s3://neptutest/nodes.csv",
        "runNumber": 1,
        "retryNumber": 4,
        "status": "LOAD_FAILED",
        "totalTimeSpent": 0,
        "startTime": 1677234337,
        "totalRecords": 0,
        "totalDuplicates": 0,
        "parsingErrors": 0,
        "datatypeMismatchErrors": 0,
        "insertErrors": 0
      }
    ],
    "errors": {
      "startIndex": 0,
      "endIndex": 0,
      "loadId": "<load_id>",
      "errorLogs": []
    }
  }
}
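
For reference, the same status can also be fetched over plain HTTP from inside the VPC; a rough equivalent of the %load_status call above, with the load ID as a placeholder:

curl 'https://database-1-instance-1.cx0ll0stx64q.ap-south-1.neptune.amazonaws.com:8182/loader/<load_id>?details=true&errors=true'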

What is the fix for this error?

Thanks a lot!

  • Does your cluster have a NeptuneLoadFromS3 role applied to it as described here? https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-IAM.html#bulk-load-tutorial-IAM-CreateRole

    Also, if you try to do a %status from the notebook, does it work? This is just to prove that you can connect to Neptune.
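
    You can also confirm from the CLI that the role is actually associated with the cluster; something along these lines (the cluster identifier is assumed from your instance name):

    aws neptune describe-db-clusters \
        --db-cluster-identifier database-1 \
        --query 'DBClusters[0].AssociatedRoles'

    If it is missing there, it can be attached with aws neptune add-role-to-db-cluster --db-cluster-identifier database-1 --role-arn <role-arn>.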

  • Yes, I have the role applied to the Neptune cluster. I am also posting the trust relationship I configured on the role:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "allows",
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "rds.amazonaws.com",
                        "s3.amazonaws.com"
                    ]
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
    

    Note that I have also tried putting only rds.amazonaws.com in the Service list, and I have tried removing the Sid statement. The result is exactly the same as before.
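
    For completeness, the permissions side of the role grants S3 read access roughly along the lines of the policy in that tutorial (sketched here with my bucket name substituted; the Sid is arbitrary):

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowNeptuneReadFromS3",
                "Effect": "Allow",
                "Action": [
                    "s3:Get*",
                    "s3:List*"
                ],
                "Resource": [
                    "arn:aws:s3:::neptutest",
                    "arn:aws:s3:::neptutest/*"
                ]
            }
        ]
    }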

    Yes, %status works, as I have made the necessary changes to the security group. Here is the output:

    {'status': 'healthy',
     'startTime': 'Mon Feb 13 14:29:15 UTC 2023',
     'dbEngineVersion': '1.2.0.2.R2',
     'role': 'writer',
     'dfeQueryEngine': 'viaQueryHint',
     'gremlin': {'version': 'tinkerpop-3.5.2'},
     'sparql': {'version': 'sparql-1.1'},
     'opencypher': {'version': 'Neptune-9.0.20190305-1.0'},
     'labMode': {'ObjectIndex': 'disabled',
      'ReadWriteConflictDetection': 'enabled'},
     'features': {'ResultCache': {'status': 'disabled'},
      'IAMAuthentication': 'disabled',
      'Streams': 'disabled',
      'AuditLog': 'disabled'},
     'settings': {'clusterQueryTimeoutInMs': '120000'}}
    
  • There seem to be two different issues here. If %load from the notebook runs but the loader reports an error, the problem is more likely the data itself. Are you able to share the header row of each CSV file and one row of data? (A sketch of the headers the loader expects is below.) If not, are you able to open a support case (assuming you have not already done so)? As for the curl not working: did wherever you ran the curl from have access to the Neptune VPC?
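
    For reference, the Gremlin load format expects headers like the following (a sketch with made-up property names; see the Gremlin load data format page in the Neptune user guide):

    nodes.csv:
    ~id,~label,name:String
    n1,person,Alice
    n2,person,Bob

    edges.csv:
    ~id,~from,~to,~label
    e1,n1,n2,knows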
