How do I resolve processing errors when I use Amazon Neptune Bulk Loader?


I want to use Amazon Neptune Bulk Loader to load data from an Amazon Simple Storage Service (Amazon S3) bucket. However, some of the requests fail.

Short description

To troubleshoot load requests that keep failing, check the status of each load job. Use one of the following methods to identify the failed jobs:

  • Use the default Bulk Loader API to check the status of each load job individually.
  • Use an admin script to check the status of all load jobs in one run. Create and run the script on a Linux or Unix system.

Before you start, review these limitations:

  • The Neptune Bulk Loader API doesn't provide a snapshot view of all load operations.
  • If AWS Identity and Access Management (IAM) authorization is active on the Neptune cluster, then the requests to the Bulk Loader API must be signed.
  • The Bulk Loader API retains information about only the last 1,024 load jobs. For each job, it stores error details for only the last 10,000 errors.
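
If IAM authorization is active, one possible way to sign the requests is curl's built-in Signature Version 4 support. The following is a minimal sketch, not a definitive implementation. It assumes curl 7.75.0 or later, credentials in the standard AWS environment variables, and the signing service name neptune-db. The helper names sigv4_scope and signed_loader_get are hypothetical.

```shell
# Sketch: send a SigV4-signed GET request to the Neptune loader endpoint.
# Assumptions: curl 7.75.0 or later, and credentials exported as
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and (optionally) AWS_SESSION_TOKEN.

# Build the value for curl's --aws-sigv4 option for a given AWS Region.
# "neptune-db" is assumed to be the signing service name for the cluster.
sigv4_scope() {
    echo "aws:amz:${1}:neptune-db"
}

# Usage: signed_loader_get <loader-url> <region>
signed_loader_get() {
    url="$1"
    region="$2"
    if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
        # Temporary credentials also need the session token header.
        curl --silent -G "$url" \
            --aws-sigv4 "$(sigv4_scope "$region")" \
            --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
            -H "x-amz-security-token: ${AWS_SESSION_TOKEN}"
    else
        curl --silent -G "$url" \
            --aws-sigv4 "$(sigv4_scope "$region")" \
            --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}"
    fi
}
```

For example, signed_loader_get "https://cluster-endpoint:Port/loader" us-east-1 | jq . would list the load IDs on a cluster in us-east-1, with cluster-endpoint:Port replaced by your own endpoint and port.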

Resolution

Use the default Bulk Loader API

  1. Retrieve the loader IDs.

    $ curl -G 'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader' | jq .
    {
      "status": "200 OK",
      "payload": {
        "loadIds": [
          "c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
          "6f6342fb-4ea3-452c-ac69-b4d117e37d5a",
          "647114a6-6ed4-4018-896c-e84a08fcf864",
          "521d33fa-7050-44d7-a961-b64ef4e2d1db",
          "d0d4714e-7cf8-415e-89f5-d07ed2732bf2"
        ]
      }
    }
  2. Check each job's status to verify that the job was successful.

    $ curl -G 'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader/c32bbd24-99a7-45ee-972c-21b7b9cab3e2?details=true&errors=true&page=1&errorsPerPage=3' | jq .
    {
      "status": "200 OK",
      "payload": {
        "feedCount": [
          {
            "LOAD_COMPLETED": 2
          }
        ],
        "overallStatus": {
          "fullUri": "s3://demodata/neptune/",
          "runNumber": 5,
          "retryNumber": 0,
          "status": "LOAD_COMPLETED",
          "totalTimeSpent": 3,
          "startTime": 1555574461,
          "totalRecords": 8,
          "totalDuplicates": 8,
          "parsingErrors": 0,
          "datatypeMismatchErrors": 0,
          "insertErrors": 0
        },
        "errors": {
          "startIndex": 0,
          "endIndex": 0,
          "loadId": "c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
          "errorLogs": []
        }
      }
    }
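
The status request above combines four query parameters: details, errors, page, and errorsPerPage. When a job reports more errors than fit on one page, you can request later pages by increasing page. The following sketch builds the request URL so that the paging values are easy to change. The helper name loader_status_url is hypothetical, and the defaults (page 1, three errors per page) only mirror the example above.

```shell
# Sketch: build the Get-Status URL for one load job.
# Usage: loader_status_url <loader-endpoint> <load-id> [page] [errors-per-page]
loader_status_url() {
    endpoint="$1"
    load_id="$2"
    page="${3:-1}"        # error page to request (defaults to the first page)
    per_page="${4:-3}"    # number of error entries on each page
    echo "${endpoint}/${load_id}?details=true&errors=true&page=${page}&errorsPerPage=${per_page}"
}
```

For example, curl -G "$(loader_status_url "https://cluster-endpoint:Port/loader" "c32bbd24-99a7-45ee-972c-21b7b9cab3e2" 2 10)" | jq . requests the second page of errors with 10 entries per page.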

Use an admin script

Use an admin script to identify a failed Neptune Bulk Loader job in your production process. The admin script generates an output in the following format for all load jobs:

Starttime-loadid:status,S3location,Errors

Note: Use the admin script from a Linux system that has access to the Neptune cluster.

Create and run the automated script on a Linux or Unix system

  1. Create the script using a text editor.

    $ vi script
  2. Add the following code to the script. Be sure that you replace cluster-endpoint:Port with your own endpoint and port.

    #!/bin/bash
    cluster_ep="https://cluster-endpoint:Port/loader"
    # List all load IDs, then print one status line for each load job.
    for loadId in $(curl --silent -G "${cluster_ep}?details=true" | jq '.payload.loadIds[]');
    do
            # Strip the double quotes that jq prints around the load ID.
            clean_loadId=$(echo -n ${loadId} | tr -d '"')
            # Convert the start time from epoch seconds to a readable date (requires GNU date).
            time=$(date -d@$(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.startTime'))
            echo -n $time '-'
            echo -n ${clean_loadId}: $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.status')
            echo -n ',S3 LOCATION': $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.fullUri')
            # Request the first page of error logs, three entries per page.
            echo -n ',ERRORS': $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true&errors=true&page=1&errorsPerPage=3" | jq '.payload.errors.errorLogs')
            echo
    done
  3. Save the script, and then make the script executable.

    chmod +x script
  4. Install jq, which the script uses to parse the JSON responses.

    sudo yum install jq
  5. Run the script.

    $ ./script

    The following is an example of the output.

    Thu Apr 18 08:01:01 UTC 2019 -c32bbd24-99a7-45ee-972c-21b7b9cab3e2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
    Fri Apr 5 07:04:00 UTC 2019 -6f6342fb-4ea3-452c-ac69-b4d117e37d5a: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
    Fri Apr 5 07:01:30 UTC 2019 -647114a6-6ed4-4018-896c-e84a08fcf864: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
    Tue Mar 19 17:36:02 UTC 2019 -521d33fa-7050-44d7-a961-b64ef4e2d1db: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
    Tue Mar 19 17:35:45 UTC 2019 -d0d4714e-7cf8-415e-89f5-d07ed2732bf2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
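
When the report covers many load jobs, it can help to hide the jobs that completed. The following sketch filters report lines in the format shown above and keeps every job whose status is anything other than LOAD_COMPLETED. That includes failed jobs, but also jobs that are still queued or in progress. The helper name not_completed_jobs is hypothetical.

```shell
# Sketch: read report lines on stdin and print only the jobs
# whose status is not "LOAD_COMPLETED".
not_completed_jobs() {
    # -F matches the text literally; -v keeps the lines that don't match.
    # "|| true" keeps the exit status at 0 when every job completed.
    grep -F -v ': "LOAD_COMPLETED",' || true
}
```

For example, ./script | not_completed_jobs prints only the jobs that need attention, and prints nothing when every job completed.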

Related information

Example: Loading data into a Neptune DB instance

Neptune Loader Get-Status API

AWS OFFICIAL
Updated 10 months ago