Unable to perform OpenSearch text queries from Gremlin using AWS Lambda written in JavaScript


I am syncing my AWS Neptune nodes to an AWS OpenSearch cluster as per the documentation https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search.html. The name of the OpenSearch index is amazon_neptune and the index type is _doc. The index configuration is as follows:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "analysis": {
      "normalizer": {
        "useLowercase": {
          "type": "custom",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "document_type" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "entity_id" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "entity_type" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "predicates": {
        "properties": {
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above" : 1000,
                "normalizer": "useLowercase"
              }
            }
          },
          "visibilityType": { "type": "keyword" },
          "status": { "type": "keyword" },
          "type": { "type": "keyword" },
          "firstName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "normalizer": "useLowercase"
              }
            }
          },
          "lastName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "normalizer": "useLowercase",
                "ignore_above" : 1000
              }
            }
          }
        }
      }
    }
  }
}

Using the npm gremlin package, I'm trying to query my documents. Following is the code:

'use strict';

const gremlin = require('gremlin');

exports.handler = async (event, context) => {
  try {
    const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
    const Graph = gremlin.structure.Graph;
    const dc = new DriverRemoteConnection(<neptune_endpoint>,{});
    const graph = new Graph();
    const dbClient = graph.traversal().withRemote(dc);

    const res = await dbClient
      .withSideEffect("Neptune#fts.endpoint",<https_opensearch_endpoint>)
      .withSideEffect('Neptune#fts.queryType', 'term')
      .V().has("visibilityType","Neptune#fts PUBLIC")
      .toList();
    console.log('res:', res);
  } catch(err) {
    console.error('Failed to query', err);
  }
}

But I'm getting the following error

Failed to query ResponseError: Server error: {"detailedMessage":"method [POST], host [<https_opensearch_endpoint>], URI [/amazon_neptune/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 403 Forbidden]\n{\"Message\":\"User: anonymous is not authorized to perform: es:ESHttpPost\"}","requestId":"23a9e7d7-7dde-465b-bf29-9c59cff12e86","code":"BadRequestException"} (500)

I have given the following permission to my lambda

Type: AWS::IAM::Policy
Properties:
  PolicyName: <Policy_Name>
  Roles:
    - 'Ref': <lambda_role>
  PolicyDocument:
    Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Action:
          - es:ESHttpGet
          - es:ESHttpPost
          - es:ESHttpPut
          - es:ESHttpDelete
        Resource: <opensearch_cluster_arn>

My OpenSearch cluster as well as Neptune cluster are located inside the same VPC. My lambda is hosted inside the same VPC as well.

Please help me understand why I'm getting the 403 error even though I've given my Lambda the proper permissions.

Any help would be highly appreciated.

  • How have you configured security on your OpenSearch cluster? Are you using Fine-Grained Access Control or Domain-Level Access Control?

  • Sorry, I was looking for your comment under the threads of my earlier comments. Under Fine-grained access control, Enabled value is false.

    Following is the AccessPolicy that I have configured in my OpenSearch cluster

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "*"
          },
          "Action": "es:*",
          "Resource": "arn:aws:es:<region>:<account_number>:domain/<domain_name>/*"
        }
      ]
    }
    
  • As I mentioned in our other comment thread, the following is the definition of the dbClient:

    const gremlin = require('gremlin');
    const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
    const Graph = gremlin.structure.Graph;
    const dc = new DriverRemoteConnection(`wss://<neptune_endpoint>:8182/gremlin`,{});
    const graph = new Graph();
    const dbClient = graph.traversal().withRemote(dc);
    
2 Answers
Accepted Answer

I looked at your document in https://pastebin.com/qjT05w3p and I think the document_type field is not set correctly. If you change the value of document_type to "vertex", it should work as expected. Let us know if that doesn't work for you.
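
For anyone writing documents to the index from their own Lambda rather than through the Neptune Streams poller, here is a minimal sketch of a document builder that sets document_type to "vertex". The helper name toFtsDocument and its parameters are hypothetical. The field layout (entity_id, entity_type, document_type, and each predicate stored as a list of { value } objects) is an assumption based on the Neptune full-text-search data model and the predicates.status.value path used in this thread, so check it against an index produced by the poller before relying on it.

```javascript
'use strict';

// Hypothetical helper (not part of any AWS library): builds one OpenSearch
// document for a Neptune vertex in the layout the Neptune FTS integration
// is assumed to expect.
function toFtsDocument(vertexId, label, properties) {
  const predicates = {};
  for (const [name, value] of Object.entries(properties)) {
    // Each property is stored as a list of { value } objects, which is why
    // direct OpenSearch queries address it as predicates.<name>.value.
    predicates[name] = [{ value: String(value) }];
  }
  return {
    entity_id: String(vertexId),
    entity_type: [label],
    document_type: 'vertex', // 'edge' would be used for edges
    predicates,
  };
}

// Example values are made up for illustration.
const doc = toFtsDocument('post-1', 'POST', { status: 'ACTIVE', content: 'my first post' });
console.log(JSON.stringify(doc, null, 2));
```

This sketch stringifies every property value and omits any extra per-value fields the poller may write, so treat it as a starting point only.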

answered 2 years ago
  • You're a life saver @awsprashu. I am getting the response now. Just one small follow-up. My Gremlin knowledge is limited. I'm getting only this as a response - [{"id":<vertex_id>,"label":"POST"}]. The output contains the id and label of the vertex but not its properties. What's the syntax for getting the vertex id as well as the vertex properties if not using OpenCypher? Following is my query

    await dbClient
    .withSideEffect('Neptune#fts.endpoint',<os_endpoint>)
    .withSideEffect('Neptune#fts.queryType', 'match')
    .V().has("content","Neptune#fts post").toList();
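
One way to get the properties back along with the id and label is to end the traversal with elementMap() before toList() (valueMap(true) is an older alternative). The following is a minimal sketch, assuming the gremlin package's traversal API. The helper name findWithProperties and the stub traversal source used for the dry run are hypothetical; in the Lambda you would pass the dbClient from this thread instead of the stub.

```javascript
'use strict';

// Hypothetical helper: `g` is a traversal source such as the dbClient above.
async function findWithProperties(g, osEndpoint, property, text) {
  return g
    .withSideEffect('Neptune#fts.endpoint', osEndpoint)
    .withSideEffect('Neptune#fts.queryType', 'match')
    .V()
    .has(property, `Neptune#fts ${text}`)
    .elementMap() // asks for id, label and property values per vertex
    .toList();
}

// Stub traversal source for a dry run without a Neptune connection.
// It only records which steps were called and returns an empty result.
function makeStub(steps) {
  const stub = {};
  for (const name of ['withSideEffect', 'V', 'has', 'elementMap']) {
    stub[name] = (...args) => {
      steps.push([name, ...args]);
      return stub;
    };
  }
  stub.toList = async () => {
    steps.push(['toList']);
    return [];
  };
  return stub;
}

// The endpoint below is a placeholder, not a real cluster.
const steps = [];
const dryRun = findWithProperties(makeStub(steps), 'https://os.example.com', 'content', 'post');
dryRun.then(() => console.log(steps.map(([name]) => name).join(' -> ')));
```

Against a real cluster, each entry in the returned list should carry the vertex properties in addition to its id and label, typically as a Map in the JavaScript driver.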
    

Neptune IAM + OpenSearch Domain-Level Access

If using IAM Auth on Neptune with Domain-Level Access Control on OpenSearch, be sure to include the IAM role used to authenticate against Neptune in the Access Policy for your OpenSearch cluster. For example, if using the following IAM role to authenticate with Neptune: arn:aws:iam::0123456789012:role/myNeptuneRole

Be sure to then include the same role in the Domain Access Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::0123456789012:role/myNeptuneRole"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-2:0123456789012:domain/neptune-fts-pg/*"
    }
  ]
}

Neptune No-IAM + OpenSearch Domain-Level Access

If not using IAM Auth on Neptune with Domain-Level Access Control on OpenSearch, then you'll need an open Access Policy in OpenSearch to allow the FTS requests to succeed:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-2:0123456789012:domain/neptune-fts-pg/*"
    }
  ]
}
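
The two policies above differ only in the Principal, so they can be generated from one template. The following is a minimal sketch, assuming you build the policy in JavaScript before applying it to the domain. The function name buildDomainAccessPolicy is hypothetical, and the ARNs are the example values from this answer.

```javascript
'use strict';

// Hypothetical helper: returns a domain access policy document like the
// ones above. Pass the IAM role ARN used to authenticate against Neptune
// when IAM auth is enabled, or omit it to produce the open ("*") policy.
function buildDomainAccessPolicy(domainArn, neptuneRoleArn) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Principal: { AWS: neptuneRoleArn || '*' },
        Action: 'es:*',
        Resource: `${domainArn}/*`,
      },
    ],
  };
}

const domainArn = 'arn:aws:es:us-east-2:0123456789012:domain/neptune-fts-pg';
const iamPolicy = buildDomainAccessPolicy(domainArn, 'arn:aws:iam::0123456789012:role/myNeptuneRole');
const openPolicy = buildDomainAccessPolicy(domainArn);
console.log(JSON.stringify(iamPolicy, null, 2));
```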
answered 2 years ago
  • Thanks for the solution Taylor. My query is not throwing the 403 error now. But my query is not giving back the required results.

    I have tried different types of queries

    const res = await dbClient
          .withSideEffect("Neptune#fts.endpoint",<opensearch_vpc_endpoint>)
          .V().has("content","Neptune#fts com~").values('content')
          .toList();
    
    const res = await dbClient
          .withSideEffect("Neptune#fts.endpoint",<opensearch_vpc_endpoint>)
          .withSideEffect('Neptune#fts.queryType', 'term')
          .V().has('status','Neptune#fts ACTIVE')
          .toList();
    

    In all the cases, the value of res is []. I don't know what I'm doing wrong.

    I tried the same term query, i.e, { query: { term: { 'predicates.status.value': 'ACTIVE' } } } on the Opensearch cluster, and it returned the document that I was looking for.

  • Is data replicating to your OpenSearch cluster? Are you using the polling framework explained here (https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search-cfn-create.html) that uses Neptune Streams? Can you please check your OS cluster directly to see if it has data? You can access it with curl "https://<os_cluster_endpoint>/amazon_neptune/_search?q=<search_query>". That would validate that the data is in OS and that the query works from that end.

  • Yes, the data is going to the OS cluster. But I'm not using Neptune Streams to replicate the data. I'm using a combination of FIFO SNS and SQS to trigger another Lambda that writes the data to the OS cluster.

    I queried the cluster directly using the npm elasticsearch package with the API call const res = await esClient.search({ index: 'amazon_neptune', body: { query: { term: { 'predicates.status.value': 'ACTIVE' } } } });. This gives me back the required document in the response.

    I feel there is something wrong with my gremlin query. The name of my index is "amazon_neptune"

  • Looking at your index settings, you appear to be missing a few things. Here is a copy of the index settings that get auto-generated from the Neptune Streams poller. This index applies to the movie dataset that we use in our workshops, but you can use this as a template to copy for your own purposes. https://pastebin.com/akVRPd8u

  • I created the index mapping according to the pastebin link. My new index mapping - https://pastebin.com/6BQTWd6i. My document inside the index - https://pastebin.com/qjT05w3p

    I tried the OpenSearch query { query: { match: { 'predicates.status.value': 'ACTIVE' } } } from the npm elasticsearch client which gives me back the above document. But when I write this line

    const res = await dbClient
          .withSideEffect('Neptune#fts.endpoint',<os_endpoint>)
          .withSideEffect('Neptune#fts.queryType', 'match')
          .V().has("status","Neptune#fts ACTIVE").toList();
    

    res is []
