Unable to perform OpenSearch text queries from Gremlin using AWS Lambda written in JavaScript
I am syncing my AWS Neptune nodes to an AWS OpenSearch cluster as per the documentation https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search.html. The name of the OpenSearch index is amazon_neptune. The OpenSearch index type is _doc. The index configuration is as follows:
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "analysis": {
      "normalizer": {
        "useLowercase": {
          "type": "custom",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "document_type": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "entity_id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "entity_type": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "predicates": {
        "properties": {
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 1000,
                "normalizer": "useLowercase"
              }
            }
          },
          "visibilityType": { "type": "keyword" },
          "status": { "type": "keyword" },
          "type": { "type": "keyword" },
          "firstName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "normalizer": "useLowercase"
              }
            }
          },
          "lastName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "normalizer": "useLowercase",
                "ignore_above": 1000
              }
            }
          }
        }
      }
    }
  }
}
Using the npm gremlin package, I'm trying to query my documents. Following is the code:
'use strict';
const gremlin = require('gremlin');
exports.handler = async (event, context) => {
  try {
    const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
    const Graph = gremlin.structure.Graph;
    const dc = new DriverRemoteConnection(<neptune_endpoint>, {});
    const graph = new Graph();
    const dbClient = graph.traversal().withRemote(dc);
    const res = await dbClient
      .withSideEffect("Neptune#fts.endpoint", <https_opensearch_endpoint>)
      .withSideEffect('Neptune#fts.queryType', 'term')
      .V().has("visibilityType", "Neptune#fts PUBLIC")
      .toList();
    console.log('res:', res);
  } catch (err) {
    console.error('Failed to query', err);
  }
}
But I'm getting the following error
Failed to query ResponseError: Server error: {"detailedMessage":"method [POST], host [<https_opensearch_endpoint>], URI [/amazon_neptune/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 403 Forbidden]\n{\"Message\":\"User: anonymous is not authorized to perform: es:ESHttpPost\"}","requestId":"23a9e7d7-7dde-465b-bf29-9c59cff12e86","code":"BadRequestException"} (500)
I have given the following permission to my lambda
Type: AWS::IAM::Policy
Properties:
  PolicyName: <Policy_Name>
  Roles:
    - 'Ref': <lambda_role>
  PolicyDocument:
    Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Action:
          - es:ESHttpGet
          - es:ESHttpPost
          - es:ESHttpPut
          - es:ESHttpDelete
        Resource: <opensearch_cluster_arn>
My OpenSearch cluster as well as Neptune cluster are located inside the same VPC. My lambda is hosted inside the same VPC as well.
Please help me in understanding why I'm getting the 403 error when I've given the proper reading permissions to my lambda.
Any help would be highly appreciated.
Sorry, I was looking for your comment under the threads of my earlier comments. Under Fine-grained access control, the Enabled value is false. Following is the AccessPolicy that I have configured in my OpenSearch cluster:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "*" }, "Action": "es:*", "Resource": "arn:aws:es:<region>:<account_number>:domain/<domain_name>/*" } ] }
As I mentioned in our other comments thread, following is the definition of the dbClient:
const gremlin = require('gremlin');
const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
const Graph = gremlin.structure.Graph;
const dc = new DriverRemoteConnection(`wss://<neptune_endpoint>:8182/gremlin`, {});
const graph = new Graph();
const dbClient = graph.traversal().withRemote(dc);
I looked at your document in https://pastebin.com/qjT05w3p and I think the document_type field is not set correctly. If you change the value of document_type to "vertex", it should work as expected. Let us know if that doesn't work for you.
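For anyone writing documents into the index with their own code rather than through the Neptune Streams poller, here is a minimal sketch of shaping a vertex into the layout this thread relies on: a document_type of "vertex" and property values nested under predicates.<name>.value. The helper name toFtsDocument and its input shape are hypothetical, and storing entity_type as a list of labels is an assumption based on the Neptune full-text-search documentation, so compare the output with a document produced by the poller before relying on it.

```javascript
'use strict';

// Hypothetical helper (not part of any AWS SDK): converts a plain vertex
// description into a document for the amazon_neptune index.
// Assumed input: { id, label, properties: { name: value | value[] } }
function toFtsDocument(vertex) {
  const predicates = {};
  for (const [name, raw] of Object.entries(vertex.properties || {})) {
    // Each property becomes a list of { value } objects, which is why the
    // direct OpenSearch queries in this thread use predicates.status.value.
    const values = Array.isArray(raw) ? raw : [raw];
    predicates[name] = values.map((value) => ({ value }));
  }
  return {
    entity_id: String(vertex.id),
    entity_type: [vertex.label], // assumption: list of vertex labels
    document_type: 'vertex',     // the fix suggested in this thread
    predicates,
  };
}

const doc = toFtsDocument({
  id: 'post-1',
  label: 'POST',
  properties: { status: 'ACTIVE', content: 'hello world' },
});
console.log(JSON.stringify(doc, null, 2));
```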
You're a life saver @awsprashu. I am getting the response now. Just one small follow-up. My Gremlin knowledge is a bit low. I'm getting only this as a response: [{"id":<vertex_id>,"label":"POST"}]. The output contains the id and label of the vertex but not its properties. What's the syntax for getting the vertex id as well as the vertex properties if not using OpenCypher? Following is my query:
await dbClient
  .withSideEffect('Neptune#fts.endpoint', <os_endpoint>)
  .withSideEffect('Neptune#fts.queryType', 'match')
  .V().has("content", "Neptune#fts post").toList();
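One possible approach, offered as a sketch rather than an answer confirmed in this thread: end the traversal with the standard Gremlin elementMap() step, which returns the id, label, and property values of each vertex (valueMap(true) is an older alternative). The function name fetchWithProperties is hypothetical, and g is assumed to be a traversal source built the same way as dbClient above.

```javascript
'use strict';

// Hypothetical helper: runs a full-text "match" query and asks for each
// vertex's id, label and properties instead of the bare vertex reference.
// `g` is assumed to be a gremlin traversal source such as dbClient.
function fetchWithProperties(g, ftsEndpoint, property, text) {
  return g
    .withSideEffect('Neptune#fts.endpoint', ftsEndpoint)
    .withSideEffect('Neptune#fts.queryType', 'match')
    .V()
    .has(property, `Neptune#fts ${text}`)
    .elementMap() // id, label and property values for each matching vertex
    .toList();    // returns a Promise when g is a remote traversal source
}
```

Inside the Lambda handler this could be used as const res = await fetchWithProperties(dbClient, <os_endpoint>, 'content', 'post'); each entry of res is then a map-like object rather than a bare vertex.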
Neptune IAM + OpenSearch Domain-Level Access
If using IAM Auth on Neptune with Domain-Level Access Control on OpenSearch, be sure to include the IAM role used to authenticate against Neptune in the Access Policy for your OpenSearch cluster. For example, if using the following IAM role to authenticate with Neptune: arn:aws:iam::0123456789012:role/myNeptuneRole
Be sure to then include the same role in the Domain Access Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::0123456789012:role/myNeptuneRole"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-2:0123456789012:domain/neptune-fts-pg/*"
    }
  ]
}
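If the policy is generated from code (for example in a deployment script), a small helper keeps the principal and the resource in step. This is only a sketch: buildDomainAccessPolicy is a hypothetical name, and the ARNs are the example values from this answer.

```javascript
'use strict';

// Hypothetical helper: builds a domain access policy of the shape shown above.
// principalArn is the IAM role used to authenticate against Neptune;
// domainArn is the OpenSearch domain ARN without the trailing "/*".
function buildDomainAccessPolicy(principalArn, domainArn) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Principal: { AWS: principalArn },
        Action: 'es:*',
        Resource: `${domainArn}/*`,
      },
    ],
  };
}

console.log(JSON.stringify(
  buildDomainAccessPolicy(
    'arn:aws:iam::0123456789012:role/myNeptuneRole',
    'arn:aws:es:us-east-2:0123456789012:domain/neptune-fts-pg'
  ),
  null,
  2
));
```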
Neptune No-IAM + OpenSearch Domain-Level Access
If not using IAM Auth on Neptune with Domain-Level Access Control on OpenSearch, then you'll need an open Access Policy in OpenSearch to allow the FTS requests to succeed:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-2:0123456789012:domain/neptune-fts-pg/*"
    }
  ]
}
Thanks for the solution Taylor. My query is not throwing the 403 error now. But my query is not giving back the required results.
I have tried different types of queries:
const res = await dbClient
  .withSideEffect("Neptune#fts.endpoint", <opensearch_vpc_endpoint>)
  .V().has("content", "Neptune#fts com~").values('content')
  .toList();
const res = await dbClient
  .withSideEffect("Neptune#fts.endpoint", <opensearch_vpc_endpoint>)
  .withSideEffect('Neptune#fts.queryType', 'term')
  .V().has('status', 'Neptune#fts ACTIVE')
  .toList();
In all the cases, the value of res is []. I don't know what I'm doing wrong. I tried the same term query, i.e., { query: { term: { 'predicates.status.value': 'ACTIVE' } } }, on the OpenSearch cluster, and it returned the document that I was looking for.
Is data replicating to your OpenSearch cluster? Are you using the polling framework explained here (https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search-cfn-create.html) that uses Neptune Streams? Can you please check your OS cluster directly to see if it has data? You can access it by just using
curl "https://<os_cluster_endpoint>/amazon_neptune/_search?q=<search_query>"
That would validate that the data is in OS and the query is working from that end.
Yes, the data is going to the OS cluster. But I'm not using Neptune Streams to replicate the data. I'm using a combination of FIFO SNS and SQS to trigger another Lambda to write the data to the OS cluster.
I did query the cluster directly using the npm elasticsearch package with the following call:
const res = await esClient.search({ index: 'amazon_neptune', body: { query: { term: { 'predicates.status.value': 'ACTIVE' } } } });
This gives me back the required document in the response. I feel there is something wrong with my Gremlin query. The name of my index is "amazon_neptune".
Looking at your index settings, you appear to be missing a few things. Here is a copy of the index settings that get auto-generated from the Neptune Streams poller. This index applies to the movie dataset that we use in our workshops, but you can use this as a template to copy for your own purposes. https://pastebin.com/akVRPd8u
I created the index mapping according to the pastebin link. My new index mapping - https://pastebin.com/6BQTWd6i. My document inside the index - https://pastebin.com/qjT05w3p
I tried the OpenSearch query { query: { match: { 'predicates.status.value': 'ACTIVE' } } } from the npm elasticsearch client, which gives me back the above document. But when I run the following, res is []:
const res = await dbClient
  .withSideEffect('Neptune#fts.endpoint', <os_endpoint>)
  .withSideEffect('Neptune#fts.queryType', 'match')
  .V().has("status", "Neptune#fts ACTIVE").toList();
How have you configured security on your OpenSearch cluster? Are you using Fine-Grained Access Control or Domain-Level Access Control?