Questions tagged with Amazon Neptune


Neptune and Cypher - Poor Query Performance

I want to use Neptune for an application with Cypher as my query language. I have a pretty small dataset of around 8,500 nodes and 8,500 edges, and I am trying to run what seem to be fairly straightforward queries, but the latency is very high (~6-8 seconds for around 1,000 rows). I have tried various instance types, enabling and disabling caches, and enabling and disabling the OSGP index, all to no avail. I'm really at a loss as to why the query performance is so poor. Does anyone have experience with poor query performance using Neptune? I feel I must be doing something incorrect to have such high query latency.

Here is some more detailed information on my graph structure and my query. I have a graph with two node types, `A` and `B`, and a single edge type, `MAPS_TO`, which is always directed from an `A` node to a `B` node. The `MAPS_TO` relation is many-to-many, but with the current dataset it is primarily one-to-one, i.e. the graph is mainly disconnected subgraphs of the form:

```
(A)-[MAPS_TO]-(B)
```

What I would like to do is, for all `A` nodes, collect the distinct `B` nodes they map to, subject to some conditions. I've experimented with my queries a bit, and the fastest one I've been able to arrive at is:

```
MATCH (a:A)
WHERE a.Owner = $owner AND a.IsPublic = true
WITH a
MATCH (a)-[r:MAPS_TO]->(b:B)
WHERE (b)<-[:MAPS_TO {CreationReason: "origin"}]-(:A {Owner: $owner})
   OR (b)<-[:MAPS_TO {CreationReason: "origin"}]-(:A {IsPublic: true})
WITH a, r, b
ORDER BY a.AId
SKIP 0 LIMIT 1000
RETURN a { .AId } AS A,
       collect(distinct b {
         B: {BId: b.BId, Name: b.Name, other properties on B nodes...},
         R: {CreationReason: r.CreationReason, other relation properties}
       })
```

The above query takes ~6 seconds (looking at explain and HTTP timing) on the `t4g.medium` instance type. I tried moving up to an `r5d.2xlarge` instance type, and this cut the query time in half, to 3-4 seconds. However, using such a large instance type seems quite excessive for such a small amount of data. Really, I am just trying to figure out why my query performs so poorly; it seems to me that, with the amount of data I have, it should not be possible for any Neptune configuration to perform this badly.

EDIT for more info: We are using the `t4g.medium` instance type with 3 reader instances, with the queries going to the reader instances. Again, we have around 8,500 nodes split approximately equally into `A` nodes and `B` nodes, and around 8,500 edges of a single type `MAPS_TO`, all going from `A` to `B`. The output of the status endpoint for openCypher is:

```
{'status': 'healthy', 'startTime': 'Mon Sep 19 18:56:50 UTC 2022',
 'dbEngineVersion': '1.1.1.0.R5', 'role': 'reader', 'dfeQueryEngine': 'viaQueryHint',
 'gremlin': {'version': 'tinkerpop-3.5.2'}, 'sparql': {'version': 'sparql-1.1'},
 'opencypher': {'version': 'Neptune-9.0.20190305-1.0'},
 'labMode': {'ObjectIndex': 'enabled', 'ReadWriteConflictDetection': 'enabled'},
 'features': {'ResultCache': {'status': 'enabled'}, 'IAMAuthentication': 'disabled',
              'Streams': 'disabled', 'AuditLog': 'enabled'},
 'settings': {'clusterQueryTimeoutInMs': '120000'}}
```

I have tried this with the `ObjectIndex` enabled and disabled and do not see much difference in performance. I have also tried the query on a larger instance type, the `r5d.2xlarge`, to see if performance was improved by the result cache. The response time was roughly cut in half, but that still seems very slow, and it requires a larger instance type than should be necessary.

The only thing currently being run against the database is the above query, so I do not see how it could be a concurrency issue. We have looked at the output of explain (too long to post), and it is not clear to me that there is a single place where the query spends a large amount of time. `DFEPipelineJoin` taking the longest makes sense to me based on the description in the documentation; what is not clear to me is how to eliminate all the `DFEPipelineJoin`s from the query.
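For completeness, one rewrite I have been considering (an untested sketch using the same labels and properties as above) replaces the two OR'd pattern predicates with an explicit match on the incoming `MAPS_TO` edge, in the hope of cutting down the number of `DFEPipelineJoin` operators:

```
// Untested sketch: c is the A node whose "origin" edge qualifies b.
// DISTINCT collapses rows duplicated when several such c nodes exist.
MATCH (a:A)-[r:MAPS_TO]->(b:B)<-[o:MAPS_TO]-(c:A)
WHERE a.Owner = $owner AND a.IsPublic = true
  AND o.CreationReason = "origin"
  AND (c.Owner = $owner OR c.IsPublic = true)
WITH DISTINCT a, r, b
ORDER BY a.AId
SKIP 0 LIMIT 1000
RETURN a { .AId } AS A,
       collect(distinct b { BId: b.BId, Name: b.Name, CreationReason: r.CreationReason })
```

I have not measured whether this actually avoids the joins; any insight into how Neptune plans the two forms would be appreciated.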
2 answers · 0 votes · 67 views · asked 2 months ago

Neptune Writer Instance does not recover freeable memory after successful load

Hi, I am experimenting with loading openCypher data from S3 (2 MB of node data and about 12 MB of edge data) into a Neptune instance we have set up. I am using the `%load` line magic in a Neptune notebook to perform the load. The loads are successful, but the freeable memory of our writer instance (`db.t3.medium`) does not recover after successfully loading the data, which eventually leads to failed loads due to out-of-memory errors.

These are the out-of-memory errors I get when loading additional data using the `%load` line magic (output from `%load_status <load-id> --details --errors`):

```
{
  "status": "200 OK",
  "payload": {
    "feedCount": [
      { "LOAD_FAILED": 1 }
    ],
    "overallStatus": {
      "fullUri": "s3://<bucket_name>/neptune_ingest_test_data/node_df_neptune_2022_04_02.csv",
      "runNumber": 1,
      "retryNumber": 0,
      "status": "LOAD_FAILED",
      "totalTimeSpent": 7,
      "startTime": 1663677230,
      "totalRecords": 153958,
      "totalDuplicates": 0,
      "parsingErrors": 0,
      "datatypeMismatchErrors": 0,
      "insertErrors": 153958
    },
    "failedFeeds": [
      {
        "fullUri": "s3://<bucket_name>/neptune_ingest_test_data/node_df_neptune_2022_04_02.csv",
        "runNumber": 1,
        "retryNumber": 0,
        "status": "LOAD_FAILED",
        "totalTimeSpent": 4,
        "startTime": 1663677233,
        "totalRecords": 153958,
        "totalDuplicates": 0,
        "parsingErrors": 0,
        "datatypeMismatchErrors": 0,
        "insertErrors": 153958
      }
    ],
    "errors": {
      "startIndex": 1,
      "endIndex": 5,
      "loadId": "<load-id>",
      "errorLogs": [
        {
          "errorCode": "OUT_OF_MEMORY_ERROR",
          "errorMessage": "Out of memory error. Resume load and try again.",
          "fileName": "s3://<bucket_name>/neptune_ingest_test_data/node_df_neptune_2022_04_02.csv",
          "recordNum": 0
        },
        {
          "errorCode": "OUT_OF_MEMORY_ERROR",
          "errorMessage": "Out of memory error. Resume load and try again.",
          "fileName": "s3://<bucket_name>/neptune_ingest_test_data/node_df_neptune_2022_04_02.csv",
          "recordNum": 0
        },
        {
          "errorCode": "OUT_OF_MEMORY_ERROR",
          "errorMessage": "Out of memory error. Resume load and try again.",
          "fileName": "s3://<bucket_name>/neptune_ingest_test_data/node_df_neptune_2022_04_02.csv",
          "recordNum": 0
        },
        {
          "errorCode": "OUT_OF_MEMORY_ERROR",
          "errorMessage": "Out of memory error. Resume load and try again.",
          "fileName": "s3://<bucket_name>/neptune_ingest_test_data/node_df_neptune_2022_04_02.csv",
          "recordNum": 0
        },
        {
          "errorCode": "OUT_OF_MEMORY_ERROR",
          "errorMessage": "Out of memory error. Resume load and try again.",
          "fileName": "s3://<bucket_name>/neptune_ingest_test_data/node_df_neptune_2022_04_02.csv",
          "recordNum": 0
        }
      ]
    }
  }
}
```

This is the freeable memory metric of the writer instance at the time I received the out-of-memory errors above:

![Freeable Memory metric of the writer instance since the creation of the Neptune cluster](/media/postImages/original/IMAk3rt6AGTVaORgDjK3WjWw)

After restarting the writer instance and loading some data from S3, the same thing starts to happen again:

![Freeable Memory metric of the writer instance after rebooting and loading some data](/media/postImages/original/IMCBz9oYabSdqhZJ74NRH0tQ)
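In case it is useful, this is roughly how I check the load status outside the notebook: a minimal sketch against the bulk loader REST endpoint, assuming the cluster is reachable from where the script runs and IAM auth is disabled (the endpoint and load id below are placeholders):

```python
import time

import requests

# Placeholders: substitute the real cluster endpoint and load id.
NEPTUNE_ENDPOINT = "https://<cluster-endpoint>:8182"
LOAD_ID = "<load-id>"

# Poll the bulk loader status endpoint until the load finishes or fails.
while True:
    resp = requests.get(
        f"{NEPTUNE_ENDPOINT}/loader/{LOAD_ID}",
        params={"details": "true", "errors": "true"},
    )
    status = resp.json()["payload"]["overallStatus"]["status"]
    print(status)
    if status != "LOAD_IN_PROGRESS":
        break
    time.sleep(10)
```

The JSON above is the payload this returns once the load has entered `LOAD_FAILED`.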
1 answer · 0 votes · 61 views · kutschs · asked 2 months ago

How to use the Bolt protocol in Java to directly execute a Cypher query against AWS Neptune

I am following this article to run a Cypher query against a Neptune instance: [Cypher Using Bolt](https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-opencypher-bolt.html). I want to execute the Cypher query directly on the AWS Neptune instance without translating it to Gremlin. Despite following the code as shown in the documentation, I am getting the following error:

```
Exception in thread "main" org.neo4j.driver.exceptions.ServiceUnavailableException: Failed to establish connection with the server
    at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:143)
Caused by: org.neo4j.driver.internal.shaded.io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 485454502f312e31203430302042616420526571756573740d0a5365727665723a20617773656c622f322e30 (trimmed)
```

Here is my sample Java code for reference:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import com.amazonaws.DefaultRequest;
import com.amazonaws.Request;
import com.amazonaws.auth.AWS4Signer;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.http.HttpMethodName;
import com.google.gson.Gson;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Config;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Record;

public class TestImpl {

    private static final String ACCESS_KEY = "XYZ";
    private static final String SECRET_KEY = "ABC";
    private static final String SERVICE_REGION = "AAAA";
    private static final Gson GSON = new Gson();

    public static void main(String[] args) {
        String url = "bolt://URL:PORT";
        final Driver driver = GraphDatabase.driver(url,
                AuthTokens.basic("username", getSignedHeader()), getDefaultConfig());

        String query = "MATCH (ruleSet:RULE_SET) "
                + "WHERE ruleSet.refId = \"aws-iam-best-practices\" "
                + "RETURN ruleSet.refId as refId, ruleSet.name as name, "
                + "collect(ruleSet.ruleIds) as ruleIds";
        System.out.println(query);

        final Record rec = driver.session().run(query).list().get(0);
        System.out.println(rec.get("refId").asString());
    }

    private static Config getDefaultConfig() {
        return Config.builder()
                .withConnectionTimeout(30, TimeUnit.SECONDS)
                .withMaxConnectionPoolSize(1000)
                .withDriverMetrics()
                .withLeakedSessionsLogging()
                .withEncryption()
                .withTrustStrategy(Config.TrustStrategy.trustSystemCertificates())
                .build();
    }

    private static String getSignedHeader() {
        // If you are using permanent credentials, use the BasicAWSCredentials access key and secret key
        final BasicAWSCredentials permanentCreds = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
        final AWSCredentialsProvider creds = new AWSStaticCredentialsProvider(permanentCreds);

        // Or, if you are using temporary credentials, use BasicSessionCredentials to
        // pass the access key, secret key, and session token, like this:
        // final BasicSessionCredentials temporaryCredentials =
        //         new BasicSessionCredentials(ACCESS_KEY, SECRET_KEY, AWS_SESSION_TOKEN);
        // final AWSCredentialsProvider tempCreds = new AWSStaticCredentialsProvider(temporaryCredentials);

        final Request<Void> request = new DefaultRequest<Void>("neptune-db"); // Request to Neptune
        request.setHttpMethod(HttpMethodName.GET);
        request.setEndpoint(URI.create("https://NeptuneServiceURL"));
        // Comment out the following line if you're using an engine version older than 1.2.0.0
        request.setResourcePath("/openCypher");

        final AWS4Signer signer = new AWS4Signer();
        signer.setRegionName(SERVICE_REGION);
        signer.setServiceName(request.getServiceName());
        signer.sign(request, creds.getCredentials());

        return getAuthInfoJson(request);
    }

    private static String getAuthInfoJson(final Request<Void> request) {
        final Map<String, Object> obj = new HashMap<>();
        obj.put("Authorization", request.getHeaders().get("Authorization"));
        obj.put("HttpMethod", request.getHttpMethod());
        obj.put("X-Amz-Date", request.getHeaders().get("X-Amz-Date"));
        obj.put("Host", request.getEndpoint().getHost());
        // If temporary credentials are used, include the security token in
        // the request, like this:
        // obj.put("X-Amz-Security-Token", request.getHeaders().get("X-Amz-Security-Token"));
        return GSON.toJson(obj);
    }
}
```

Please guide me on where my mistake is in this process. Thank you in advance. :)
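In case it helps with diagnosis: the hex payload in the `NotSslRecordException` is just the raw bytes the driver received instead of a TLS record. A quick standalone decode (sketch below, not part of my application) shows the server replied with plain HTTP from an AWS load balancer:

```java
// Decodes the hex payload from the NotSslRecordException above. It prints:
//   HTTP/1.1 400 Bad Request
//   Server: awselb/2.0
// i.e. the driver received a plain-HTTP response where it expected a TLS record.
public class DecodeNotSslRecord {
    public static void main(String[] args) {
        String hex = "485454502f312e31203430302042616420526571756573740d0a"
                   + "5365727665723a20617773656c622f322e30";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        System.out.println(sb);
    }
}
```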
1 answer · 0 votes · 71 views · asked 4 months ago

CDK - Connect a Network Load Balancer and a Neptune Cluster Endpoint together

For the past two days I've been struggling with exposing a Neptune endpoint to the public using an NLB **in a single stack**. The architecture was inspired by [this document](https://github.com/aws-samples/aws-dbs-refarch-graph/tree/master/src/connecting-using-a-load-balancer#connecting-to-amazon-neptune-from-clients-outside-the-neptune-vpc-using-aws-network-load-balancer). For the life of me, I haven't been able to figure out how to obtain the IP address of the Neptune endpoint to use as the target of the NLB's listener. The main issue is converting the Neptune `hostname` to an IP address, as required by the NLB target group's `IpTarget`, given that CDK synthesizes stacks before deployment. I explored the use of custom resources to no avail, due to my limited familiarity with the topic (day 5 of my AWS journey), and was hoping someone could point me in the right direction. Here's my stack (CDK app repo [here](https://github.com/neuxregime/cdk-neptune-nlb)):

```ts
import { Construct } from "constructs";
import { Stack } from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as elbv2 from "aws-cdk-lib/aws-elasticloadbalancingv2";
import * as neptune from "@aws-cdk/aws-neptune-alpha";
import { Props } from "../../_config";
import createVPC from "../helpers/createVPC";
import createNeptuneCluster from "../helpers/createNeptuneCluster";
import createNLB from "../helpers/createNLB";

export class ABCGraphStack extends Stack {
  public readonly vpc: ec2.Vpc;
  public readonly subnets: {
    public: ec2.ISubnet[];
    private: ec2.ISubnet[];
    isolated: ec2.ISubnet[];
  };
  public readonly neptuneCluster: neptune.DatabaseCluster;
  public readonly neptuneReadEndpoint: neptune.Endpoint;
  public readonly neptuneWriteEndpoint: neptune.Endpoint;
  public readonly nlb: elbv2.NetworkLoadBalancer;

  constructor(scope: Construct, id: string, props: Props) {
    super(scope, id, props);

    // Create VPC for use with Neptune
    const { vpc, subnets } = createVPC(props, this);
    this.vpc = vpc;
    this.subnets = subnets;

    // Create Neptune Cluster
    this.neptuneCluster = createNeptuneCluster(
      props,
      this,
      this.vpc,
      this.subnets
    );

    // Update Neptune Security Group to allow-all-in
    this.neptuneCluster.connections.allowDefaultPortFromAnyIpv4(
      "Allow All Inbound to Neptune"
    );

    // Add an ordering dependency on VPC.
    this.neptuneCluster.node.addDependency(this.vpc);

    // Output the Neptune read/write addresses
    this.neptuneReadEndpoint = this.neptuneCluster.clusterReadEndpoint;
    this.neptuneWriteEndpoint = this.neptuneCluster.clusterEndpoint;

    // HOW TO GET IP ADDRESS OF this.neptuneWriteEndpoint.hostname?

    // Create Network Load Balancer
    this.nlb = createNLB(props, this, this.vpc, "????????", 8182);
    this.nlb.node.addDependency(this.neptuneCluster);
  }
}
```
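For context, `createNLB` looks roughly like this (a simplified sketch of my helper, not the exact code in the repo above; it shows why a resolved IP address, rather than the Neptune hostname, is needed):

```ts
import { Stack } from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as elbv2 from "aws-cdk-lib/aws-elasticloadbalancingv2";
import * as targets from "aws-cdk-lib/aws-elasticloadbalancingv2-targets";
import { Props } from "../../_config";

// Simplified sketch: build an internet-facing NLB with a TCP listener that
// forwards to a target group of IP targets. elbv2's IpTarget only accepts a
// literal IP address, which is why the Neptune hostname has to be resolved
// to an IP before this helper can be called.
export default function createNLB(
  props: Props,
  stack: Stack,
  vpc: ec2.Vpc,
  targetIp: string,
  port: number
): elbv2.NetworkLoadBalancer {
  const nlb = new elbv2.NetworkLoadBalancer(stack, "NeptuneNLB", {
    vpc,
    internetFacing: true,
  });
  const listener = nlb.addListener("Listener", { port });
  listener.addTargets("NeptuneTargets", {
    port,
    targets: [new targets.IpTarget(targetIp)], // requires a concrete IP, not a hostname
  });
  return nlb;
}
```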
1 answer · 0 votes · 68 views · asked 5 months ago