AWS Neptune load is not distributed evenly

We are using three readers and the reader endpoints in our application, but the CPU of only one reader reaches its maximum, whereas the other two readers stay below 40% CPU. The load is not distributed evenly.

rePost-User-6676408
a year ago
We have set the minimum and maximum connection pool sizes to 20 and 100, and we have tried both gremlin-client builder classes:

NeptuneGremlinClusterBuilder and GremlinClusterBuilder

We are passing all the readers to the cluster, but the issue remains the same.

We also found one more method, networkLoadBalancerEndpoint(networkLoadBalancerEndpoint), which takes an endpoint as input. We are not sure how to configure it or where to get this endpoint from. We are still debugging and trying to find a solution.

Topics

Database DevOps

Tags

Amazon Neptune DevOps

rePost-User-6676408

asked a year ago, 296 views

3 Answers

Can you post the code that you use for configuring the NeptuneGremlinClusterBuilder? Thank you

AWS-ISR

answered a year ago

rePost-User-6676408
a year ago
GremlinCluster readCluster = NeptuneGremlinClusterBuilder.build()
        .port(neptuneProperties.getReader().getPort())
        .enableSsl(neptuneProperties.isEnableSSL())
        .maxContentLength(NeptuneConstants.NEPTUNE_MAX_CONTENT_LENGTH)
        .maxConnectionPoolSize(neptuneProperties.getReader().getMaxConnectionPoolSize())
        .minConnectionPoolSize(neptuneProperties.getReader().getMinConnectionPoolSize())
        .maxSimultaneousUsagePerConnection(neptuneProperties.getReader().getMaxSimultaneousUsagePerConnection())
        .minSimultaneousUsagePerConnection(neptuneProperties.getReader().getMinSimultaneousUsagePerConnection())
        .maxInProcessPerConnection(neptuneProperties.getReader().getMaxInProcessPerConnection())
        .minInProcessPerConnection(neptuneProperties.getReader().getMinInProcessPerConnection())
        .addContactPoints(refreshAgent.getAddresses().get(endpointsSelector))
        .create();

GremlinClient client = readCluster.connect();

refreshAgent.startPollingNeptuneAPI(
        (OnNewAddresses) addresses -> client.refreshEndpoints(addresses.get(endpointsSelector)),
        60,
        TimeUnit.SECONDS);

DriverRemoteConnection connection = DriverRemoteConnection.using(client);
GraphTraversalSource g = AnonymousTraversalSource.traversal().withRemote(connection);

rePost-User-6676408
a year ago
When we removed all the extra properties, as shown here, the load was distributed fine across the read replicas (everything else left at its default):

GremlinCluster readCluster = NeptuneGremlinClusterBuilder.build()
        .port(neptuneProperties.getReader().getPort())
        .enableSsl(neptuneProperties.isEnableSSL())
        .addContactPoints(refreshAgent.getAddresses().get(endpointsSelector))
        .create();

When we added the content length property to increase it, the load stopped being distributed across the replicas:

GremlinCluster readCluster = NeptuneGremlinClusterBuilder.build()
        .port(neptuneProperties.getReader().getPort())
        .enableSsl(neptuneProperties.isEnableSSL())
        .maxContentLength(NeptuneConstants.NEPTUNE_MAX_CONTENT_LENGTH)
        .addContactPoints(refreshAgent.getAddresses().get(endpointsSelector))
        .create();

We did not find any reported issue, or anything stating that this setting should cause a problem, but according to our analysis it is the cause. Any suggestion would be helpful.

The reader endpoint distributes connections, not individual requests, by changing the instance it points to every 5 seconds. During each 5-second window, every new connection is directed to the same instance. Large connection pools that are opened eagerly will therefore likely attach most of their connections to a single instance. And because these are long-lived WebSocket connections, they will continue forwarding traffic to that instance for the lifetime of the connection.

More details on this, and some recommendations for mitigating it, are described here: https://aws.amazon.com/blogs/database/load-balance-graph-queries-using-the-amazon-neptune-gremlin-client/
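
To make the effect concrete, here is a small, library-free simulation in Java. It is only a sketch under stated assumptions and not Neptune's actual routing code: the 5-second window is taken from the description above, while the class and method names (ReaderEndpointSimulation, resolveReaderEndpoint, openViaReaderEndpoint, openRoundRobin) and the instance names reader-1 to reader-3 are hypothetical. It compares a pool opened eagerly through one rotating endpoint with a pool spread round-robin over explicit instance addresses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation; it does not call Neptune or the gremlin-client library.
final class ReaderEndpointSimulation {

    // Assumption taken from the answer above: the endpoint targets one instance per 5-second window.
    static final long ROTATION_MILLIS = 5_000;

    // Instance that the simulated reader endpoint points to at the given time.
    static String resolveReaderEndpoint(List<String> instances, long nowMillis) {
        int index = (int) ((nowMillis / ROTATION_MILLIS) % instances.size());
        return instances.get(index);
    }

    // Opens poolSize connections through the reader endpoint, one every gapMillis,
    // and counts how many connections land on each instance.
    static Map<String, Integer> openViaReaderEndpoint(List<String> instances, int poolSize, long gapMillis) {
        Map<String, Integer> counts = emptyCounts(instances);
        for (int c = 0; c < poolSize; c++) {
            counts.merge(resolveReaderEndpoint(instances, c * gapMillis), 1, Integer::sum);
        }
        return counts;
    }

    // Opens poolSize connections round-robin across explicit instance addresses.
    static Map<String, Integer> openRoundRobin(List<String> instances, int poolSize) {
        Map<String, Integer> counts = emptyCounts(instances);
        for (int c = 0; c < poolSize; c++) {
            counts.merge(instances.get(c % instances.size()), 1, Integer::sum);
        }
        return counts;
    }

    private static Map<String, Integer> emptyCounts(List<String> instances) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String instance : instances) {
            counts.put(instance, 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> readers = List.of("reader-1", "reader-2", "reader-3");
        // 20 connections opened 10 ms apart all fall inside the first 5-second window.
        System.out.println(openViaReaderEndpoint(readers, 20, 10)); // {reader-1=20, reader-2=0, reader-3=0}
        System.out.println(openRoundRobin(readers, 20));            // {reader-1=7, reader-2=7, reader-3=6}
    }
}
```

In this model the eagerly opened pool ends up entirely on one reader, which matches the symptom of one instance at maximum CPU while the others stay mostly idle. Spreading connections over explicit instance addresses avoids depending on when the pool happens to be opened.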

AWS-ISR

answered a year ago

maxContentLength shouldn't have any impact on the endpoint selection logic – it's used purely to configure the frame size for responses from the server.

One more question: what endpointsSelector are you using? Is it an EndpointsType enum value (and if so, which one)? Or is it a custom selector? Thanks

AWS-ISR

answered a year ago
