Neptune Performance

We use Neo4j AuraDB for our graph database, but we ran into issues with data upload there, so we decided to move to AWS Neptune using the migration tool.

We have 3.7M nodes and 11.2M relationships in our database. The DB instance is a db.r5.large with 2 CPUs and 16 GiB of RAM.

The same queries run about 7-10 times slower as openCypher on AWS Neptune than as Cypher on AuraDB. We also tried rewriting the queries in Gremlin and testing their performance, but they are still very slow. We have node and lookup indexes on AuraDB, but we can't create them on AWS Neptune because it handles indexing automatically.

Is there any way to reach better performance on AWS Neptune?

  • It may be beneficial to understand what sort of problem you're trying to solve to see if Neptune can optimally handle it. Can you provide an example graph structure and similar query that is giving you issues?

  • For example, we have Member and Token nodes with a HAS relationship between them. We need to find the top 20 other members who hold the same tokens as a specific member.

    MATCH (member:Member { address: '${address}' })-[:HAS]->(token:Token)<-[:HAS]-(other_member:Member)
    RETURN PROPERTIES(other_member) AS member, COUNT(token) AS number_of_tokens
    ORDER BY number_of_tokens DESC
    LIMIT 20

    This query takes milliseconds in AuraDB but seconds on Neptune. The equivalent Gremlin query is faster than the openCypher one but still slower than in AuraDB. We also have many more complex queries where the difference is much more significant.

  • Here is the same query in Gremlin:

    g.V().hasLabel('Member').has('address', eq('${address}')).
      outE('HAS').as('member_has').
      inV().as('token').hasLabel('Token').
      inE('HAS').as('other_member_has').
      outV().as('other_member').hasLabel('Member').
      where(__.select('member_has').where(neq('other_member_has'))).
      select('other_member', 'token').
      group().
        by(__.select('other_member').local(__.properties().group().by(__.key()).by(__.map(__.value())))).
        by(__.fold().project('member', 'number_of_tokens').
          by(__.unfold().select('other_member').choose(neq('cypher.null'), __.local(__.properties().group().by(__.key()).by(__.map(__.value()))))).
          by(__.unfold().select('token').count())).
      unfold().select(values).
      order().by(__.select('number_of_tokens'), desc).
      limit(20)

1 Answer
Accepted Answer

Also answered on Stack Overflow

At the time of writing, openCypher support in Neptune is in preview and not yet at GA level. More recent engine versions include some significant improvements, but more are yet to be delivered.

As for the Gremlin query, tools that convert Cypher to Gremlin tend to generate quite complex queries. I think a hand-written Gremlin equivalent of the Cypher query would look something like this:

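g.V().has('Member','address', address).as('m').
      out('HAS').hasLabel('Token').as('t').
      in('HAS').hasLabel('Member').as('om').
      where(neq('m')).
      group().
        // key on the other member itself; by('om') would look up a property named 'om'
        by(select('om')).
        by(select('t').count()).
      order(local).
        by(values,desc).
      // group() emits a single map, so trim its entries with a local limit
      limit(local,20)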

If you want all of the properties, just add a valueMap, as in:

g.V().has('Member','address', address).as('m').
      out('HAS').hasLabel('Token').as('t').
      in('HAS').hasLabel('Member').as('om').
      where(neq('m')).
      group().
        by(select('om').valueMap(true)).
        by(select('t').count()).
      order(local).
        by(values,desc).
      // group() emits a single map, so trim its entries with a local limit
      limit(local,20)
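
When comparing the openCypher and Gremlin versions, it can help to check their results against a small sample of the data. The following is only a rough Python sketch of what the queries compute, not anything Neptune-specific. The function name, the (member_address, token_id) edge format, and the sample addresses are all made up for illustration, and it assumes at most one HAS edge per member/token pair.

```python
from collections import Counter, defaultdict


def top_members_sharing_tokens(has_edges, address, limit=20):
    """Rank other members by how many tokens they share with `address`.

    has_edges: iterable of (member_address, token_id) pairs, one per HAS edge
               (hypothetical export format, assumed for this sketch).
    Returns a list of (other_member_address, number_of_tokens), highest first.
    """
    tokens_of = defaultdict(set)   # member -> tokens it holds
    holders_of = defaultdict(set)  # token  -> members holding it
    for member, token in has_edges:
        tokens_of[member].add(token)
        holders_of[token].add(member)

    shared = Counter()
    # Walk member -> token -> other member, like the two-hop pattern in the query.
    for token in tokens_of.get(address, ()):
        for other in holders_of[token]:
            if other != address:  # skip the starting member
                shared[other] += 1

    return shared.most_common(limit)


# Made-up sample data: 0xA shares two tokens with 0xB and one with 0xC.
sample_edges = [
    ("0xA", "t1"), ("0xA", "t2"), ("0xA", "t3"),
    ("0xB", "t1"), ("0xB", "t2"),
    ("0xC", "t2"),
    ("0xD", "t9"),
]
result = top_members_sharing_tokens(sample_edges, "0xA")
print(result)
```

If the counts from a query on the same sample differ from this, the query is probably not expressing the intended traversal.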
AWS
AWS-KRL
answered 2 years ago
