What is the most efficient way to order data in a Neptune database?


I have noticed that ordering data can slow a query down dramatically, sometimes from 300 ms to 30,000 ms. For example, in Gremlin:

g.V().hasLabel('user').range(1,20)

would be much faster than

g.V().hasLabel('user').order().by(out('orderForStore_14').values('avgOrderValue')).range(1,20)

Our real queries are more complex than this, since we also filter based on connections, but they show the same issue as the example above.

I know that when ordering data we have to fetch all the data first, then order it, and only then select the range, whereas without ordering we just get the first items in the range in whatever order they are stored. That's why it is so much faster. But how do we handle this issue? What is the most efficient way to do it?

asked a year ago, 524 views
1 Answer

Neptune already builds indices on every vertex, edge, and property in the graph [1]. That being said, those indices are not used for order().by() operations. Presently, order().by() operations require fetching the resultant vertices/edges, materializing the properties that you want to serialize out to the client, and then performing the ordering on the worker thread that is executing the query. So if the query fetches many results ahead of the order().by().limit(), it is going to take a long time to execute.
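To make the cost difference concrete, here is a minimal sketch in plain Python. It is not Neptune code: the in-memory `users` list, the function names, and the `touched` counter are all illustrative assumptions that stand in for vertices the engine has to read. It only shows that an unordered range can stop early, while an ordered range has to read the sort key of every candidate first.

```python
def first_page_unordered(vertices, offset, limit):
    """Mimics range() with no order(): stop once offset + limit items were seen."""
    touched = 0
    page = []
    for v in vertices:
        touched += 1
        if touched > offset:
            page.append(v)
        if len(page) == limit:
            break
    return page, touched


def first_page_ordered(vertices, key, offset, limit):
    """Mimics order().by(key).range(): every sort key is read before paging."""
    touched = 0
    keyed = []
    for v in vertices:
        touched += 1  # the property has to be materialized for each vertex
        keyed.append((v[key], v["id"]))
    keyed.sort()
    return keyed[offset:offset + limit], touched


# Hypothetical data: 10,000 users with a made-up avgOrderValue.
users = [{"id": i, "avgOrderValue": (i * 7919) % 10007} for i in range(10000)]

# range(1,20) in Gremlin skips 1 item and returns the next 19.
_, touched_unordered = first_page_unordered(users, 1, 19)
_, touched_ordered = first_page_ordered(users, "avgOrderValue", 1, 19)
print(touched_unordered, touched_ordered)
```

The gap between the two counters grows with the number of vertices that match the filter, which is consistent with the slowdown described in the question.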

Alternatives to doing large order().by() operations:

- Fetch all of the results and perform the ordering in the app layer. As you mention above, it is much faster just to fetch the results than to fetch the results and perform the ordering in the query.

- Build constructs into the data model to curtail the number of objects that need to be ordered. If ordering by date, you could create intermediate vertices for the lower-cardinality constructs of a date (year, month, day), then build a traversal that takes advantage of these vertices to limit what needs to be fetched before ordering.

- If ordering is a common task, you may want to leverage an external store that maintains ordering. It is not uncommon for customers to build an architecture that uses Neptune Streams to push to Elasticsearch (leveraging Neptune's Elasticsearch integration [2] with ordering), or that uses a sorted set in ElastiCache Redis (materializing the results as they are written to Neptune and building the sorted set from the changes as they are exposed via Neptune Streams) [3].

- (NDA/roadmap) Longer-term, the Neptune team is working on improvements for property value materialization, which should have a positive effect on ordering. There is presently no defined timeline for these changes, but it is likely that they would begin to appear mid-to-late 2021 at the earliest.
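As an illustration of the first alternative (ordering in the app layer), here is a minimal Python sketch. It assumes the rows were already fetched by an unordered query and arrive as dictionaries holding the sort key; the function name `page_by_key`, the `rows` shape, and the sample values are hypothetical. It uses the standard library's `heapq.nsmallest` / `heapq.nlargest`, so only the first offset + limit rows are kept in sorted form rather than sorting the full result set.

```python
import heapq


def page_by_key(rows, key, offset, limit, descending=False):
    """Return the page [offset, offset + limit) of rows ordered by rows[i][key]."""
    k = offset + limit
    pick = heapq.nlargest if descending else heapq.nsmallest
    # pick() returns the k best rows already in order; drop the skipped ones.
    top = pick(k, rows, key=lambda r: r[key])
    return top[offset:]


# Hypothetical rows, shaped as an unordered fetch might return them.
rows = [{"id": i, "avgOrderValue": (i * 37) % 101} for i in range(100)]

# Same window as range(1,20): skip 1 row, return the next 19.
page = page_by_key(rows, "avgOrderValue", offset=1, limit=19)
print([r["id"] for r in page][:5])
```

For small pages this avoids a full sort on the client. If most of the result set is needed in order anyway, a plain `sorted()` call is simpler and just as reasonable.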

[1] https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-data-model.html
[2] https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search.html
[3] https://aws.amazon.com/blogs/database/capture-graph-changes-using-neptune-streams/

AWS
answered a year ago
