Vertex Dropping Issues


Hello,

I have more than 1 million vertices in Neptune with the following property: ('__graph','db8db2af-8b0c-49fb-bcf1-c44cae2d2e95')

I want to purge all 1 million vertices from the database. I know I'll have to drop the vertices in small chunks, otherwise the query will time out. Each chunk deletes 1000 vertices, which normally takes about 2 seconds. This is what the query for one chunk looks like:

g.V().has('__graph','db8db2af-8b0c-49fb-bcf1-c44cae2d2e95').limit(1000).drop()

Here is the profile output from running the operation twice:

gremlin> g.V().has('__graph','db8db2af-8b0c-49fb-bcf1-c44cae2d2e95').limit(1000).drop().profile()
==>Traversal Metrics
Step                           Count  Traversers  Time (ms)   % Dur

NeptuneGraphQueryStep(Vertex)      1           1   2020.007   99.99
NeptuneDropStep                                       0.101    0.01
>TOTAL                             -           -   2020.109       -


gremlin> g.V().has('__graph','db8db2af-8b0c-49fb-bcf1-c44cae2d2e95').limit(1000).drop().profile()
==>Traversal Metrics
Step                           Count  Traversers  Time (ms)   % Dur

NeptuneGraphQueryStep(Vertex)      1           1  57883.392  100.00
NeptuneDropStep                                       0.130    0.00
>TOTAL                             -           -  57883.522       -

Why is dropping so inconsistent? A chunk should typically take about 2000 ms, but we are seeing large spikes like the second example (almost 58 seconds). Often the query times out completely after 2 minutes.

Please help,
Thanks,
Austin

Edited by: austinmalpede on Apr 3, 2020 10:59 AM

asked 4 years ago, 860 views
1 Answer
Accepted Answer

Hi Austin,

Do the vertices you are dropping differ in the number of edges connected to them? When a vertex is dropped in Neptune, all of its connecting edges have to be dropped as well, because an edge cannot exist without a vertex at each end. A chunk of 1000 vertices with many edges therefore takes much longer to drop than a chunk with few.

We have an example implementation of a multi-threaded drop that may be of use. It is written in Python, but you can port it to any language that has a threading or multiprocessing library: https://github.com/awslabs/amazon-neptune-tools/blob/master/drop-graph/drop-graph.py
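To give a rough idea of the multi-threaded approach, here is a minimal sketch. It is not the code from the linked script. `fetch_ids` and `drop_ids` are hypothetical callables you would supply yourself, for example thin wrappers that run something like `g.V().has('__graph', ...).limit(n).id()` and `g.V(id1, id2, ...).drop()` through your Gremlin client. The page size, chunk size, and worker count are placeholder values to tune.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical callables, not part of any library:
#   fetch_ids(n) returns up to n IDs of elements that still exist.
#   drop_ids(ids) drops exactly the elements with those IDs.
FetchIds = Callable[[int], Sequence[str]]
DropIds = Callable[[Sequence[str]], None]


def chunked(ids: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split a sequence of IDs into consecutive slices of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def threaded_drop(fetch_ids: FetchIds, drop_ids: DropIds,
                  page: int = 10000, chunk: int = 1000,
                  workers: int = 4) -> int:
    """Fetch IDs one page at a time and drop each page in parallel chunks."""
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            ids = fetch_ids(page)
            if not ids:
                return total
            # Chunks are disjoint, so no two workers drop the same element.
            # list() waits for every chunk and re-raises any worker error.
            list(pool.map(drop_ids, chunked(ids, chunk)))
            total += len(ids)
```

Fetching the next page only after the previous one has been fully dropped keeps the workers from being handed IDs that another worker is already deleting.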

The example's purpose is to drop an entire graph, but you could extend it to drop only a certain subset of a graph. To make this performant, the example fetches and deletes edges first, so that deleting a vertex does not incur the additional penalty of querying for and deleting its edges.
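Applied to your `__graph` subset, the edges-first idea could look roughly like the sketch below. It is only an illustration under some assumptions: `submit` is a hypothetical function that sends one Gremlin query string to Neptune and returns the result list (for instance a small wrapper around your client), and the batch size of 1000 is taken from your question. The `dedup()` is there because an edge between two tagged vertices would otherwise be seen from both ends.

```python
from typing import Callable, List, Tuple

# Hypothetical type, not part of any library: sends one Gremlin query
# string to Neptune and returns the list of results.
Submit = Callable[[str], List[int]]


def drop_until_empty(submit: Submit, selector: str, batch: int = 1000) -> int:
    """Drop everything matched by `selector` in chunks of at most `batch`."""
    dropped = 0
    while True:
        # Count at most one batch, so the check does not scan everything.
        remaining = submit(f"{selector}.limit({batch}).count()")[0]
        if remaining == 0:
            return dropped
        submit(f"{selector}.limit({batch}).drop()")
        # Assumes nothing else modifies the graph between the two queries.
        dropped += remaining


def purge_graph(submit: Submit, graph_id: str,
                batch: int = 1000) -> Tuple[int, int]:
    """Drop the edges of the tagged vertices first, then the vertices."""
    vertices = f"g.V().has('__graph','{graph_id}')"
    # Edges first, so the vertex drops have no edges left to clean up.
    edges = drop_until_empty(submit, f"{vertices}.bothE().dedup()", batch)
    verts = drop_until_empty(submit, vertices, batch)
    return edges, verts
```

With the edges gone, each vertex chunk should do a similar amount of work, which is what makes the per-chunk time more predictable. The extra count query per chunk is a simplification to keep the loop easy to follow; it doubles the number of round trips.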

AWS
answered 4 years ago
