Vertex Dropping Issues


Hello,

I have more than 1 million vertices in Neptune with the following property: ('__graph','db8db2af-8b0c-49fb-bcf1-c44cae2d2e95')

I want to purge all 1 million vertices from the database. I know I'll have to drop the vertices in small chunks, otherwise the query will time out. Each chunk deletes 1000 vertices, which normally takes about 2 seconds. This is what the query looks like for a chunk:

g.V().has('__graph','db8db2af-8b0c-49fb-bcf1-c44cae2d2e95').limit(1000).drop()

Here is the profile output from running the operation twice:

gremlin> g.V().has('__graph','db8db2af-8b0c-49fb-bcf1-c44cae2d2e95').limit(1000).drop().profile()
==>Traversal Metrics
Step                             Count  Traversers  Time (ms)  % Dur

NeptuneGraphQueryStep(Vertex)        1           1   2020.007  99.99
NeptuneDropStep                                         0.101   0.01
>TOTAL                               -           -   2020.109      -


gremlin> g.V().has('__graph','db8db2af-8b0c-49fb-bcf1-c44cae2d2e95').limit(1000).drop().profile()
==>Traversal Metrics
Step                             Count  Traversers  Time (ms)  % Dur

NeptuneGraphQueryStep(Vertex)        1           1  57883.392  100.00
NeptuneDropStep                                         0.130    0.00
>TOTAL                               -           -  57883.522       -

Why is dropping so inconsistent? A chunk should typically take about 2000 ms, but we are seeing large spikes like the second example (about 58 seconds). Often the query times out completely after 2 minutes.

Please help,
Thanks,
Austin

Edited by: austinmalpede on Apr 3, 2020 10:59 AM

Asked 4 years ago, 888 views
1 answer

Accepted answer

Hi Austin,

For the vertices that you are dropping, is there a difference in the number of edges connected to them? When a vertex is dropped in Neptune, any connected edges have to be dropped as well, because an edge cannot exist without a vertex at each end. A chunk of 1000 vertices with many edges therefore takes longer to drop than a chunk with few.

We have an example implementation of a multi-threaded drop that may be of use. It is written in Python, but you can port it to any language that supports some sort of threading/multiprocessing library: https://github.com/awslabs/amazon-neptune-tools/blob/master/drop-graph/drop-graph.py

The example's purpose is to drop an entire graph, but you could extend it to drop only a certain subset of a graph. To make this performant, the example fetches and deletes edges first, so that deleting a vertex does not incur the additional penalty of querying for and deleting its edges.
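As a rough, single-threaded illustration of the edges-first idea for a subset of the graph (this is not the code from the linked drop-graph.py), the sketch below drops the edges attached to the matching vertices in batches and then drops the vertices themselves. The names `drop_in_batches`, `purge_subgraph` and the `submit` callable are hypothetical; `submit` is assumed to send a Gremlin query string to Neptune and return the results as a list. The graph ID is interpolated directly into the query string, so it is assumed to be a trusted value.

```python
from typing import Callable, List, Tuple

# Hypothetical abstraction: sends one Gremlin query string, returns its results as a list.
Submit = Callable[[str], List]


def drop_in_batches(submit: Submit, selector: str,
                    batch_size: int = 1000, max_batches: int = 100_000) -> int:
    """Drop elements matched by `selector` in chunks until none remain.

    Returns the number of drop queries that were issued.
    """
    batches = 0
    while batches < max_batches:
        # Existence check: limit(1) lets the traversal stop at the first match.
        remaining = submit(f"{selector}.limit(1).count()")
        if not remaining or remaining[0] == 0:
            break
        # Drop one small chunk so that each query stays well under the timeout.
        submit(f"{selector}.limit({batch_size}).drop()")
        batches += 1
    return batches


def purge_subgraph(submit: Submit, graph_id: str,
                   batch_size: int = 1000) -> Tuple[int, int]:
    """Drop edges first, then vertices, for one '__graph' value."""
    vertices = f"g.V().has('__graph','{graph_id}')"
    # dedup() because an edge between two matching vertices shows up from both ends.
    edges = f"{vertices}.bothE().dedup()"
    edge_batches = drop_in_batches(submit, edges, batch_size)
    # By now the vertices have no edges left, so each vertex drop is cheap.
    vertex_batches = drop_in_batches(submit, vertices, batch_size)
    return edge_batches, vertex_batches
```

With the gremlinpython driver, `submit` could be a small wrapper such as `lambda q: client.submit(q).all().result()`, where `client` is a `gremlin_python.driver.client.Client` connected to the cluster endpoint. Running several such loops in parallel, as the linked example does, would need extra care so that workers do not contend for the same elements.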

AWS
Answered 4 years ago
