AWS Neptune Vector Sampling OOM


I was trying Gremlin random walk algorithms while just trying to sample one node and was hitting OOM, so I decided to reduce it to g.V().sample(1), and I'm still hitting OOM even on that simple operation. The DB itself is large, with 200M relationships and 45M vectors, but I'm surprised by this performance. Any idea how I could trace the issue?

MarkT
asked a month ago · 96 views
1 Answer
Accepted Answer

Hello and thanks for the question. The Apache TinkerPop Gremlin sample step currently uses an implementation that reads data into memory before taking a sample from it. That may change in the future, but for now, are you able to put a limit in front of the step, so that you sample from something like limit(50000)?

Depending upon the instance size being used, a larger instance type may provide enough memory for the whole sample to complete, but in the meantime try different limit sizes as a short-term way to get at least some results.

So g.V().limit(50000).sample(1)
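
If it helps, here is a minimal sketch of running that bounded sample from Python with the gremlinpython client. The endpoint string is a placeholder for your own Neptune cluster endpoint, not anything taken from this thread.

    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    # Placeholder endpoint; substitute your own Neptune cluster endpoint.
    endpoint = 'wss://your-neptune-endpoint:8182/gremlin'

    conn = DriverRemoteConnection(endpoint, 'g')
    g = traversal().withRemote(conn)

    # Bounding the candidate set with limit() keeps the sample step from
    # having to pull all 45M vertices into memory before choosing one.
    vertex = g.V().limit(50000).sample(1).next()
    print(vertex)

    conn.close()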

AWS
AWS-KRL
answered a month ago
EXPERT
reviewed a month ago
  • Would there be a way to ensure the sample can still draw from all nodes? If I wanted the random walk to be able to reach all nodes, would the limit restrict the directions the walk could take?

  • For reference as well - this is a single instance DB, running now on r5.2xlarge

  • If you had something like g.V().limit(100).sample(1).out(), then yes, it would mean the random walk could only ever start from one of the 100 vertices found by the limit step.

  • You could also potentially look at something like the coin step with a very small value, for example coin(0.0001).sample(1) (see the sketch after this thread).

  • Gotcha - good to know, thank you! Might just have to run the Neptune exporter and not sample.
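
As a rough illustration of the coin-based suggestion and of how a walk could then proceed, here is a sketch using gremlinpython. coin(0.0001) keeps each vertex with about a 0.01% chance, so on 45M vertices the sample step only has to buffer roughly 4,500 candidates instead of all of them. The endpoint is again a placeholder, and the walk assumes every visited vertex has at least one outgoing edge; if one does not, the traversal simply returns an empty result.

    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.process.graph_traversal import __

    # Placeholder endpoint; substitute your own Neptune cluster endpoint.
    endpoint = 'wss://your-neptune-endpoint:8182/gremlin'

    conn = DriverRemoteConnection(endpoint, 'g')
    g = traversal().withRemote(conn)

    # coin(0.0001) randomly keeps ~0.01% of vertices, so sample(1) draws its
    # single start vertex from a few thousand candidates rather than all 45M.
    # repeat() then takes a 5-step random walk: at each hop, pick one
    # outgoing edge at random and move to the vertex it points to.
    walk = (g.V().coin(0.0001).sample(1)
             .repeat(__.local(__.outE().sample(1).inV()))
             .times(5)
             .path()
             .toList())
    print(walk)

    conn.close()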
