AWS Neptune Vector Sampling OOM

I was experimenting with Gremlin random walk algorithms and hit OOM while trying to sample just one node. I reduced the query to g.V().sample(1), and I'm still hitting OOM even on that simple operation. The DB itself is large, with 200m relationships and 45m vectors, but I'm surprised at this performance. Any idea how I could trace the issue?

Topics

Database, Compute

Tags

Amazon Neptune, High Performance Compute

Language

English

MarkT

asked 2 months ago, 106 views

1 Answer

Accepted Answer

Hello, and thanks for the question. The Apache TinkerPop Gremlin sample step currently uses an implementation that reads data into memory before taking a sample from it. That may change in the future, but for now, are you able to put a limit in front of the step, so that you sample from something like limit(50000)?

Depending on the instance size being used, a larger instance type may provide enough memory for the whole sample to complete, but as a short-term way to get at least some results, try different limit sizes.

So g.V().limit(50000).sample(1)

AWS-KRL

answered 2 months ago

Expert

Oleksii Bebych

reviewed 2 months ago

MarkT
2 months ago
Is there a way to make sure the sampling can still reach all nodes? If I wanted the random walk to occur across all nodes, would the limit restrict the directions the walk could take?
MarkT
2 months ago
For reference, this is a single-instance DB, currently running on r5.2xlarge.
AWS-KRL
2 months ago
If you had something like g.V().limit(100).sample(1).out(), then yes, the random walk could only ever start from one of the 100 vertices found by the limit step.
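
To make that restriction concrete, here is a minimal pure-Python model of the limit-then-sample pattern. It is only a sketch and does not talk to Neptune: `limit_then_sample` and `total_vertices` are hypothetical names, integer ids stand in for vertices, and it assumes the limit step hands back the first ids in whatever order the scan happens to produce them.

```python
import itertools
import random


def limit_then_sample(vertex_ids, limit, rng):
    """Model of g.V().limit(n).sample(1): buffer at most `limit` ids, then pick one."""
    # Only the first `limit` ids are ever held in memory.
    window = list(itertools.islice(vertex_ids, limit))
    return rng.choice(window) if window else None


rng = random.Random(0)        # fixed seed so repeated runs behave the same
total_vertices = 1_000_000    # hypothetical graph size, for illustration only

# Repeat the "query" many times and collect every start vertex it produced.
starts = {
    limit_then_sample(iter(range(total_vertices)), 100, rng)
    for _ in range(1_000)
}

# Every start id comes from the first 100 ids; the rest of the graph is never chosen.
print(len(starts), max(starts))
```

Memory stays bounded by the limit, but the walk's starting point is confined to that window, which is the trade-off described above.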
AWS-KRL
2 months ago
You could also look at something like the coin step with a very small probability, for example coin(0.0001).sample(1).
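
As rough arithmetic, if the 45m figure in the question refers to vertices, coin(0.0001) would let through about 45,000,000 × 0.0001 = 4,500 of them on average, drawn from across the whole graph rather than from a fixed window. The pure-Python sketch below models that idea at a smaller scale. It is an illustration, not Neptune's implementation: `coin_then_sample`, `total_vertices`, and `probability` are hypothetical names, and it assumes the coin filter is applied one vertex at a time before anything is buffered.

```python
import random


def coin_then_sample(vertex_ids, probability, rng):
    """Model of g.V().coin(p).sample(1): keep each id with probability p, pick one survivor."""
    # Only ids that pass the coin flip are buffered for the sampling stage.
    survivors = [v for v in vertex_ids if rng.random() < probability]
    pick = rng.choice(survivors) if survivors else None
    return pick, len(survivors)


rng = random.Random(42)       # fixed seed so repeated runs behave the same
total_vertices = 200_000      # scaled-down stand-in for the real graph
probability = 0.001           # scaled up so the small model still has survivors

pick, buffered = coin_then_sample(range(total_vertices), probability, rng)
expected = total_vertices * probability

print(f"expected about {expected:.0f} buffered ids, got {buffered}; picked {pick}")
```

If the probability is too small for the graph size, the filter can pass nothing at all, which is why the sketch returns None in that case; a real query would simply return no result.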
MarkT
2 months ago
Gotcha - good to know, thank you! Might just have to run neptune exporter and not sample
