- Newest
- Most votes
- Most comments
I have simillar problem. My graph is very small, only 6 000 nodes and 30 000 edges, but when I try just to find one shortest path between two nodes I end with MemoryLimitExceededException.
If I run this query, that shoud find shortes path between node lru99309 and lru306193: g.V('lru99309').repeat(outE().inV().simplePath()).until(hasId('lru306193')).path().limit(1).group().by(count(local)).next()
I got the response: ==>15=[path[v[lru99309], e[lru99309_lru168327][lru99309-dansecon->lru168327], v[lru168327], e[lru168327_lru140198][lru168327-dansecon->lru140198], v[lru140198], e[lru140198_lru170399][lru140198-dansecon->lru170399], v[lru170399], e[lru170399_lru69973][lru170399-dansecon->lru69973], v[lru69973], e[lru69973_lru316679][lru69973-dansecon->lru316679], v[lru316679], e[lru316679_lru316652][lru316679-dansecon->lru316652], v[lru316652], e[lru316652_lru306193][lru316652-dansecon->lru306193], v[lru306193]]]
But if I try only node that is next to lru306193, but a little far away from starting node I obtain MemoryLimitExceededException.
I do not understand if I am doing something wrong or if the Neptune is not possible to use for searching for shortest path.
Normally I use OSRM engine to search for shortest paths in the road network, but I was interested if I can use Neptune to solve basic road network questions.
My graph is: neptunegraphtraversalsource[neptunegraph[], standard]
Maybe I need some other type of graph, such as graphtraversalsource[tinkergraph[], graphcomputer], but I think that Neptune does not support graphcomputer version of the graph.
It is likely going to come down to how big of a fan-out you see at each hop in the graph. Neptune performs queries, by default, using a BFS pattern. So when doing shortest-path traversals, it comes down to traversing all vertices at each level in the graph to return the shortest path. When using a limit(x)
step, the query will stop once it has found the first x results. At present, Gremlin queries are natively run on a single execution thread. There are a few things that you may want to try....
-
Each Neptune instance has a finite number of execution threads equal to 2 x the number of vCPUs on the instance. About 2/3rds of the instance memory is reserved for buffer pool cache. The remaining memory is allocated (some what) evenly across the worker threads. Smaller instances (such as a t3/t4g.medium) will have very little memory per execution thread. Memory per thread will increase slightly until you get to the 4xlarge sized instances. The 4xlarge or larger instances will have the largest amount of thread memory allocated. So increasing the instance up to a 4xlarge can improve OOM related queries.
-
Neptune has two different query execution engines - a native engine and a newer engine called the DFE engine. Starting in version 1.1.1.0, DFE is enabled by default but it only used by default for openCypher or SPARQL queries. To use this with Gremlin queries, you'll need to leverage a query hint [1]. DFE can improve certain parts of a query, so it is worth a try.
[1] https://docs.aws.amazon.com/neptune/latest/userguide/neptune-dfe-engine.html
If you're trying to get a sense for how many objects a query is having to access to perform a query, you can use the Neptune Gremlin Profile API (not the TinkerPop profile()
step, this is different): https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html
Moving to a larger cluster instance really helps for these kind of issues.
Slightly modifying the query to exclude duplicates and moving to a larger cluster instance resolved memory limitation error.
Modified query: g.V('12345').as('sourceID').repeat(both().as('destID').simplePath().dedup('sourceID', 'destID')).emit().path()
This query tries to find the shortest path between source node and destination nodes and outputs path excluding duplicates.
Thanks.
Relevant content
- AWS OFFICIALUpdated 7 months ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated 2 years ago
This sounds like you're attempting to find the Connected Components for a single vertex? If so, you may want to try the OLTP approach noted here: https://tinkerpop.apache.org/docs/current/recipes/#connected-components. If the graph is large and highly-connected, the OLTP approach may not work (and Neptune does not support withComputer() today). In those cases, you may need to fall back on using the Connected Components algorithms built into Apache Spark GraphFrames (https://stackoverflow.com/questions/65480804/how-to-get-connected-component-with-graphframes-in-pyspark-and-raw-data-in-spark).