
Expected performance / Limitations of AWS Neptune graph with millions of edges


I am trying to understand the performance and limitations of an AWS Neptune instance for very large graphs. Currently, we are testing a graph with 20+ million vertices and 25+ million edges. The graph has a couple of "super vertices" with 2 or 3 million edges each. We host this graph in a db.r6g.2xlarge Neptune cluster with several read replicas, and the issues below occur even when running a single, isolated query with no concurrent load.

We are experiencing problems (mostly query timeouts) with simple queries like g.V('super_vertex_id').bothE().id(), and with any other traversal that passes through the super vertices. However, monitoring the Neptune cluster metrics while running these queries shows CPU utilization never exceeding 5%. This raises several questions regarding Neptune:

  • Is Neptune technology suitable for graphs of these dimensions?
  • What is the bottleneck in a query like g.V('super_vertex_id').bothE().id()?
    • Is it IOPS-related? Can Neptune DB IOPS be monitored and tuned?
  • Would increasing the instance class from db.r6g.2xlarge make a difference?
  • Are there any configuration parameters that could give Neptune better performance?
  • Would using a query language other than Gremlin make a difference?

EDIT: We are aware that the neptune_query_timeout value can be increased. However, we want to understand the fundamental limitations and performance implications, to avoid having to wait hours for queries to run. Thanks a lot!
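For what it's worth, Neptune also appears to accept a per-request timeout hint, so a single expensive query can be given a longer budget without changing the cluster-wide neptune_query_timeout parameter. A minimal sketch, assuming the HTTP Gremlin endpoint and the `evaluationTimeout` query hint (verify both against the current Neptune docs); the helper name is hypothetical:

```python
import json

# Hypothetical helper: wrap a Gremlin query string with a per-request
# evaluationTimeout hint (milliseconds) and build the JSON body that
# Neptune's HTTP Gremlin endpoint expects ({"gremlin": "..."}).
def gremlin_payload(query: str, timeout_ms: int) -> str:
    hinted = f"g.with('evaluationTimeout', {timeout_ms})." + query.removeprefix("g.")
    return json.dumps({"gremlin": hinted})

payload = gremlin_payload("g.V('super_vertex_id').bothE().id()", 120_000)
```

The resulting payload would then be POSTed to the cluster's `/gremlin` endpoint with an HTTP client of your choice.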

asked 2 months ago · 159 views
1 Answer

The size of the graph is mostly irrelevant. Performance really comes down to the expected query frontier for a given query (how many elements must be fetched from the graph to compute its results). This can be seen in the Gremlin Profile API output, in the number of index operations performed when running the query.

Neptune's architecture allows compute and storage to scale separately. The graph is not stored on any of the compute instances; it is persisted in a shared cluster volume that can be accessed from the writer instance and any of the read replicas. Each instance maintains a cache of recently accessed graph elements, but the cache is ephemeral and does not survive reboots. The instances themselves are used primarily for query computation, and each instance has a static number of query-execution threads equal to 2x the number of vCPUs on the instance.

Property Graph queries execute in a single-threaded model, with some exceptions (the Gremlin Profile API can also indicate when a query ran with concurrent execution). Mutation queries are always single-threaded to ensure write consistency. With this architecture, Neptune is designed to support highly concurrent workloads of OLTP-style queries with a constrained query frontier. At present, OLAP-style queries that need to access a large portion (or all) of a graph with hundreds of millions or billions of elements will run, but may require larger timeouts, or may require the user to rewrite them so they can be issued to Neptune as multiple, concurrent queries.
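That last point, splitting one large traversal into several concurrent queries so more of the instance's execution threads are used, might look like the sketch below. The edge labels are hypothetical, and the real gremlinpython client call is replaced by a stub so the partitioning logic stands alone:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical edge labels used to partition the work.
EDGE_LABELS = ["follows", "likes", "owns"]

def run_query(q: str) -> str:
    # Stub: in practice this would be something like
    #   client.submit(q).all().result()
    # using a gremlinpython Client connected to the cluster.
    return q

def fetch_edge_ids(vertex_id: str) -> list:
    # One label-constrained query per edge label, issued concurrently,
    # instead of a single unconstrained bothE() traversal.
    queries = [f"g.V('{vertex_id}').bothE('{lbl}').id()" for lbl in EDGE_LABELS]
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(run_query, queries))

results = fetch_edge_ids("super_vertex_id")
```

Each sub-query then occupies its own execution thread on the instance, rather than all of the work running on one.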

Data inside of Neptune is automatically indexed so that most queries run with optimal performance. There are a few cases where a query needs to be rewritten to perform well. Queries that use in(), inE(), both(), or bothE() will run unconstrained unless you also provide an edge label (or list of labels) within those steps (e.g., bothE('label1','label2','labelN')). The Gremlin Profile API output will also provide guidance on how to address these situations where possible.
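As a concrete sketch, the rewrite and its profiled counterpart could be generated like this (the labels are placeholders; helper name is made up for illustration):

```python
# Build a label-constrained version of the super-vertex query, plus the
# same query with a terminal .profile() step for inspecting index
# operations in the Gremlin Profile output.
def constrained(vertex_id: str, labels: list) -> str:
    label_args = ", ".join(f"'{lbl}'" for lbl in labels)
    return f"g.V('{vertex_id}').bothE({label_args}).id()"

q = constrained("super_vertex_id", ["label1", "label2"])
profile_q = q + ".profile()"
```

Comparing the profile output of the constrained and unconstrained forms should show a markedly smaller number of index operations for the constrained one.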


Beyond Gremlin, it can be worthwhile to also compare query performance using openCypher. Both languages can be used interchangeably on the same Property Graph dataset in Neptune. openCypher support in Neptune was built from the beginning on the newer DFE query engine, which provides some performance optimizations. A great deal of Gremlin execution has DFE support, but there are still portions of the query language that have yet to be implemented within DFE. DFE for openCypher is enabled by default; DFE for Gremlin requires additional configuration, as noted in the Neptune DFE documentation.
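For a like-for-like comparison, the super-vertex query from the question has a fairly direct openCypher counterpart. The strings below are a sketch based on standard Gremlin and openCypher syntax; verify the exact openCypher form against Neptune's documentation:

```python
# The same super-vertex edge lookup expressed in both languages
# supported by Neptune on Property Graph data.
gremlin_q = "g.V('super_vertex_id').bothE().id()"

cypher_q = (
    "MATCH (v)-[e]-() "
    "WHERE id(v) = 'super_vertex_id' "
    "RETURN id(e)"
)
```

Running both against the same cluster (Gremlin on the /gremlin endpoint, openCypher on /openCypher) lets you compare DFE-backed execution against the default Gremlin engine for this workload.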

answered 2 months ago
