The size of the graph is mostly irrelevant. Performance really comes down to the expected query frontier for a given query: how many elements need to be fetched from the graph to compute the query's results. You can see this by using the Gremlin Profile API and looking at the number of index operations that occur when the query runs.
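To make the idea of a query frontier concrete, here is a toy sketch in Python. It uses a hypothetical in-memory adjacency map (not how Neptune stores or indexes data) and counts the elements a two-hop traversal from one vertex reads. Adding thousands of edges that are unreachable from the start vertex grows the graph but leaves that count unchanged. The graph contents and function names are illustrative only.

```python
from collections import defaultdict

def build_graph(edges):
    """Toy undirected adjacency map: vertex -> list of (edge_label, neighbor)."""
    adj = defaultdict(list)
    for src, label, dst in edges:
        adj[src].append((label, dst))
        adj[dst].append((label, src))
    return adj

def two_hop_frontier(adj, start):
    """Count elements read by a two-hop traversal from `start` (each edge plus its far vertex)."""
    fetched = 0
    current = {start}
    for _ in range(2):
        next_hop = set()
        for vertex in current:
            for _label, neighbor in adj.get(vertex, []):
                fetched += 2  # one edge and the neighbor vertex it leads to
                next_hop.add(neighbor)
        current = next_hop
    return fetched

core = [("a", "knows", "b"), ("b", "knows", "c"), ("a", "follows", "d")]
# 10,000 extra edges that are unreachable from "a": the graph grows, the frontier does not.
noise = [(f"x{i}", "knows", f"y{i}") for i in range(10_000)]

small = build_graph(core)
large = build_graph(core + noise)
print(two_hop_frontier(small, "a"), two_hop_frontier(large, "a"))  # 10 10
```

The same reasoning is what the Profile API's index-operation counts surface for a real query: cost tracks what the traversal touches, not the total element count.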
Neptune's architecture allows compute and storage to scale separately. The graph is not stored on any of the compute instances; it is persisted in a shared cluster volume that can be accessed from the writer instance and from any of the read replicas. Each instance does maintain a cache of recently accessed elements of the graph, but the cache is ephemeral and does not persist across reboots. The instances themselves are used primarily for query computation. Each instance has a fixed number of query execution threads equal to 2x the number of vCPUs on the instance. Property Graph queries are executed in a single-threaded model, with some exceptions (the Gremlin Profile API output also indicates when a query has used concurrent execution). Mutation queries are always single-threaded to ensure write consistency.
With this architecture, Neptune is designed to support highly concurrent workloads of OLTP-style queries with a constrained query frontier. At present, OLAP queries on Neptune that need to access a large portion (or all) of a graph with hundreds of millions or billions of elements will run, but they may require longer timeouts, or require you to rewrite them so that they can be issued to Neptune across multiple concurrent threads.
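One way to approach that rewrite is to split a large scan into bounded batches and issue them in parallel. The Python sketch below assumes you already have the list of vertex IDs to process; `batch_query` and `run_query` are hypothetical helpers, and `run_query` is an offline stub rather than a real Neptune client. Capping client concurrency at the instance's query thread count (2x vCPUs, so 8 threads for a 4-vCPU instance) is an assumption made here to avoid simply queueing extra requests, not a Neptune requirement.

```python
from concurrent.futures import ThreadPoolExecutor

def query_threads(vcpus: int) -> int:
    """Query execution threads per instance, per the rule above (2x vCPUs)."""
    return 2 * vcpus

def batch_query(id_batch):
    """Hypothetical bounded Gremlin query covering one slice of vertex IDs."""
    ids = ", ".join(f"'{vid}'" for vid in id_batch)
    return f"g.V({ids}).outE().count()"

def run_query(id_batch):
    # Placeholder: a real client would submit batch_query(id_batch) to Neptune and
    # return its result. Here we return the batch size so the sketch runs offline.
    _query = batch_query(id_batch)
    return len(id_batch)

def scan_concurrently(all_ids, vcpus, batch_size=1000):
    """Split a large scan into bounded batches and issue them concurrently."""
    workers = query_threads(vcpus)
    batches = [all_ids[i:i + batch_size] for i in range(0, len(all_ids), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_query, batches))
    return sum(results), len(batches), workers

ids = [f"v{i}" for i in range(2500)]
print(scan_concurrently(ids, vcpus=4))  # (2500, 3, 8)
```

Each batch stays OLTP-sized, so no single request needs an unusually long timeout; the batch size is a tuning knob, not a recommended value.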
Data inside Neptune is automatically indexed, so most queries run with optimal performance. There are a few cases where a query needs to be rewritten to perform optimally. Queries that use bothE() will run unconstrained unless you also provide an edge label (or list of labels) within those steps (e.g., bothE('label1', 'label2', 'labelN')). The Gremlin Profile API output also provides guidance on how to address these situations where possible.
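The effect of adding labels can be illustrated with a toy index keyed by (vertex, edge label). This is a hypothetical model written in Python, loosely mimicking how a label lets a lookup go straight to matching edges; it is not Neptune's actual index layout. A "hub" vertex with 3 edges labeled `label1` and 500 labeled `label9` shows the difference between an unconstrained and a label-constrained bothE().

```python
from collections import defaultdict

def build_index(edges):
    """Toy index keyed by (vertex, edge_label) -> list of neighbor vertices."""
    index = defaultdict(list)
    for src, label, dst in edges:
        index[(src, label)].append(dst)
        index[(dst, label)].append(src)
    return index

def both_e_reads(index, vertex, labels=None):
    """Number of edges read for a bothE() step on `vertex`, with or without labels."""
    if labels is None:
        # Unconstrained: every incident edge, whatever its label, must be read.
        return sum(len(nbrs) for (vtx, _label), nbrs in index.items() if vtx == vertex)
    # Constrained: only the (vertex, label) entries asked for are looked up.
    return sum(len(index.get((vertex, label), [])) for label in labels)

edges = [("hub", "label1", f"a{i}") for i in range(3)]
edges += [("hub", "label9", f"b{i}") for i in range(500)]
idx = build_index(edges)
print(both_e_reads(idx, "hub"), both_e_reads(idx, "hub", ["label1"]))  # 503 3
```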
Gremlin Profile API: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html
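As a sketch of calling the Profile API over HTTPS, the Python below builds (but does not send) a POST to the `/gremlin/profile` path on Neptune's default port 8182 with a JSON body containing the `gremlin` query. The endpoint name is hypothetical, and the `profile.indexOps` option and its value format are assumptions to verify against the Profile API page linked above. IAM-enabled clusters additionally need SigV4 signing, which is not shown.

```python
import json
import urllib.request

def build_profile_request(endpoint: str, query: str, index_ops: bool = True,
                          port: int = 8182) -> urllib.request.Request:
    """Build a POST to the Gremlin Profile API for `query` (request is not sent)."""
    body = {"gremlin": query}
    if index_ops:
        # Assumed option for a detailed index-operation report; check the docs.
        body["profile.indexOps"] = True
    return urllib.request.Request(
        url=f"https://{endpoint}:{port}/gremlin/profile",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

host = "my-cluster.cluster-example.us-east-1.neptune.amazonaws.com"  # hypothetical endpoint
req = build_profile_request(host, "g.V().hasLabel('airport').bothE('route').count()")
# To send it from inside the cluster's VPC: urllib.request.urlopen(req).read()
```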
Beyond Gremlin, it can be worthwhile to also compare query performance using openCypher. Both languages can be used interchangeably on the same Property Graph dataset in Neptune. openCypher support in Neptune was built from the start on the newer DFE query engine (https://docs.aws.amazon.com/neptune/latest/userguide/neptune-dfe-engine.html), which provides some performance optimizations. Much of Gremlin execution is supported by DFE, but some portions of the query language have yet to be implemented within DFE. DFE for openCypher is enabled by default. DFE for Gremlin requires additional configuration, as noted in the DFE documentation linked above.
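To set up such a comparison, the Python sketch below builds two equivalent requests without sending them: an openCypher query posted form-encoded to the `/openCypher` path, and a Gremlin traversal posted as JSON to `/gremlin`, prefixed with the `Neptune#useDFE` query hint. The endpoint, the airport/route schema, and the assumption that the cluster's DFE setting permits per-query enablement are all hypothetical; check the DFE documentation linked above for how your cluster is configured.

```python
import json
import urllib.parse
import urllib.request

def opencypher_request(endpoint: str, query: str, port: int = 8182) -> urllib.request.Request:
    """Build (but do not send) a POST to Neptune's openCypher HTTPS endpoint."""
    return urllib.request.Request(
        url=f"https://{endpoint}:{port}/openCypher",
        data=urllib.parse.urlencode({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

def gremlin_dfe_request(endpoint: str, traversal: str, port: int = 8182) -> urllib.request.Request:
    """Build a Gremlin HTTP request that asks for DFE via the useDFE query hint."""
    query = "g.with('Neptune#useDFE', true)." + traversal
    return urllib.request.Request(
        url=f"https://{endpoint}:{port}/gremlin",
        data=json.dumps({"gremlin": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

host = "my-cluster.cluster-example.us-east-1.neptune.amazonaws.com"  # hypothetical endpoint
cypher_req = opencypher_request(host, "MATCH (a:airport)-[:route]->(b) RETURN count(b)")
gremlin_req = gremlin_dfe_request(host, "V().hasLabel('airport').out('route').count()")
# Sending (urllib.request.urlopen) and timing each call is left out: it needs
# network access to the cluster and, on IAM-enabled clusters, SigV4 signing.
```

Running both forms against the same data and comparing their timings (and the Gremlin Profile API output) is one way to see whether DFE helps a particular query shape.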