
How to Ensure Persistent IDs for Quick Lookups in AWS Neptune?

We are working with AWS Neptune (Property Graph with Gremlin) and have a requirement to efficiently retrieve vertices using our own generated UUIDs. However, we encountered a few challenges:

Neptune Internal IDs Are Not Reliable

From what we've read in the AWS Neptune documentation and discussions, internal vertex IDs in Neptune are not guaranteed to be stable. They can change due to various factors such as backup restores, maintenance, and system optimizations. This makes them unsuitable for consistent lookups.

No Indexing on Property Level

Neptune does not support property-level indexes, which means that performing a query like

g.V().has('id', 'our-custom-uuid')

would result in a full scan of all vertices. This is a major bottleneck for us since we have more than 100M vertices with the same label, making label-based filtering impractical.

Our Proposed Approach:

One alternative we are considering is setting our custom UUID as the explicit vertex ID when creating nodes instead of using it as a property. Here’s an example:

g.addV('User').property(id, 'our-custom-uuid').property('name', 'John Doe')

Then, we can retrieve it efficiently using:

g.V('our-custom-uuid')

This should ideally provide a constant-time lookup instead of a full scan.

Our Concern:

We couldn’t find any official documentation confirming whether explicitly set vertex IDs in Neptune remain persistent under all circumstances (e.g., backup restores, internal maintenance, cluster scaling, or system upgrades).

Can someone confirm whether manually assigned vertex IDs are truly persistent? Or is there a risk that they could change over time, similar to internal IDs?

asked 9 months ago, 96 views
2 Answers

Using your custom UUIDs as explicit vertex IDs is a sound way to get both efficient lookups and persistent identification of vertices in Neptune. The approach is supported, and the Neptune documentation recommends it, for several reasons:

  1. Persistence: Custom IDs assigned to vertices and edges are persistent and do not change because of system operations such as backup restores, maintenance, or system upgrades. This makes them reliable for consistent lookups over time.

  2. Performance: Custom IDs are optimized from a query execution perspective. Lookups by ID are faster than lookups by property value because the query engine can use an optimized join during execution.

  3. Uniqueness: Neptune enforces uniqueness on IDs, which it does not do for properties. This helps avoid creating duplicate data and reduces the chance of concurrent modification exceptions when mutation queries run concurrently.

  4. Deterministic lookups: Custom IDs can be used as a deterministic lookup or filtering mechanism, which is more efficient than using properties.
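Because the ID is unique, it can also anchor a "create only if absent" write. The sketch below is not from the Neptune documentation; it renders the commonly used Gremlin fold/coalesce pattern as a query string. The helper names `gremlin_str` and `upsert_user_query` are hypothetical, and in real code parameter bindings or a Gremlin client library are usually preferable to string building.

```python
def gremlin_str(value: str) -> str:
    """Render a Python string as a single-quoted Gremlin (Groovy) literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def upsert_user_query(vertex_id: str, name: str) -> str:
    """Build a query that returns the vertex with this ID, creating it if absent.

    Hypothetical helper: fold() collects the (possibly empty) ID lookup into
    a list, and coalesce() falls back to addV() only when nothing unfolds.
    """
    vid = gremlin_str(vertex_id)
    return (
        f"g.V({vid}).fold().coalesce("
        f"unfold(), "
        f"addV('User').property(id, {vid}).property('name', {gremlin_str(name)}))"
    )


print(upsert_user_query("our-custom-uuid", "John Doe"))
```

Running the same query twice should leave a single vertex, since the second run finds the existing ID and skips the `addV` branch.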

Your example of creating a vertex with a custom ID:

g.addV('User').property(id, 'our-custom-uuid').property('name', 'John Doe')

And then retrieving it using:

g.V('our-custom-uuid')

is the correct approach for ensuring efficient, constant-time lookups.
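As a rough illustration of how an application might produce these two queries, here is a minimal sketch that generates a UUID client-side and renders the create and lookup statements as strings. The function names are hypothetical and the use of `uuid.uuid4()` is an assumption about how the UUIDs are generated; the Gremlin text mirrors the two statements quoted above.

```python
import uuid


def gremlin_str(value: str) -> str:
    """Render a Python string as a single-quoted Gremlin (Groovy) literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def add_user_query(vertex_id: str, name: str) -> str:
    # property(id, ...) sets the vertex ID itself, not a property named 'id'.
    return (
        f"g.addV('User').property(id, {gremlin_str(vertex_id)})"
        f".property('name', {gremlin_str(name)})"
    )


def get_vertex_query(vertex_id: str) -> str:
    # Direct lookup by vertex ID, as opposed to has('id', ...) on a property.
    return f"g.V({gremlin_str(vertex_id)})"


vertex_id = str(uuid.uuid4())  # assumption: UUIDs are generated by the application
print(add_user_query(vertex_id, "John Doe"))
print(get_vertex_query(vertex_id))
```

The application keeps `vertex_id` (for example in its own records) and passes it back to `get_vertex_query` whenever it needs the vertex again.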

It's worth noting that if you're migrating data from another system (like Neo4j) to Neptune, you can preserve existing IDs during the migration process. If you identify a suitable vertex property that contains unique values, you can promote it to become the vertex ID during the migration.
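A sketch of that promotion step, assuming the data is being prepared for the Neptune bulk loader's Gremlin CSV format, where the `~id` and `~label` columns carry the vertex ID and label. The record layout, the `uuid` field name, and the function name are made up for illustration, and every remaining property is treated as a string for simplicity.

```python
import csv
import io


def to_vertex_csv(records, id_field, label):
    """Write records as vertex CSV, promoting `id_field` to the ~id column."""
    prop_fields = [f for f in records[0] if f != id_field]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["~id", "~label"] + [f"{f}:String" for f in prop_fields])

    seen = set()
    for rec in records:
        vid = str(rec[id_field])
        # The promoted property must be unique, since vertex IDs are unique.
        if vid in seen:
            raise ValueError(f"duplicate value for {id_field!r}: {vid}")
        seen.add(vid)
        writer.writerow([vid, label] + [rec[f] for f in prop_fields])
    return out.getvalue()


records = [
    {"uuid": "u-1", "name": "John Doe"},
    {"uuid": "u-2", "name": "Jane Roe"},
]
print(to_vertex_csv(records, id_field="uuid", label="User"))
```

The duplicate check is there because a property that is only "mostly" unique in the source system would otherwise surface as an ID conflict at load time.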

Additionally, for edges, Neptune offers a feature called inline Server-Generated Edge IDs, which can be enabled via a configuration parameter. This feature improves storage efficiency for edges when you don't need to provide your own IDs.

In summary, your approach of using custom UUIDs as explicit vertex IDs is a reliable and efficient method for ensuring persistent identification and quick lookups in Neptune. These IDs will remain stable across various system operations and provide the performance benefits you're seeking for your large-scale graph database.
Sources
Prefer using custom IDs for node/relationship - Amazon Neptune
Data migration from Neo4j to Neptune - Amazon Neptune
Neptune inlined server-generated edge ID - Amazon Neptune

answered 9 months ago
EXPERT
reviewed 9 months ago

If you are using Neptune Database, vertex IDs are guaranteed to be both immutable and unique, whether they are user-provided or system-generated. The only time you might run into collisions is if you export data from one database cluster and try to load it into another cluster where those IDs are already in use. Within a single cluster, vertex IDs are unique. In fact, using them is often recommended as a quick way to find specific items in the database.
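For the cross-cluster case, a simple pre-load check can flag overlapping IDs before anything is written. This is only an illustrative sketch: the function name is hypothetical, and how the two ID lists are obtained (for example from the export files and from a query against the target cluster) is left as an assumption.

```python
def find_id_collisions(exported_ids, existing_ids):
    """Return, sorted, the IDs present in both the export and the target cluster."""
    return sorted(set(exported_ids) & set(existing_ids))


exported = ["u-1", "u-2", "u-3"]   # IDs in the data to be loaded
existing = ["u-3", "u-9"]          # IDs already used in the target cluster
print(find_id_collisions(exported, existing))
```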

AWS
MODERATOR
answered 9 months ago
