Your proposed approach of using custom UUIDs as explicit vertex IDs in Neptune is indeed a good strategy for ensuring efficient lookups and persistent identification of vertices. This method is not only supported but also recommended by Amazon Neptune for several reasons:
- Persistence: Custom IDs assigned to nodes and relationships in Neptune are persistent and will not change due to system operations like backup restores, maintenance, or system upgrades. This makes them reliable for consistent lookups over time.
- Performance: Using custom IDs for nodes and relationships is optimized from a query execution perspective. Lookups by ID are much faster than using properties, as the query engine can use an optimized join during execution.
- Uniqueness: Neptune enforces uniqueness on IDs, unlike properties. This helps avoid the creation of duplicate data and reduces the chances of concurrent modification exceptions during concurrent mutation queries.
- Deterministic lookups: Custom IDs can be used as a deterministic lookup or filtering mechanism, which is more efficient than using properties.
Your example of creating a vertex with a custom ID:
g.addV('User').property(id, 'our-custom-uuid').property('name', 'John Doe')
And then retrieving it using:
g.V('our-custom-uuid')
is the correct approach for ensuring efficient, constant-time lookups.
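If each user already has a stable natural key, one option is to derive the UUID from it so the same user always maps to the same vertex ID. The following is a minimal Python sketch of that idea, not something taken from the Neptune documentation: the email-style key, the `APP_NAMESPACE` value, and the helper names are all assumptions made for illustration.

```python
import uuid

# Hypothetical namespace for this application. Any fixed UUID works,
# but it must never change once IDs have been written to the graph.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def vertex_id_for(label: str, natural_key: str) -> str:
    """Derive a stable vertex ID from a label and a natural key (assumed unique)."""
    return str(uuid.uuid5(APP_NAMESPACE, f"{label}:{natural_key}"))


def add_vertex_query(label: str, vertex_id: str) -> str:
    """Build Gremlin text that creates a vertex with an explicit ID."""
    return f"g.addV('{label}').property(id, '{vertex_id}')"


def lookup_query(vertex_id: str) -> str:
    """Build Gremlin text that fetches a vertex directly by its ID."""
    return f"g.V('{vertex_id}')"


user_id = vertex_id_for("User", "john.doe@example.com")
print(add_vertex_query("User", user_id))
print(lookup_query(user_id))
```

Because the ID is a pure function of the key, any service can recompute it and go straight to `g.V(...)` without a property search first. The string-built queries are only for display here; in application code, parameter bindings or a Gremlin client library would be the safer choice.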
It's worth noting that if you're migrating data from another system (like Neo4j) to Neptune, you can preserve existing IDs during the migration process. If you identify a suitable vertex property that contains unique values, you can promote it to become the vertex ID during the migration.
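As a rough illustration of promoting a property to the vertex ID, the Python sketch below rewrites exported records into the Gremlin bulk-load CSV layout, where the `~id` and `~label` columns carry the vertex ID and label. The input records, the `uuid` field name, and the choice to type every remaining property as `String` are assumptions for this example, not details from the migration guide.

```python
import csv
import io


def to_neptune_vertex_csv(rows, id_field, label):
    """Write rows as bulk-load vertex CSV, promoting id_field to the ~id column."""
    seen = set()
    # Every field other than the promoted one becomes a String property column.
    prop_fields = [f for f in rows[0] if f != id_field]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["~id", "~label"] + [f"{f}:String" for f in prop_fields])
    for row in rows:
        vid = str(row[id_field])
        # The promoted property must be unique, so fail fast on duplicates.
        if vid in seen:
            raise ValueError(f"duplicate value for {id_field}: {vid}")
        seen.add(vid)
        writer.writerow([vid, label] + [row[f] for f in prop_fields])
    return out.getvalue()


# Hypothetical records exported from the source system.
exported = [
    {"uuid": "u-1", "name": "John Doe"},
    {"uuid": "u-2", "name": "Jane Doe"},
]
print(to_neptune_vertex_csv(exported, id_field="uuid", label="User"))
```

Checking for duplicates before loading is useful because a property that looked unique in the source system may not be, and the promoted value has to be unique to serve as an ID.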
Additionally, for edges, Neptune offers a feature called inlined server-generated edge IDs, which can be enabled via a configuration parameter. This feature improves storage efficiency for edges when you don't need to provide your own IDs.
In summary, your approach of using custom UUIDs as explicit vertex IDs is a reliable and efficient method for ensuring persistent identification and quick lookups in Neptune. These IDs will remain stable across various system operations and provide the performance benefits you're seeking for your large-scale graph database.
Sources
Prefer using custom IDs for node/relationship - Amazon Neptune
Data migration from Neo4j to Neptune - Amazon Neptune
Neptune inlined server-generated edge ID - Amazon Neptune
If you are using Neptune Database, IDs are guaranteed to be both immutable and unique, whether they are user-provided or system-generated. The only time you might run into collisions is if you export the data from one database cluster and try to load it into another database cluster where those IDs are already in use. Within a single cluster, vertex IDs are unique. In fact, using them is often recommended as a quick way to find specific items in the database.
