Cosine similarity/vector calculations using vectors stored on Neptune DB


I have a Neptune DB with tokens and their embeddings stored as a property. I had to serialize each embedding with json.dumps(embedding) because Neptune wouldn't accept float arrays (512x1 in my case). Given a new embedding for a new token, which could come from a prompt for example, how can I calculate its cosine similarity to the other tokens so that I can position it within the graph? Is Neptune itself capable of doing that, or would I have to use Neptune Analytics?

asked 4 months ago, 237 views
3 Answers

It's not entirely clear to me what you are trying to accomplish here. If you have an incoming vector embedding and want to compare it against the other embeddings within the graph, you would load your graph data (https://docs.aws.amazon.com/neptune-analytics/latest/userguide/loading-data.html) and associated embeddings (https://docs.aws.amazon.com/neptune-analytics/latest/userguide/vector-index.html) into Neptune Analytics and use the provided topKByEmbedding algorithm (https://docs.aws.amazon.com/neptune-analytics/latest/userguide/vectors-topKByEmbedding.html), which finds the K nearest neighbors using a Hierarchical Navigable Small World (HNSW) implementation.
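
To make the comparison itself concrete, here is a minimal client-side sketch in Python of cosine similarity over embeddings that were serialized with json.dumps, as described in the question. It is a brute-force scan for illustration only, not the HNSW index that topKByEmbedding uses, and it does not claim to reproduce the score that algorithm returns. The function names, token names, and 3-dimensional vectors are hypothetical stand-ins for the real 512-dimensional data.

```python
import json
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query, stored, k=10):
    """Rank stored tokens by cosine similarity to `query`, highest first.

    `stored` maps a token to its embedding serialized with json.dumps.
    """
    scored = [
        (token, cosine_similarity(query, json.loads(raw)))
        for token, raw in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data; real embeddings would have 512 components.
stored = {
    "cat": json.dumps([1.0, 0.0, 0.0]),
    "dog": json.dumps([0.8, 0.6, 0.0]),
    "car": json.dumps([0.0, 0.0, 1.0]),
}
ranked = top_k_by_cosine([1.0, 0.0, 0.0], stored, k=2)
```

A scan like this touches every stored embedding on each query, so it is only reasonable for small graphs or for sanity-checking results; an indexed search is the better fit as the number of tokens grows.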

In this scenario, both your node and embedding data will be stored in a Neptune Analytics graph, which you will query through the graph endpoint with openCypher. There is no link to a Neptune Database.

I think there might be a bit of confusion here on the difference between Amazon Neptune Database and Amazon Neptune Analytics. Each is a separate feature of Amazon Neptune and while you can use the two with each other, they each maintain an independent copy of the data.

Amazon Neptune Database is a serverless graph database designed for superior scalability and availability. Neptune Database provides built-in security, continuous backups, and integrations with other AWS services.

Amazon Neptune Analytics is an analytics database engine for quickly analyzing large volumes of graph data to get insights and find trends from data stored in Amazon S3 buckets or a Neptune database. Neptune Analytics uses built-in algorithms, vector search, and in-memory computing to run queries on data with tens of billions of relationships in seconds.

AWS
answered 4 months ago

You got my use case perfectly. I want to take a prompt and compare its embedding to the other embeddings already in the graph. I see now: the quote I mentioned earlier from the docs, "Note that the nodes in your graph must have at least one user property or label in order to associate them with embeddings", just reiterates that, because Neptune DB and Neptune Analytics are two separate databases, there needs to be some kind of key/identifier that links the two. Then I can, for example, use Neptune Analytics' topKByEmbedding to get the keys of the K nearest neighbors and use those keys to retrieve the data for those tokens from Neptune DB. Is that it?

answered 4 months ago
  • Neptune Analytics can store both the node data and the embedding data, so you should not need Neptune DB in this case. For example, the query below finds the 10 nodes closest to the embedding passed in and returns them (including their properties) along with their similarity scores:

    CALL neptune.algo.vectors.topKByEmbedding(<YOUR EMBEDDING>)
    YIELD node, score
    RETURN node, score
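
If the embedding comes from Python, the placeholder in the query above has to be filled with a list literal. The sketch below shows one way to build that query string. The helper name build_top_k_query is hypothetical, and the {topK: ...} options map reflects my reading of the topKByEmbedding documentation linked earlier in this thread, so check it against the current docs before relying on it.

```python
import json
import math


def build_top_k_query(embedding, top_k=10):
    """Return an openCypher string that calls topKByEmbedding for `embedding`."""
    values = [float(x) for x in embedding]
    if not values:
        raise ValueError("embedding must not be empty")
    # NaN and infinity have no list-literal form that the query could carry.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding must contain only finite numbers")
    if int(top_k) < 1:
        raise ValueError("top_k must be at least 1")
    literal = json.dumps(values)  # e.g. "[0.25, -0.5, 1.0]"
    return (
        f"CALL neptune.algo.vectors.topKByEmbedding({literal}, {{topK: {int(top_k)}}})\n"
        "YIELD node, score\n"
        "RETURN node, score"
    )


# Hypothetical 3-dimensional embedding; a real one would have 512 components.
query = build_top_k_query([0.25, -0.5, 1.0], top_k=5)
```

The resulting string can then be submitted to the Neptune Analytics graph endpoint as an openCypher query. Passing the embedding as a query parameter instead of inlining it may be cleaner, if the client you use supports that.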
    

Vector similarity search, along with a selection of other built-in graph algorithms, is part of Neptune Analytics. To use any of those algorithms, the data needs to be loaded into a Neptune Analytics graph. There is some additional information here:

  1. https://docs.aws.amazon.com/neptune-analytics/latest/userguide/similarity-algorithms.html
  2. https://docs.aws.amazon.com/neptune-analytics/latest/userguide/vector-similarity.html
AWS
AWS-KRL
answered 4 months ago
