Hi,
Maybe you want to summarize your long datasheets down to at most N words so they fit within the input limits of the embedding model you use with pgvector.
For that, you can use one of the Bedrock LLMs to summarize your text into a shorter version before sending it to Postgres. A detailed post about the various summarization techniques is here: https://aws.amazon.com/blogs/machine-learning/techniques-for-automatic-summarization-of-documents-using-language-models/
That's what I personally do when a text is too long for a given embedding model. It is then better to store both the long version and the summary in your Postgres DB.
Chunking the text wouldn't work in your case: you would get one embedding per chunk, and those can be very different in their coordinates from the embedding of the full text.
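To illustrate the summarize-before-embed step, here is a minimal sketch using boto3 and the Bedrock runtime. The model id, word limit, and prompt wording are my own illustrative choices, not prescribed by the blog post above:

```python
# Sketch: summarize a long datasheet with a Bedrock LLM before embedding it.
# Assumptions: AWS credentials/region configured for boto3, and access to an
# Anthropic model on Bedrock (the model id below is illustrative).
import json

MAX_WORDS = 200  # illustrative limit; pick one that fits your embedding model


def build_summary_prompt(text: str, max_words: int = MAX_WORDS) -> str:
    """Build the summarization instruction sent to the LLM."""
    return (
        f"Summarize the following datasheet in at most {max_words} words, "
        f"keeping the key technical specifications:\n\n{text}"
    )


def summarize_with_bedrock(text: str) -> str:
    """Call Bedrock (Anthropic Messages format) and return the summary text."""
    import boto3  # imported here so the prompt builder stays testable offline

    client = boto3.client("bedrock-runtime")
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": build_summary_prompt(text)}],
    })
    resp = client.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```

You would then embed the returned summary and store both the summary and the full text in the same row, as described above.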
Best,
Didier
And when I query my PostgreSQL DB to obtain the data and then call Bedrock to get the final answer, should I query the DB with the user id and also by the embeddings, right? The embeddings will be the user's chat question, converted to an embedding before querying pgvector. Am I right? I'm new in this field. Thank you!
Hi Cao95, this post will give you full details on how to proceed: https://www.timescale.com/blog/postgresql-as-a-vector-database-create-store-and-query-openai-embeddings-with-pgvector/
This one may also help you understand the overall approach: https://tembo.io/blog/pgvector-and-embedding-solutions-with-postgres/
Happy to have helped you! Please accept my answer if it was satisfactory for you.
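To make the retrieval step concrete, here is a minimal sketch of the filtered similarity search with psycopg. The table and column names (documents, user_id, embedding) are illustrative assumptions, not from the thread:

```python
# Sketch: fetch the rows most similar to the question embedding,
# restricted to one user, using pgvector.
# Assumed table: documents(user_id text, content text, embedding vector(1536)).


def to_pgvector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector input literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in embedding) + "]"


# <=> is pgvector's cosine-distance operator (<-> would be L2 distance).
QUERY = """
    SELECT content
    FROM documents
    WHERE user_id = %s
    ORDER BY embedding <=> %s::vector
    LIMIT 5;
"""


def fetch_context(conn, user_id: str, question_embedding: list[float]):
    """Run the filtered similarity search; conn is a psycopg connection."""
    with conn.cursor() as cur:
        cur.execute(QUERY, (user_id, to_pgvector_literal(question_embedding)))
        return [row[0] for row in cur.fetchall()]
```

So yes: you embed the user's question first, then filter by user_id and order by vector distance in one query, and finally pass the returned rows to Bedrock as context for the answer.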
Thank you man!
you are very welcome!