How can I manage large datasheets in PostgreSQL with pgvector using a RAG approach with AWS Bedrock?

Hi Guys!

I was doing some tests and loaded a very long Excel file into my pgvector PostgreSQL database. Then I ran a query to get the embeddings by the user id and got the response, but when calling Bedrock I got an exception: ValueError: Error raised by bedrock service: An error occurred (ValidationException) when calling the InvokeModel operation: Input is too long for requested model.

I know the problem is that the response I got from the DB is too long for Bedrock to handle, but is there a way to manage large datasheets with a RAG approach and pgvector?
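To give a bit more context, here is roughly what my code does (simplified, and the table/column names and model id are placeholders):

```python
import json
import boto3
import psycopg2

user_id = "some-user"                      # placeholder
question = "What is in this datasheet?"    # placeholder

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
conn = psycopg2.connect("dbname=mydb user=me")

# Fetch everything stored for this user and concatenate it into one context string.
with conn.cursor() as cur:
    cur.execute("SELECT content FROM documents WHERE user_id = %s", (user_id,))
    context = "\n".join(row[0] for row in cur.fetchall())

# The whole spreadsheet ends up in a single prompt, which is what triggers
# "Input is too long for requested model".
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{"role": "user",
                  "content": f"Answer using this context:\n{context}\n\nQuestion: {question}"}],
})
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model id
    body=body,
)
```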

Thank you!

1 Answer

Accepted Answer

Hi,

You may want to summarize your long datasheets down to at most N words so that they fit within the input limits of the Bedrock models you call.

For that, you can use one of the LLMs available on Bedrock to summarize your text into a shorter version before storing it in Postgres. A detailed post about the various summarization techniques is here: https://aws.amazon.com/blogs/machine-learning/techniques-for-automatic-summarization-of-documents-using-language-models/
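As a rough sketch of that idea (the model ids are just examples, and the exact request format depends on the model you pick), the summarization and embedding steps can each be a single InvokeModel call:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def summarize(text: str, max_words: int = 300) -> str:
    """Ask a Bedrock LLM for a short summary before embedding/storing the text."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": f"Summarize the following in at most {max_words} words:\n\n{text}",
        }],
    })
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model id
        body=body,
    )
    return json.loads(resp["body"].read())["content"][0]["text"]

def embed(text: str) -> list[float]:
    """Embed the (short) summary with a Bedrock embedding model."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",  # example model id
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]
```

For extremely long inputs you may still need to summarize piece by piece and then summarize the summaries; the blog post linked above covers those techniques.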

That's what I personally do when I have texts too long for a given embedding engine. Of course, it is then better to store both the long version and the summary in your pg db.
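For example, a minimal sketch of such a table and of the retrieval step, assuming the summarize/embed helpers above and placeholder names (full_text, user_id, question, the documents table):

```python
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.commit()
register_vector(conn)

with conn.cursor() as cur:
    # Keep the raw text, its summary, and the embedding of the summary side by side.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id        bigserial PRIMARY KEY,
            user_id   text NOT NULL,
            full_text text NOT NULL,
            summary   text NOT NULL,
            embedding vector(1536)   -- Titan v1 embeddings are 1536-dimensional
        )
    """)

    summary = summarize(full_text)                      # helpers sketched above
    cur.execute(
        "INSERT INTO documents (user_id, full_text, summary, embedding) "
        "VALUES (%s, %s, %s, %s)",
        (user_id, full_text, summary, np.array(embed(summary))),
    )

    # At query time, retrieve only the nearest summaries and send those to Bedrock,
    # instead of the full datasheet text.
    cur.execute(
        "SELECT summary FROM documents WHERE user_id = %s "
        "ORDER BY embedding <=> %s LIMIT 3",   # <=> is pgvector's cosine distance
        (user_id, np.array(embed(question))),
    )
    context = "\n".join(row[0] for row in cur.fetchall())
conn.commit()
```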

Chunking the text wouldn't work in your case: you would get an embedding for each chunk, and those can be very different in their coordinates from the embedding of the full text.

Best,

Didier

