How can I manage large datasheets in PostgreSQL with pgvector using a RAG approach with AWS Bedrock?


Hi Guys!

I was doing some tests and loaded a very long Excel file into my pgvector PostgreSQL database. Then I ran a query to fetch the embeddings by user ID and got a response, but when calling Bedrock I got an exception: ValueError: Error raised by bedrock service: An error occurred (ValidationException) when calling the InvokeModel operation: Input is too long for requested model.

I know the problem is that the response I got from the DB is too long for Bedrock to handle, but is there a way to manage large datasheets with a RAG approach and pgvector?
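For context, my flow looks roughly like this (a simplified sketch: the table, column names, user ID, connection string, and model ID are just placeholders for what I actually use):

```python
# Simplified sketch of the failing flow; table/column names, user ID,
# connection string, and model ID are placeholders.
import json

import boto3
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN
bedrock = boto3.client("bedrock-runtime")

with conn.cursor() as cur:
    # Fetch every stored text row for one user -- this can be huge.
    cur.execute("SELECT content FROM documents WHERE user_id = %s", (42,))
    context = "\n".join(row[0] for row in cur.fetchall())

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{"role": "user",
                  "content": f"Answer using this context:\n{context}"}],
})

# Raises ValidationException when the context exceeds the model's input limit.
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
```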

Thank you!

1 Answer
Accepted Answer

Hi,

Maybe you want to summarize your long datasheets to at most N words so that they fit within the model's input constraints.

For that, you can use one of the Bedrock LLMs to summarize your text to a shorter version before storing it in Postgres. A detailed post about the various summarization techniques is here: https://aws.amazon.com/blogs/machine-learning/techniques-for-automatic-summarization-of-documents-using-language-models/
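As an illustration, here is a minimal sketch of that idea with boto3 (the model IDs and the word limit are assumptions; adapt them to whatever models you have access to in your region):

```python
# Sketch: summarize a long document with a Bedrock LLM before embedding
# it. Model IDs and the word limit are assumptions, not requirements.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def summarize(text: str, max_words: int = 300) -> str:
    """Ask a Claude model on Bedrock for a short summary of `text`."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": f"Summarize the following in at most {max_words} "
                       f"words:\n\n{text}",
        }],
    })
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
    return json.loads(resp["body"].read())["content"][0]["text"]

def embed(text: str) -> list[float]:
    """Embed a (short) text with Titan Text Embeddings."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}))
    return json.loads(resp["body"].read())["embedding"]
```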

That's what I personally do when I have texts that are too long for a given embedding engine. Of course, it is then better to store both the long version and the summary in your Postgres DB.
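In that spirit, a hypothetical table could hold both versions plus the embedding of the summary. This sketch builds on the functions above; the table name, columns, and the vector dimension (1536 matches Titan Text Embeddings v1) are assumptions:

```python
# Sketch: store the full text next to its summary and the summary's
# embedding. Table/column names are hypothetical; vector(1536) matches
# Titan Text Embeddings v1.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN

full_text = open("datasheet.txt").read()  # the long document (placeholder)

with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id        bigserial PRIMARY KEY,
            user_id   bigint NOT NULL,
            full_text text NOT NULL,
            summary   text NOT NULL,
            embedding vector(1536) NOT NULL
        )
    """)
    summary = summarize(full_text)  # from the sketch above
    # pgvector accepts a '[x1,x2,...]' text literal cast to vector.
    vec = "[" + ",".join(str(x) for x in embed(summary)) + "]"
    cur.execute(
        "INSERT INTO documents (user_id, full_text, summary, embedding) "
        "VALUES (%s, %s, %s, %s::vector)",
        (42, full_text, summary, vec))
conn.commit()
```

This way retrieval can match on the summary's embedding while the full text stays available for anything that needs the original detail.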

Chunking the text wouldn't work in your case: you would get the embeddings of each chunk, and those can be very different in their coordinates from the embedding of the full text.

Best,

Didier

AWS Expert · answered 4 months ago · reviewed by an Expert 2 months ago
