Amazon Bedrock Knowledge Base - Limit of 50 Knowledge Bases


In my scenario, I have many documents organized into folders (one folder per customer), with thousands of customers/folders. There is a maximum of 50 knowledge bases per account per region (not adjustable), which prevents me from designing the solution with one knowledge base per customer/folder.

If I model it with just one knowledge base for all folders/customers, from the documentation there is no way to filter the embeddings database in the Retrieve API call based on the document's base path in the S3 bucket.

I really like the simplicity of the RAG workflow with Knowledge Bases, but it seems that I can't use it in my scenario. Any thoughts?

3 Answers

As of yesterday, this scenario is supported: we can now add metadata to the documents in a knowledge base.

https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html#kb-ds-metadata

We can then filter results by using a metadata filter in the Retrieve and RetrieveAndGenerate API calls.
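For example, a filtered Retrieve request might look like the following. This is a minimal sketch: the knowledge base ID, the query, and the `customer_id` attribute name are placeholders, not values from the thread.

```python
import json

def build_retrieve_request(kb_id, query, customer_id, top_k=5):
    """Build a Bedrock Agent Runtime Retrieve request that restricts
    results to one customer's documents via a metadata attribute."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": top_k,
                # Only return chunks whose metadata attribute matches.
                "filter": {
                    "equals": {"key": "customer_id", "value": customer_id}
                },
            }
        },
    }

request = build_retrieve_request(
    "KB123EXAMPLE", "What is the refund policy?", "cust-42"
)
print(json.dumps(request, indent=2))
# To execute against AWS:
# boto3.client("bedrock-agent-runtime").retrieve(**request)
```

The same `filter` structure can be passed inside the `retrievalConfiguration` of a RetrieveAndGenerate call.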

Awesome.

answered 4 months ago

In your scenario, where you have many documents organized into folders (one folder per customer) and thousands of customers/folders, the limitation of 50 Knowledge Bases per account per region poses a challenge.

One approach you could consider is to use a single Knowledge Base for all folders/customers, but leverage metadata or tags to differentiate between the documents belonging to different customers. You can include customer identifiers as metadata/tags associated with each document or folder in your S3 bucket (https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html). Then, when querying the Knowledge Base using the Retrieve API, you can filter the results based on these metadata/tags to retrieve documents specific to a particular customer.

For example, when uploading documents to your S3 bucket, you can include customer identifiers as object metadata or as tags. Then, when querying the Knowledge Base, you can include filters based on these metadata/tags to retrieve only the documents relevant to a specific customer.
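Concretely, Bedrock knowledge bases pick up metadata from a sidecar JSON file uploaded next to each document. A minimal sketch (the bucket, key, and `customer_id` attribute are illustrative placeholders):

```python
import json

def metadata_sidecar(attributes):
    """Return the JSON body for a Bedrock knowledge-base metadata file.
    For a document at s3://my-bucket/cust-42/doc.pdf, this body is
    uploaded as s3://my-bucket/cust-42/doc.pdf.metadata.json so the
    ingestion job attaches the attributes to every chunk of that doc."""
    return json.dumps({"metadataAttributes": attributes}, indent=2)

body = metadata_sidecar({"customer_id": "cust-42"})
print(body)
# Upload alongside the document, e.g.:
# boto3.client("s3").put_object(
#     Bucket="my-bucket",
#     Key="cust-42/doc.pdf.metadata.json",
#     Body=body,
# )
```

After the next ingestion/sync, the `customer_id` attribute becomes available as a retrieval filter.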

While this approach may require additional setup and management of metadata/tags, it allows you to leverage the simplicity of the RAG workflow with Knowledge Bases while still accommodating your scenario with multiple customers/folders.

Additionally, you may want to consider optimizing your folder structure or document organization in S3 to minimize the number of Knowledge Bases required. By organizing your documents in a way that reduces the need for separate Knowledge Bases for each customer, you can potentially work within the limitations imposed by the maximum number of Knowledge Bases per account per region.

EXPERT
A_J
answered 4 months ago
  • Thank you. Looking at the API reference, the Retrieve and RetrieveAndGenerate methods don't seem to accept document metadata as a filter in the request payload.

    With Retrieve I could filter based on the S3 location, assuming that I follow a convention for the S3 keys such as s3://my-bucket/my-customer-id/

    The problem is the high number of results I would get for the query, since customers have similar documents; I would get a lot of results and have to follow the nextToken pagination many times.

    I guess this approach would be feasible only if the filter were applied directly in the vector database, a scenario that apparently isn't supported by Knowledge Bases.
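Until server-side filtering is available, the workaround described in the comment above amounts to filtering retrieved chunks client-side by their S3 source URI. A minimal sketch, assuming the `s3://bucket/<customer-id>/...` key convention and the Retrieve response shape (each result carries a `location.s3Location.uri`):

```python
def filter_by_customer(retrieval_results, bucket, customer_id):
    """Client-side fallback: keep only chunks sourced from the
    customer's folder. Inefficient, since irrelevant chunks are
    still retrieved and paginated over before being discarded."""
    prefix = f"s3://{bucket}/{customer_id}/"
    return [
        r for r in retrieval_results
        if r.get("location", {})
            .get("s3Location", {})
            .get("uri", "")
            .startswith(prefix)
    ]

# Mocked Retrieve output for illustration:
results = [
    {"content": {"text": "refund policy A"},
     "location": {"type": "S3",
                  "s3Location": {"uri": "s3://my-bucket/cust-42/policy.pdf"}}},
    {"content": {"text": "refund policy B"},
     "location": {"type": "S3",
                  "s3Location": {"uri": "s3://my-bucket/cust-99/policy.pdf"}}},
]
kept = filter_by_customer(results, "my-bucket", "cust-42")
print(kept)  # only the cust-42 chunk survives
```

This illustrates the commenter's objection: the filter runs after retrieval, so result quality and pagination cost still suffer.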


Another approach is to not use Knowledge Bases and implement the pipeline yourself, giving you total control and no quota limits.

Behind the scenes, you just need an S3 bucket for document storage and a Lambda trigger that stores embeddings in a vector database, such as RDS or OpenSearch Serverless.

Then leverage the Bedrock SDK to query the LLM with context retrieved from the vector database.

Before Knowledge Bases were even released, I built this myself in this series: https://aws.plainenglish.io/bedrock-unveiled-a-quick-lambda-example-bceaf38cb33b

EXPERT
answered 4 months ago
  • Thank you. That's an option. I really like the no-code approach of Bedrock Knowledge Bases. That's what caught my attention when compared with the competition.

    If my scenario isn't supported, I can always implement it myself as you suggested, using the usual code-first approach with LangChain. My hope is that Bedrock Knowledge Bases will soon support these enterprise scenarios where the knowledge base needs to be organized by customer/tenant/topic/folder/etc.
