Knowledge Base capabilities?

Hi all,

I am using the code** from the following article https://aws.amazon.com/blogs/machine-learning/knowledge-bases-in-amazon-bedrock-now-simplifies-asking-questions-on-a-single-document/ to create a generative AI chatbot for my website.

The code lets you query a single data file in your S3 bucket. I have thousands of data files in my S3 bucket, and I want the chatbot to search all of them for the answer to a question. How do I modify the code to process every file in the bucket? I don't want to write out the name of every single file in my code.

**Here is the code from the article:

import boto3

bedrock_client = boto3.client(service_name='bedrock-agent-runtime')
model_id = "your_model_id_here"  # Replace with your modelID
document_uri = "your_s3_uri_here"  # Replace with your S3 URI

def retrieveAndGenerate(input_text, sourceType, model_id, document_s3_uri=None, data=None):
    region = 'us-west-2'
    model_arn = f'arn:aws:bedrock:{region}::foundation-model/{model_id}'

    if sourceType == "S3":
        # Query a single document stored in S3
        return bedrock_client.retrieve_and_generate(
            input={'text': input_text},
            retrieveAndGenerateConfiguration={
                'type': 'EXTERNAL_SOURCES',
                'externalSourcesConfiguration': {
                    'modelArn': model_arn,
                    'sources': [
                        {
                            "sourceType": sourceType,
                            "s3Location": {
                                "uri": document_s3_uri
                            }
                        }
                    ]
                }
            }
        )
    else:
        # Query document content passed inline as bytes
        return bedrock_client.retrieve_and_generate(
            input={'text': input_text},
            retrieveAndGenerateConfiguration={
                'type': 'EXTERNAL_SOURCES',
                'externalSourcesConfiguration': {
                    'modelArn': model_arn,
                    'sources': [
                        {
                            "sourceType": sourceType,
                            "byteContent": {
                                "identifier": "testFile.txt",
                                "contentType": "text/plain",
                                "data": data
                            }
                        }
                    ]
                }
            }
        )

response = retrieveAndGenerate(
    input_text="What is the main topic of this document?",
    sourceType="S3",
    model_id=model_id,
    document_s3_uri=document_uri
)

print(response['output']['text'])

1 Answer
Accepted Answer

Hello.

The code you are using sets the type to "EXTERNAL_SOURCES", which queries a single S3 document directly.
I think it would be better to set the type to "KNOWLEDGE_BASE" and use a vector store such as OpenSearch Serverless to index and search the S3 documents.
You can create the OpenSearch Serverless collection as part of setting up a knowledge base by following the steps in the document below.
Once the knowledge base is configured, call the "retrieve_and_generate()" API against it and the objects in S3 will be searched.
However, OpenSearch Serverless can be expensive, so keep an eye on the cost.
https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-create.html
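Before the objects in S3 can be searched, the knowledge base's data source generally has to be synced (ingested) into the vector store. A minimal sketch of triggering that sync from code is below, assuming the knowledge base and its S3 data source already exist. The function name `sync_data_source`, the polling interval, and the placeholder IDs are illustrative assumptions, not part of the article; the client is passed in so the function can be tried without AWS credentials.

```python
import time

# Statuses after which the ingestion job will not change any further.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(agent_client, knowledge_base_id, data_source_id,
                     poll_seconds=10, sleep=time.sleep):
    """Start an ingestion job for one data source and wait for it to finish.

    agent_client is expected to be boto3.client("bedrock-agent").
    Returns the final job status string.
    """
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    # Poll until the job reaches a terminal status.
    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]

    return job["status"]


# Example usage (placeholder IDs, requires AWS credentials):
#   import boto3
#   agent = boto3.client("bedrock-agent", region_name="us-west-2")
#   print(sync_data_source(agent, "YOUR_KB_ID", "YOUR_DATA_SOURCE_ID"))
```

The whole bucket (or prefix) configured on the data source is ingested, so no file names need to be listed in the code.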

I think the sample code at the following URL will be helpful for querying the knowledge base.
https://github.com/aws-samples/amazon-bedrock-samples/blob/main/rag-solutions/contextual-chatbot-using-knowledgebase/lambda/bedrock-kb-retrieveAndGenerate.py
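As a rough illustration of the change described above, the sketch below swaps the "EXTERNAL_SOURCES" configuration for a "KNOWLEDGE_BASE" one, so the query runs against every document indexed in the knowledge base instead of a single S3 URI. The helper names `build_kb_request` and `ask_knowledge_base` and the placeholder IDs are assumptions for illustration; the region and model ARN format are carried over from the question's code.

```python
def build_kb_request(question, knowledge_base_id, model_id, region="us-west-2"):
    """Build the keyword arguments for retrieve_and_generate against a knowledge base."""
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(runtime_client, question, knowledge_base_id, model_id,
                       region="us-west-2"):
    """Query the knowledge base and return the generated answer text.

    runtime_client is expected to be boto3.client("bedrock-agent-runtime").
    """
    request = build_kb_request(question, knowledge_base_id, model_id, region)
    response = runtime_client.retrieve_and_generate(**request)
    return response["output"]["text"]


# Example usage (placeholder IDs, requires AWS credentials):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")
#   print(ask_knowledge_base(client, "What is the main topic of these documents?",
#                            "YOUR_KB_ID", "your_model_id_here"))
```

No S3 URI appears in the request: the knowledge base ID stands in for the whole set of indexed files.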

EXPERT
answered a year ago
  • Thank you! I will try this solution when I get home later. Will report back soon. Thank you for your support

  • Hi Riku, thanks very much for your support. Do you know if it's possible to write the code without OpenSearch Serverless (OSS)? Can I use only knowledge bases in the code, so that the chatbot searches all of the documents in my S3 bucket for the answer to a query, without OSS?

  • Unfortunately, specifying multiple files in "s3Location" results in an error, so the code you are using cannot search multiple files. Instead of OpenSearch Serverless, you can also choose Pinecone as your vector database. https://www.pinecone.io/blog/amazon-bedrock-integration/
