Hi all,
I am using the code** from the following article https://aws.amazon.com/blogs/machine-learning/knowledge-bases-in-amazon-bedrock-now-simplifies-asking-questions-on-a-single-document/
to create a generative AI chatbot for my website.
The code lets you query a single data file in an S3 bucket. I have thousands of data files in my S3 bucket, and I want the chatbot to search all of them for the answer to a question. How do I modify the code so that it processes every file in the bucket? I don't want to write out the name of every single file in my code.
**Here is the code from the article:
import boto3

bedrock_client = boto3.client(service_name='bedrock-agent-runtime')
model_id = "your_model_id_here"  # Replace with your modelID
document_uri = "your_s3_uri_here"  # Replace with your S3 URI

def retrieveAndGenerate(input_text, sourceType, model_id, document_s3_uri=None, data=None):
    region = 'us-west-2'
    model_arn = f'arn:aws:bedrock:{region}::foundation-model/{model_id}'
    if sourceType == "S3":
        return bedrock_client.retrieve_and_generate(
            input={'text': input_text},
            retrieveAndGenerateConfiguration={
                'type': 'EXTERNAL_SOURCES',
                'externalSourcesConfiguration': {
                    'modelArn': model_arn,
                    'sources': [
                        {
                            "sourceType": sourceType,
                            "s3Location": {
                                "uri": document_s3_uri
                            }
                        }
                    ]
                }
            }
        )
    else:
        return bedrock_client.retrieve_and_generate(
            input={'text': input_text},
            retrieveAndGenerateConfiguration={
                'type': 'EXTERNAL_SOURCES',
                'externalSourcesConfiguration': {
                    'modelArn': model_arn,
                    'sources': [
                        {
                            "sourceType": sourceType,
                            "byteContent": {
                                "identifier": "testFile.txt",
                                "contentType": "text/plain",
                                "data": data
                            }
                        }
                    ]
                }
            }
        )

response = retrieveAndGenerate(
    input_text="What is the main topic of this document?",
    sourceType="S3",
    model_id=model_id,
    document_s3_uri=document_uri
)
print(response['output']['text'])
Thank you! I will try this solution when I get home later. Will report back soon. Thank you for your support
Hi Riku, thanks very much for your support. Do you know whether this can be done without OpenSearch Serverless (OSS)? Can I use Knowledge Bases in the code so that the chatbot searches all of the documents in my S3 bucket for the answer to a query, without OSS?
Unfortunately, specifying multiple files in "s3Location" results in an error, so the code you are using cannot search across multiple files. If you would rather not use OpenSearch Serverless, you can choose Pinecone as the vector database instead: https://www.pinecone.io/blog/amazon-bedrock-integration/
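For reference, here is a rough sketch of what the query side could look like once the bucket has been added as a data source to a knowledge base and synced. It keeps the same retrieve_and_generate call as the article but switches the configuration type to KNOWLEDGE_BASE, so no file names appear in the code. The helper names build_kb_request and ask_knowledge_base are my own, and the knowledge base ID, model ID, and region are placeholders you would replace with your own values. I have not run this against your account, so treat it as a starting point only.

```python
def build_kb_request(question, knowledge_base_id, model_id, region="us-west-2"):
    """Build the keyword arguments for retrieve_and_generate against a knowledge base.

    knowledge_base_id and model_id are placeholders supplied by the caller;
    the region default simply mirrors the article's example.
    """
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(client, question, knowledge_base_id, model_id, region="us-west-2"):
    """Send the question through the given bedrock-agent-runtime client and return the answer text."""
    request = build_kb_request(question, knowledge_base_id, model_id, region)
    response = client.retrieve_and_generate(**request)
    return response["output"]["text"]


# Example usage (requires AWS credentials and an existing, synced knowledge base):
#
#   import boto3
#   client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")
#   print(ask_knowledge_base(
#       client,
#       "What is the main topic of these documents?",
#       "your_knowledge_base_id_here",  # placeholder
#       "your_model_id_here",           # placeholder
#   ))
```

The client is passed in as an argument so the function can be tried with a stub before pointing it at a real knowledge base. The knowledge base itself still needs a vector store behind it (OpenSearch Serverless or Pinecone, as mentioned above).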