Passing all data from an S3 bucket into code?

Hi All,

I've been using the following GitHub repository to create my own generative AI chatbot: https://github.com/aws-samples/rag-using-langchain-amazon-bedrock-and-opensearch.git

In line 49 of "load-data-to-opensearch.py", the dataset is downloaded via an HTTPS URL: dataset_url = "https://huggingface.co/datasets/sentence-transformers/embedding-training-data/resolve/main/gooaq_pairs.jsonl.gz"

My dataset is stored in an AWS S3 bucket. I have replaced the dataset above with my own, using the downloadable HTTPS URL that S3 provides for my data file. This works if you only want to ask the chatbot questions about one data file, but what if I have several data files in my S3 bucket that I want to query? Is there a way to obtain a downloadable HTTPS URL for the entire S3 bucket? When I ask the chatbot a question, can it search for the answer across all of the data files in one S3 bucket?

Thanks very much! Em

  • Note: I don't want to use Bedrock Knowledge Bases. I want to do this all via code so that I can create and customise a chatbot user interface and host the chatbot on my website.
1 Answer

Hi,

You can use Bedrock Knowledge Bases (KBs) and still keep full control of the chatbot on your website: just call the KB API operations RetrieveAndGenerate or Retrieve from Python with boto3.

See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_RetrieveAndGenerate.html or https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve_and_generate.html

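The KB will do all the heavy lifting for the RAG part for you. Its S3 data source indexes the supported files in your bucket, so a single query searches across all of them.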
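As a rough illustration of the RetrieveAndGenerate call mentioned above, here is a minimal sketch. It assumes you have already created a Knowledge Base and synced it with your bucket. The helper names (ask_knowledge_base, make_client), the default region, and the placeholder ID and ARN strings are illustrative and not taken from the sample repository.

```python
def ask_knowledge_base(client, question, kb_id, model_arn):
    """Send one question to a Bedrock Knowledge Base; return (answer, citations)."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,   # ID of your own Knowledge Base
                "modelArn": model_arn,      # ARN of the model that writes the answer
            },
        },
    )
    # The generated answer is under output.text; citations list the source chunks.
    return response["output"]["text"], response.get("citations", [])


def make_client(region_name="us-east-1"):
    """Create the runtime client (region is an assumption; use your own)."""
    import boto3  # imported lazily so the helper above can be tested without AWS
    return boto3.client("bedrock-agent-runtime", region_name=region_name)


# Example usage (placeholders must be replaced with your own values):
# client = make_client()
# answer, citations = ask_knowledge_base(
#     client, "What is in my documents?", "YOUR_KB_ID", "YOUR_MODEL_ARN"
# )
# print(answer)
```

Your website backend can call a helper like this for each user message, so the chat UI stays entirely under your control.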

To see how it works, have a look at my article: https://repost.aws/articles/AR-LV1HoR_S0m-qy89wXwHmw/the-leverage-of-llm-system-prompt-by-knowledge-bases-for-bedrock-in-rag-workflows

Best,

Didier

EXPERT
answered 2 years ago
  • Thank you!

  • Is there a way to pass all of the files in the S3 bucket through the code? At the moment, it only lets me query one file at a time, which isn't very useful for a chatbot that will go on my website.
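If you prefer to stay with the code-only route, one possible approach is to skip the per-file HTTPS URL and list the bucket with boto3 instead, then read every file in a loop. There is no single download URL for a whole bucket, but list_objects_v2 returns every key. The sketch below assumes your files are JSON Lines (.jsonl or .jsonl.gz, like the sample dataset); the function names and the bucket and prefix values are hypothetical, and the indexing step from "load-data-to-opensearch.py" is not reproduced here.

```python
import gzip
import json


def iter_keys(s3_client, bucket, prefix=""):
    """Yield every object key under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def iter_records(s3_client, bucket, prefix=""):
    """Yield (key, record) for each JSON line in every .jsonl / .jsonl.gz object."""
    for key in iter_keys(s3_client, bucket, prefix):
        if not key.endswith((".jsonl", ".jsonl.gz")):
            continue  # skip files this sketch does not know how to parse
        raw = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        if key.endswith(".gz"):
            raw = gzip.decompress(raw)
        for line in raw.decode("utf-8").splitlines():
            if line.strip():
                yield key, json.loads(line)


def make_s3_client():
    """Create an S3 client using your configured AWS credentials."""
    import boto3  # imported lazily so the helpers above can be tested without AWS
    return boto3.client("s3")


# Example usage (bucket name and prefix are placeholders):
# s3 = make_s3_client()
# for key, record in iter_records(s3, "my-data-bucket", prefix="datasets/"):
#     ...  # pass each record to the existing embedding / OpenSearch indexing code
```

Keeping the key alongside each record lets you store the source file name as metadata in the index, so the chatbot can say which file an answer came from.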
