How can I perform intelligent web scraping using Firecrawl, a Lambda function, and Bedrock Agents?


We are extracting data from websites using Firecrawl, both directly and via SERP queries (e.g. `keywords site:website.com`) with certain keywords. The problem is that we are not able to filter out the pages that are most useful for our use case, which is to answer certain questions (RAG-based) from this data. Can Agents help us filter the links in some way?


We also need to re-run RAG (`retrieve_and_generate`) on only a few prompts. Can Agents help with that as well?

Finally, can Agents help us verify whether our RAG answer is correct or not?

A good guide on all of this would be very helpful!

2 Answers

It seems that Bedrock Agents would not be helpful for filtering web pages based on their usefulness in answering specific questions. Bedrock Agents are focused on automating workflows rather than evaluating content.

Instead, you could consider using Amazon Kendra. Kendra allows you to index web page content and then search it using natural language queries. This would allow you to find the most relevant pages for answering your questions.
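If you go the Kendra route, querying the index is a single API call (`query` on the `kendra` boto3 client). A minimal sketch is below; the client is injected so it can be swapped for a test double, and the index ID and helper name are placeholders, not anything from the question.

```python
def top_pages(client, index_id, question, page_size=5):
    """Ask a Kendra index for the pages most relevant to a question.

    `client` is a boto3 `kendra` client (or any object with the same
    `query` method); returns (document URI, excerpt) pairs.
    """
    resp = client.query(IndexId=index_id, QueryText=question, PageSize=page_size)
    return [
        (item.get("DocumentURI"), item["DocumentExcerpt"]["Text"])
        for item in resp["ResultItems"]
        if "DocumentExcerpt" in item
    ]

# Usage sketch (requires AWS credentials and a provisioned index):
# import boto3
# kendra = boto3.client("kendra", region_name="us-east-1")
# for url, excerpt in top_pages(kendra, "your-index-id", "How do I configure X?"):
#     print(url, excerpt[:80])
```

The RAG pipeline would then retrieve only from the top-ranked URIs instead of everything that was crawled.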

AWS
EXPERT
answered 17 days ago
  • What we need is filtering before scraping. I guess Kendra would scrape the entire content first and then let us query it? That would somewhat defeat the purpose, as it would be quite token-intensive.


I suggest checking the Web Crawler data source in Amazon Bedrock Knowledge Bases. It supports include and exclude patterns.

Using Lambda is not suggested; it has a hard limit of 15 minutes of execution time.
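The Knowledge Base crawler applies its include/exclude filters server-side, but the same idea works locally if you want to prune your own link lists before scraping. A minimal sketch (the exclusion-wins ordering is my assumption, and the patterns are plain regexes, not anything Bedrock-specific):

```python
import re

def url_allowed(url, include_patterns=(), exclude_patterns=()):
    """Return True if `url` passes the filters.

    Exclusion wins over inclusion; with no include patterns,
    anything not excluded is allowed.
    """
    if any(re.search(p, url) for p in exclude_patterns):
        return False
    if not include_patterns:
        return True
    return any(re.search(p, url) for p in include_patterns)
```

For example, `include_patterns=[r"/docs/"]` with `exclude_patterns=[r"/login"]` keeps documentation pages while skipping auth pages.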

AWS
Mi_Sha
answered 12 days ago
