
Crawl Depth for the Bedrock Knowledge Base Web Crawler data source


Can someone confirm that it is not possible to specify a crawl depth for the Bedrock Knowledge Base Web Crawler data source?

I was not able to identify such an option in the following documentation and blog post:

1 Answer

Based on the available information, you are correct that there is no direct option to specify a crawl depth for the Amazon Bedrock Knowledge Base Web Crawler data source. The Web Crawler configuration does not include a specific setting for crawl depth.

Instead, the crawler's behavior is primarily controlled through other settings:

  1. Scope: You can set the scope to either "HOST_ONLY" (crawl only web pages belonging to the same host) or "SUBDOMAINS" (include subdomains as well).

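  2. Inclusion and Exclusion Filters: You can use regular expression patterns to include or exclude certain URLs. Because a pattern can match on URL path structure, filters can limit how far into a site's path hierarchy the crawler goes, although that is not the same as limiting the number of link hops from the seed.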

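  3. Crawler Limits: You can limit the crawl rate (pages per host per minute) and, where supported, the maximum number of pages crawled. A page cap bounds the overall size of the crawl, but it does not control which pages are reached or how deep they are.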

  4. Seed URLs: The starting points for crawling are determined by the seed URLs you provide.
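To show how these settings fit together, the sketch below builds a `dataSourceConfiguration` dictionary in the shape used by the `bedrock-agent` `CreateDataSource` API (`webConfiguration`, `crawlerConfiguration`, `crawlerLimits`, and so on). It assumes the field names as I understand the current API. The helper name `build_web_crawler_config` is hypothetical, and `maxPages` in particular may not be available in every SDK version, so check the API reference before relying on it. Nothing in the structure corresponds to a depth setting.

```python
from typing import Optional


def build_web_crawler_config(
    seed_urls: list[str],
    scope: Optional[str] = None,  # "HOST_ONLY" or "SUBDOMAINS"; omit to use the default scope
    inclusion_filters: Optional[list[str]] = None,  # regex patterns for URLs to crawl
    exclusion_filters: Optional[list[str]] = None,  # regex patterns for URLs to skip
    rate_limit: Optional[int] = None,  # crawl rate, pages per host per minute
    max_pages: Optional[int] = None,  # assumption: only if your API/SDK version supports maxPages
) -> dict:
    """Hypothetical helper: assemble a Web Crawler data source configuration.

    Note there is no crawl-depth field; scope, filters, limits and seed URLs
    are the only levers.
    """
    if scope is not None and scope not in ("HOST_ONLY", "SUBDOMAINS"):
        raise ValueError(f"unsupported scope: {scope}")

    crawler: dict = {}
    if scope is not None:
        crawler["scope"] = scope
    if inclusion_filters:
        crawler["inclusionFilters"] = list(inclusion_filters)
    if exclusion_filters:
        crawler["exclusionFilters"] = list(exclusion_filters)

    # Only emit crawlerLimits when at least one limit was requested.
    limits: dict = {}
    if rate_limit is not None:
        limits["rateLimit"] = rate_limit
    if max_pages is not None:
        limits["maxPages"] = max_pages
    if limits:
        crawler["crawlerLimits"] = limits

    return {
        "type": "WEB",
        "webConfiguration": {
            "sourceConfiguration": {
                "urlConfiguration": {"seedUrls": [{"url": u} for u in seed_urls]}
            },
            "crawlerConfiguration": crawler,
        },
    }
```

With boto3, the result would be passed as the `dataSourceConfiguration` argument of `boto3.client("bedrock-agent").create_data_source(knowledgeBaseId=..., name=..., dataSourceConfiguration=config)`, where the knowledge base ID and data source name are your own values.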

Starting from each seed URL, the crawler follows child links under the same primary domain and path. While you can't directly specify a numerical crawl depth, you can combine these options to control the breadth and scope of the crawl.
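One possible workaround, assuming your site's URL paths roughly mirror its link structure, is to approximate a depth limit with an inclusion filter that only accepts URLs up to a fixed number of path segments below the seed. The hypothetical helper `path_depth_filter` below generates such a pattern. It bounds URL path depth, not the number of link hops from the seed, and it assumes the service's regex engine accepts standard syntax such as non-capturing groups and bounded repetition.

```python
import re


def path_depth_filter(seed_url: str, max_extra_segments: int) -> str:
    """Hypothetical helper: build an inclusion-filter regex that accepts URLs
    at most `max_extra_segments` path segments below the seed path.

    This limits URL path depth only; it is not a true link-hop crawl depth.
    """
    if max_extra_segments < 0:
        raise ValueError("max_extra_segments must be >= 0")
    # Escape the seed so characters like "." are matched literally.
    base = re.escape(seed_url.rstrip("/"))
    # One extra level = "/" followed by a non-empty segment without "/", "?" or "#".
    segment = r"/[^/?#]+"
    # Allow 0..N extra segments, an optional trailing slash, and an optional query/fragment.
    return rf"^{base}(?:{segment}){{0,{max_extra_segments}}}/?(?:[?#].*)?$"
```

For example, `path_depth_filter("https://example.com/docs/", 2)` accepts `https://example.com/docs/a/b` but rejects `https://example.com/docs/a/b/c`. Pages that are reachable only through excluded URLs may also be missed.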

If you need finer-grained control over crawl depth, you may need an alternative crawling solution, or you can submit a feature request to AWS.
Sources
Web crawler data source - Amazon SageMaker Unified Studio
Implement web crawling in Amazon Bedrock Knowledge Bases | AWS Machine Learning Blog
Crawl web pages for your knowledge base - Amazon Bedrock

answered 10 months ago
EXPERT
reviewed 10 months ago
