The website might be implementing IP blocking or rate limiting, which prevents automated access or caps the number of requests from a particular IP address. That could explain why you receive a 403 error when crawling with Kendra but can access the site with your custom Python library.

Some websites also restrict requests based on the User-Agent header. If Kendra's web crawler sends a User-Agent that the website blocks, the result is a 403 error, while your custom Python library may send a different User-Agent that the site allows.
So you can:

- Ensure that the IP address used by Kendra's web crawler is not being blocked or rate limited by the website. If necessary, contact the website administrator to whitelist the crawler's IP address.
- If the website enforces User-Agent restrictions, try modifying the User-Agent header sent by Kendra's web crawler to mimic a user agent the site allows. This may involve customizing the settings or configuration in Kendra to change the User-Agent value.
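To narrow down which of these is the cause, you can reproduce the request outside of Kendra and compare the status codes returned for different User-Agent values. Below is a minimal Python sketch using the `requests` library; the URL and User-Agent strings are placeholders for illustration, not the values Kendra actually sends.

```python
# Minimal sketch: compare the site's responses for different User-Agent values.
# The URL and User-Agent strings below are placeholders -- substitute the site
# you are trying to crawl. This does not replicate Kendra's crawler exactly.
import requests

URL = "https://example.com/"  # hypothetical target site

USER_AGENTS = {
    "browser-like": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "generic-bot": "MyCrawler/1.0",
}

for label, ua in USER_AGENTS.items():
    resp = requests.get(URL, headers={"User-Agent": ua}, timeout=10)
    print(f"{label:12s} -> HTTP {resp.status_code}")
    # A 403 for the bot-like agent but 200 for the browser-like agent suggests
    # the site is filtering on the User-Agent header rather than on IP address.
```

If both requests succeed from your machine while Kendra still gets a 403, IP-based blocking or rate limiting on the site's side is the more likely explanation.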
There is no option in Kendra to modify the User-Agent value, and as for getting the IP address of the web crawler, it won't be publicly disclosed, right? Are there any other options to let Kendra crawl the website?