Recommended ways to whitelist "good" bots?

0

We've been using AWS WAF for several weeks now, and we're running into problems where our web ACL appears to be blocking potentially good (or otherwise harmless) traffic, such as that from search engines. Are there any recommended ways to whitelist "good" bots (like googlebot, bingbot, etc.)? I could just whitelist them by user-agent, but that's easily spoofed. And whitelisting by IP address ranges is tedious, and the ranges are subject to change. Ideally, we would write some sort of Lambda function that reads in the IP ranges of known good bots from regularly updated sources.

asked 7 years ago, 1800 views
2 Answers
1

We recently found that WAF is blocking Googlebot through its AWSManagedIPReputationList.

Using the IP list found here:

https://developers.google.com/search/docs/advanced/crawling/verifying-googlebot

https://developers.google.com/search/apis/ipranges/googlebot.json

We implemented an ALLOW rule with these IP ranges. I'm not sure why AWSManagedIPReputationList picked up the Googlebot IPs without proper investigation; we will need to review our continued use of it if it keeps blocking legitimate IPs.
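For anyone who wants to automate this, here is a minimal sketch of turning that published JSON file into CIDR lists. It assumes the file has a top-level "prefixes" array whose entries carry either an "ipv4Prefix" or an "ipv6Prefix" key; check the live file before relying on that. The names parse_prefixes and fetch_googlebot_ranges are made up for this example.

```python
import ipaddress
import json
import urllib.request

# URL taken from the answer above.
GOOGLEBOT_RANGES_URL = "https://developers.google.com/search/apis/ipranges/googlebot.json"


def parse_prefixes(document):
    """Split a parsed ranges document into sorted IPv4 and IPv6 CIDR lists.

    Assumes entries shaped like {"ipv4Prefix": "..."} or {"ipv6Prefix": "..."}.
    ip_network() raises ValueError on malformed CIDRs, so bad data fails loudly.
    """
    ipv4, ipv6 = set(), set()
    for entry in document.get("prefixes", []):
        if "ipv4Prefix" in entry:
            ipv4.add(str(ipaddress.ip_network(entry["ipv4Prefix"])))
        elif "ipv6Prefix" in entry:
            ipv6.add(str(ipaddress.ip_network(entry["ipv6Prefix"])))
    return sorted(ipv4), sorted(ipv6)


def fetch_googlebot_ranges(url=GOOGLEBOT_RANGES_URL, timeout=10):
    """Download the ranges file and return (ipv4_cidrs, ipv6_cidrs)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_prefixes(json.load(response))
```

A scheduled Lambda could call fetch_googlebot_ranges() and write the two lists to the IP sets behind the ALLOW rule. That last step is left out here because it depends on which WAF version and IP set layout you use.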

Cheers,

Chris

answered 2 years ago
0

Hi,

I am not sure there is a single recipe that covers every good bot. In Google's case, they explain how to verify that a request really comes from Googlebot and not a fake one:

1. Run a reverse DNS lookup on the accessing IP address from your logs, using the host command.

2. Verify that the domain name is in either googlebot.com or google.com.

3. Run a forward DNS lookup on the domain name retrieved in step 1, again using the host command. Verify that the result is the same as the original accessing IP address from your logs.

Example:

host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1

I guess you will have to implement that with a Lambda function that checks the logs.
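The three steps above can be sketched in Python roughly as follows. This is only an illustration, not Google's or AWS's code: is_verified_googlebot is a made-up name, the lookup functions are passed in as parameters so the logic can be tried without network access, and the default forward lookup (socket.gethostbyname_ex) only returns IPv4 addresses, so IPv6 crawlers would need a different resolver call.

```python
import socket

# Domains named in step 2; the leading dot prevents matches like "notgoogle.com".
ALLOWED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip,
                          reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Return True if ip passes the reverse-then-forward DNS check."""
    # Step 1: reverse DNS lookup on the accessing IP address.
    try:
        hostname = reverse_lookup(ip)[0].rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False

    # Step 2: the name must sit under googlebot.com or google.com.
    if not hostname.endswith(ALLOWED_SUFFIXES):
        return False

    # Step 3: forward lookup must resolve back to the original IP.
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

Calling is_verified_googlebot("66.249.66.1") with the defaults performs real DNS queries, so results depend on your resolver. In a Lambda you would probably want to cache the answers rather than look up every logged request.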

Regards

ghaber
answered 7 years ago
