Amazonbot web crawler DoS incident


Has anybody had experience with the Amazonbot web crawler, especially with traffic from it that looks like a DoS incident?

We have a client whose service (hosted on AWS) kept going down until we identified an unusual amount of traffic from Amazonbot with the User-Agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)". After blocking it at the Application Load Balancer level via a User-Agent header rule (modifying "robots.txt" did not work / was not read), the response time of the client's service dropped significantly and the service has remained up.
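For context, a robots.txt block of the kind that would normally be expected to stop the crawler (assuming the `Amazonbot` token from the documentation page linked in the User-Agent) looks like this; in this incident the bot apparently did not honor it:

```
User-agent: Amazonbot
Disallow: /
```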

If anybody knows what generally causes Amazonbot to crawl sites, or how I can identify the cause for this specific client's site, that would be extremely useful, cheers.
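One way to identify what is driving the load is to aggregate access-log entries by User-Agent. A minimal sketch, assuming a combined-log-format access log (the regex and field layout are simplifications and will vary by server):

```python
from collections import Counter
import re

# Simplified combined-log-format pattern: client IP, then the quoted
# request, status, size, referer, and finally the quoted User-Agent.
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \d+ "[^"]*" "([^"]*)"'
)

def top_user_agents(lines, n=5):
    """Count requests per User-Agent to spot a crawler dominating traffic."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            _ip, user_agent = m.groups()
            counts[user_agent] += 1
    return counts.most_common(n)
```

If one agent accounts for the bulk of the entries, that is a strong hint about the source of the traffic spike, and grouping by client IP in the same way can show whether the requests come from a single range.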

2 Answers

Amazonbot is not a service or feature of AWS.

For any questions or concerns regarding Amazonbot, I recommend reaching out to the Amazonbot team via the Contact link at the bottom of this page: https://developer.amazon.com/support/amazonbot

AWS
EXPERT
Chris_G
answered 2 years ago
  • Hi Chris, I followed the link from the User-Agent, to no avail so far. I sent them an email yesterday with more details of the situation as it was happening, but they still have not replied. I guess the best thing to do now is to wait for a response and be glad that the main issue has been fixed.


I know I'm a little late, but right now I'm having exactly the same problem. I checked my server logs and Amazonbot is hammering my site with the same User-Agent. What I found out is that one of the sites on my server had a CMS bug where, every time a visitor arrives, every link on the site gets a random string appended at the end, for example:

mysitedomain.com/contact?string_here=SIGK91682
mysitedomain.com/company?string_here=SIGK91682
mysitedomain.com/file.pdf?string_here=SIGK91682
mysitedomain.com/file2.pdf?string_here=SIGK91682

Here "SIGK91682" is a random string, and every time another link is opened, every link inside the site changes that random string to a new one, for example:

mysitedomain.com/contact?string_here=y9b3C4834
mysitedomain.com/company?string_here=y9b3C4834
mysitedomain.com/file.pdf?string_here=y9b3C4834
mysitedomain.com/file2.pdf?string_here=y9b3C4834

When Amazonbot crawls the site, the crawl never ends, because every link is different every single time. Amazonbot does not detect this and crawls the pages non-stop; from my logs I can see the bot trying to scan the same URL with 10,000 variations, and multiply that by all the links and files and it caused a lot of problems on the server. Funny that our server is on AWS, so I guess it was a win-win situation for Amazon when we had to pay them for all the data bandwidth their bot used!
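This also illustrates why the URL space explodes: a well-behaved crawler has to canonicalize URLs before deciding whether it has already seen them. A sketch of that idea, using the hypothetical `string_here` parameter name from the example above:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameter names known (or assumed) to carry random per-visit values.
VOLATILE_PARAMS = {"string_here"}

def canonicalize(url):
    """Drop volatile query parameters so all random variants of a page
    map to a single canonical URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in VOLATILE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

seen = set()
for url in [
    "https://mysitedomain.com/contact?string_here=SIGK91682",
    "https://mysitedomain.com/contact?string_here=y9b3C4834",
]:
    seen.add(canonicalize(url))
# Both random variants collapse to the same canonical URL,
# so the crawl frontier stops growing.
```

A crawler that skips this step, like Amazonbot in this case apparently did, treats every random variant as a brand-new page.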

I was able to "fix" this using some ModSecurity rules to ban Amazonbot from using our server's data bandwidth, and it seems to work OK. By the way, Googlebot managed not to fall into the same trap; I wonder why.
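For reference, a ModSecurity rule in that spirit might look like the following (a sketch; the rule id and status code are arbitrary choices, not taken from the answer):

```
SecRule REQUEST_HEADERS:User-Agent "@contains Amazonbot" \
    "id:1000001,phase:1,deny,status:403,log,msg:'Block Amazonbot crawler'"
```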

What I can see from our ModSecurity log is that Amazonbot is still trying to crawl all the random links, but the rules prevent it. I already tried contacting the Amazonbot team by email, but still no answer.

answered 5 months ago
