Website is not responding when Remaining CPU burst capacity drops to ~2%


Hello all, I have a Lightsail instance that runs a MediaWiki website. Starting on 8-14 I noticed the website randomly became very slow. I checked the metrics and found that Remaining CPU burst capacity has dropped dramatically, to ~2%, since 8-14. See attached:

[Image: Lightsail metrics graph of Remaining CPU burst capacity]

I then checked the access_log and found that the following 3 IPs are the most frequent visitors. They all seem to be Amazon bots, with hostnames that end with crawl.amazonbot.amazon:

3.224.220.101 52.70.240.171 23.22.35.162
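
For anyone wanting to repeat this check, here is a minimal Python sketch of counting requests per client IP. It assumes the access_log uses the common Apache layout where the client IP is the first whitespace-separated field; the function name `top_client_ips` and the sample lines (including their dates and paths) are made up for illustration.

```python
from collections import Counter

def top_client_ips(log_lines, n=3):
    """Return the n most frequent client IPs as (ip, hits) pairs."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)

# Made-up sample lines; real input would come from open("access_log").
sample = [
    '3.224.220.101 - - [14/Aug/2024:10:00:00 +0000] "GET /index.php HTTP/1.1" 200 512',
    '3.224.220.101 - - [14/Aug/2024:10:00:01 +0000] "GET /index.php HTTP/1.1" 200 512',
    '3.224.220.101 - - [14/Aug/2024:10:00:02 +0000] "GET /index.php HTTP/1.1" 200 512',
    '52.70.240.171 - - [14/Aug/2024:10:00:03 +0000] "GET /index.php HTTP/1.1" 200 512',
    '52.70.240.171 - - [14/Aug/2024:10:00:04 +0000] "GET /index.php HTTP/1.1" 200 512',
    '203.0.113.9 - - [14/Aug/2024:10:00:05 +0000] "GET /index.php HTTP/1.1" 200 512',
]

for ip, hits in top_client_ips(sample):
    print(ip, hits)
```

On a real server the same function could be fed the open log file object directly, since it only iterates over lines.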

My first question is: what is Amazonbot doing? My website has nothing to do with e-business. Can I block these IPs?

ywsz
asked a month ago, 215 views
2 Answers

Hello.

With Lightsail alone, it is difficult to block specific IP addresses.
It is possible to block a specific IP address in the web server settings, but because the blocking is processed on the web server itself, CPU usage may still increase if a large number of requests arrive.
So, why not try AWS WAF by changing your configuration to the one described in the AWS blog post below?
https://aws.amazon.com/jp/blogs/compute/integrating-aws-waf-with-your-amazon-lightsail-instance/

EXPERT
answered a month ago
  • I guess I first want to understand why there is such a sudden drop?

    CPU burst capacity starts to be consumed when the CPU usage rate exceeds the baseline throughput (green line in the image). https://docs.aws.amazon.com/lightsail/latest/userguide/baseline-cpu-performance.html
    In your case, the CPU usage rate started to rise on August 14th, and from then on the CPU burst capacity was gradually consumed. Most of the consumption seems to have happened all at once on August 15th.

    are the slowness issues caused by this sudden drop of Remaining CPU burst capacity?

    Consuming CPU burst capacity lets the instance temporarily run above its baseline. However, I think the website went down because once the CPU burst capacity was used up, performance dropped to the baseline throughput. With almost no CPU burst capacity left now, I expect the website is unstable because CPU demand is still higher than the baseline throughput.

  • Could you help me understand why this sudden drop started happening recently? What are those IPs trying to do?

  • I think "crawl.amazonbot.amazon" is an information-gathering bot for Alexa. It depends on how often these IP addresses access your site, but a large volume of requests can increase the load on your Lightsail instance. The documentation states that Amazonbot respects robots.txt, so I think adding rules for it to your site's robots.txt will reduce the access. https://developer.amazon.com/amazonbot

    Robots.txt: Amazonbot respects the robots.txt directives user-agent and disallow. In the example below, Amazonbot won't crawl documents that are under /do-not-crawl/ or /not-allowed. Today, AmazonBot does not support the crawl-delay directive in robots.txt and robots meta tags on HTML pages such as “nofollow” and "noindex". If a ‘robots.txt’ file is changed, AmazonBot responds to any changes within 24 hours.

  • Thank you for the info, Riku! I tried blocking some of the IPs with iptables, which seemed easiest to me, and it seems to be working fine so far. Is there any downside to doing that?

  • It is possible to block IP addresses using iptables, but please note that depending on how you configure it, CPU usage may increase. https://serverfault.com/questions/919955/iptables-consumes-a-lot-of-cpu-when-blocking-many-ips
    AWS WAF offers managed rules that block access from bots in addition to IP address restrictions, so if you expect bot traffic to increase in the future, it is worth checking out. https://docs.aws.amazon.com/waf/latest/developerguide/waf-bot-control.html
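
Regarding the robots.txt approach mentioned in the comments above, the following is a minimal sketch of how a candidate rule set could be checked locally before deploying it. It uses Python's standard `urllib.robotparser`, which only approximates how a crawler interprets the file; the rules in `ROBOTS_TXT` (blocking Amazonbot from the whole site), the helper name `is_allowed`, and the `/index.php` path are assumptions for illustration, not settings taken from the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Amazonbot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /

User-agent: *
Disallow:
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print("Amazonbot allowed:", is_allowed(ROBOTS_TXT, "Amazonbot", "/index.php"))
print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", "/index.php"))
```

Since the quoted documentation says crawl-delay is not supported, this sketch relies only on the `user-agent` and `disallow` directives.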


Thanks, but I guess I first want to understand why there is such a sudden drop. Are the slowness issues caused by this sudden drop in Remaining CPU burst capacity?

ywsz
answered a month ago
