Help please: downtime during database upscale

Hi, thank you all for your help in advance.

Sometimes, when traffic or API requests increase on a client's ecommerce site, AWS automatically scales up the database. This causes temporary downtime and unresponsiveness on the live site for a few minutes, which is obviously an issue for us.

We get error messages like the one below when the DB is scaling up and the API server cannot connect to it:

1571Z 20131 [Note] Aborted connection 20131 to db: 'proddb' user: 'PrOdUsr' host: 'xx.x.xx.xxx' (Got an error reading communication packets)

Can anyone please tell me why this is happening? Our stack is detailed below. Thank you so much for your help.

Our stack before the issue:

  • React frontend server: 2 x c5.large
  • API server: 2 x c5.large
  • Backend and bank API server: 1 x c5.xlarge
  • RDS: 1 x Serverless RDS, 4 ACU, 16GB RAM

After the incident, we upgraded RDS to:

  • RDS: 1 x Serverless RDS, 16 ACU, 32GB RAM

Thank you for your help

1 Answer

Hello there,

I understand that you are getting an Aborted connection warning when your database is scaling up and the API server cannot connect to the database. This message is a Note-level entry, so it only appears in the error log when log_error_verbosity is greater than 2. It is logged when a connection is aborted, which also increments the Aborted_clients or Aborted_connects status counter. Amazon RDS makes these entries available through the database error logs.
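
If you export the error log to a text file, tallying these messages by the reason in parentheses can show whether they cluster around the scaling events. The following is a minimal Python sketch, assuming the log lines follow the format shown in the question; parse_aborted and count_by_reason are hypothetical helper names, and the regular expression only matches from "Aborted connection" onward because the line prefix varies between MySQL versions.

```python
import re
from collections import Counter

# Matches the "Aborted connection" Note from the question. The prefix
# (timestamp, thread id, severity) is skipped because it varies by version.
ABORTED_RE = re.compile(
    r"Aborted connection (?P<conn_id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' \((?P<reason>[^)]*)\)"
)


def parse_aborted(line):
    """Return the fields of an Aborted connection line as a dict, else None."""
    match = ABORTED_RE.search(line)
    return match.groupdict() if match else None


def count_by_reason(lines):
    """Count Aborted connection messages per reason text."""
    counts = Counter()
    for line in lines:
        fields = parse_aborted(line)
        if fields:
            counts[fields["reason"]] += 1
    return counts


if __name__ == "__main__":
    sample = ("1571Z 20131 [Note] Aborted connection 20131 to db: 'proddb' "
              "user: 'PrOdUsr' host: 'xx.x.xx.xxx' "
              "(Got an error reading communication packets)")
    print(parse_aborted(sample))
    print(count_by_reason([sample, "an unrelated log line"]))
```

Grouping by reason rather than by host keeps the output small; if several API servers are involved, counting by the host field instead can show whether one client is responsible.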

The first step is to check the server status variables Aborted_clients and Aborted_connects. You can view them with the following command:

SHOW GLOBAL STATUS LIKE 'Aborted_%';

Aborted_clients is the number of connections that were aborted because the client died without closing the connection properly.

Aborted_connects is the number of failed attempts to connect to the MySQL server.
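
To see which of the two counters grows during a scaling event, you could capture the command's output before and after the event and compare the snapshots. The following is a minimal Python sketch, assuming each snapshot is a list of (Variable_name, Value) string pairs, as a MySQL driver would typically return for this query; running the query itself is left out, and to_counters and counter_deltas are hypothetical names.

```python
# The two status counters discussed above.
WATCHED = ("Aborted_clients", "Aborted_connects")


def to_counters(rows):
    """Keep only the watched counters, converted from strings to int."""
    return {name: int(value) for name, value in rows if name in WATCHED}


def counter_deltas(before_rows, after_rows):
    """Return how much each watched counter grew between two snapshots."""
    before = to_counters(before_rows)
    after = to_counters(after_rows)
    deltas = {}
    for name, new_value in after.items():
        old_value = before.get(name, 0)
        # A counter that went down was most likely reset (for example by a
        # server restart), so the new value is the best estimate of growth.
        deltas[name] = new_value - old_value if new_value >= old_value else new_value
    return deltas


if __name__ == "__main__":
    before = [("Aborted_clients", "10"), ("Aborted_connects", "3"), ("Uptime", "500")]
    after = [("Aborted_clients", "25"), ("Aborted_connects", "3"), ("Uptime", "800")]
    print(counter_deltas(before, after))
```

Filtering to the watched names means the sketch works with either the full SHOW GLOBAL STATUS output or a filtered one.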

See reference [3] for more information about connection errors.

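Since the API cannot connect to the database, failed connection attempts would increment Aborted_connects. Note, however, that the message in your log names a database and a user, which according to the MySQL documentation [3] suggests a connection that was established and then dropped; that case increments Aborted_clients instead. Checking which counter grows during a scaling event tells you which of the two lists of causes below is more relevant.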

A connection attempt can fail for reasons such as:

  • A client attempts to access a database but has no privileges for it.
  • A client uses an incorrect password.
  • A connection packet does not contain the right information.
  • It takes more than connect_timeout seconds to obtain a connect packet.

Other factors that can trigger the Aborted connection warning include:

  • Client or driver incompatibility
  • Firewalls or proxies that close idle connections or block connections.
  • Improper closing of a client-server connection, resulting in a higher number of sleeping connections inside Amazon RDS MySQL.
  • Idle connections that exceed the wait_timeout or interactive_timeout thresholds.
  • A client application that improperly terminates a connection.
  • The max_allowed_packet value is exceeded. If a query or row is larger than max_allowed_packet, the server closes the connection and the Aborted connection warning is triggered.
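
The reason text in parentheses at the end of each Aborted connection message can help narrow down which of the causes above applies. The following Python sketch maps a few common reason strings to likely causes; this mapping is an illustrative heuristic assumed for troubleshooting, not an official MySQL table, and REASON_HINTS and likely_cause are hypothetical names.

```python
# Heuristic map from the reason text in parentheses to the causes listed
# above. This is an illustrative assumption, not an official MySQL mapping.
REASON_HINTS = {
    "Got timeout reading communication packets":
        "connection idle longer than wait_timeout or interactive_timeout",
    "Got an error reading communication packets":
        "client, driver, firewall or proxy dropped the connection without closing it",
    "Got a packet bigger than 'max_allowed_packet' bytes":
        "max_allowed_packet exceeded",
}


def likely_cause(reason):
    """Return a hint for a reason string, or a fallback for unknown text."""
    for known, hint in REASON_HINTS.items():
        if known in reason:
            return hint
    return "unknown; compare with the MySQL documentation on communication errors"


if __name__ == "__main__":
    print(likely_cause("Got an error reading communication packets"))
```

For the message in the question, this points toward the client side or the network path rather than the timeout settings, but the counters and error log should confirm it.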

A general approach is to review your MySQL parameter values and identify the root cause, then update the relevant parameter and test the new value while monitoring the MySQL error log (see reference [1]).

Also consider the following approaches when troubleshooting and resolving the problem:

  • Check whether you are using the default values from an Amazon RDS parameter group. The default values for connectivity timeout parameters might not be appropriate for your DB instance. See reference [2] for more information on configuring parameters.
  • Set a higher value for connect_timeout to see whether this reduces the number of Aborted connection messages. This parameter is the number of seconds the MySQL server waits for a connect packet before responding with Bad handshake.
  • Review interactive_timeout and wait_timeout. If your application keeps idle connections open (for example, in a connection pool) for longer than these values, the server closes them and logs the warning.
  • Increase the value of max_allowed_packet if the instance must handle big queries. If a row has more data than the max_allowed_packet value for the client, then errors are reported. Increase this value if you are using large BLOB columns or long strings. See reference [2].
  • Make sure that connections to Amazon RDS for MySQL are closed properly. Before the client application exits, call mysql_close() (or your driver's equivalent close method).
  • You can also run tcpdump on the machine that runs the client to capture sample packets as part of your troubleshooting.
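
The advice above about closing connections and about wait_timeout can be sketched in Python. This assumes a driver whose connection objects have a close() method, as most Python MySQL drivers do; FakeConnection, is_stale and run_with_connection are hypothetical names, and the 30-second margin is an arbitrary example value, not a recommendation.

```python
import time
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for a driver connection with a close() method."""

    def __init__(self):
        self.closed = False
        self.last_used = time.monotonic()

    def close(self):
        self.closed = True


def is_stale(conn, wait_timeout, margin=30, now=None):
    """True if conn has been idle close to the server's wait_timeout (seconds).

    A pool could discard such connections instead of reusing them, so the
    server never has to drop them for being idle.
    """
    now = time.monotonic() if now is None else now
    return now - conn.last_used >= wait_timeout - margin


def run_with_connection(connect, work):
    """Open a connection, run work(conn), and always close it afterwards."""
    # closing() calls conn.close() even if work() raises an exception.
    with closing(connect()) as conn:
        return work(conn)


if __name__ == "__main__":
    print(run_with_connection(FakeConnection, lambda conn: "query result"))
```

The context manager handles the "close before exiting" point, while the staleness check addresses idle connections that would otherwise outlive wait_timeout.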

References:

[1] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitor_Logs_Events.html

[2] https://aws.amazon.com/blogs/database/best-practices-for-configuring-parameters-for-amazon-rds-for-mysql-part-3-parameters-related-to-security-operational-manageability-and-connectivity-timeout/

[3] https://dev.mysql.com/doc/refman/8.0/en/communication-errors.html

answered 2 years ago