RDS Mysql behind network load balancer

0

Hello,

we have configured MySQL RDS recently which is supposed to be placed behind network load balancer. Everything works fine for a while after registering RDS to TargetGroup. After some time there is no connection to RDS anymore and we got the following error message :

Host 'X.X.X.X' is blocked because of many connection errors' unblock with 'mysqladmin flush-hosts'. We have enabled audit logs and checked host_cache table :


                                        IP: X.X.X.X
                                      HOST: NULL
                            HOST_VALIDATED: YES
                        SUM_CONNECT_ERRORS: 100
                 COUNT_HOST_BLOCKED_ERRORS: 147
           COUNT_NAMEINFO_TRANSIENT_ERRORS: 0
           COUNT_NAMEINFO_PERMANENT_ERRORS: 1
                       COUNT_FORMAT_ERRORS: 0
           COUNT_ADDRINFO_TRANSIENT_ERRORS: 0
           COUNT_ADDRINFO_PERMANENT_ERRORS: 0
                       COUNT_FCRDNS_ERRORS: 0
                     COUNT_HOST_ACL_ERRORS: 0
               COUNT_NO_AUTH_PLUGIN_ERRORS: 0
                  COUNT_AUTH_PLUGIN_ERRORS: 0
                    COUNT_HANDSHAKE_ERRORS: 100
                   COUNT_PROXY_USER_ERRORS: 0
               COUNT_PROXY_USER_ACL_ERRORS: 0
               COUNT_AUTHENTICATION_ERRORS: 0
                          COUNT_SSL_ERRORS: 0
         COUNT_MAX_USER_CONNECTIONS_ERRORS: 0
COUNT_MAX_USER_CONNECTIONS_PER_HOUR_ERRORS: 0
             COUNT_DEFAULT_DATABASE_ERRORS: 0
                 COUNT_INIT_CONNECT_ERRORS: 0
                        COUNT_LOCAL_ERRORS: 0
                      COUNT_UNKNOWN_ERRORS: 0
                                FIRST_SEEN: 2023-06-23 12:28:04
                                 LAST_SEEN: 2023-06-23 13:24:40
                          FIRST_ERROR_SEEN: 2023-06-23 12:28:04
                           LAST_ERROR_SEEN: 2023-06-23 13:24:40

error from server_audit.log :

20230623 12:32:23,ip-X-X-X-X,,X.X.X.X,377,0,FAILED_CONNECT,,,1158,TCP/IP

Worth mentioning that we weren't trying to connect to RDS through load balancer - SUM_CONNECT_ERRORS and COUNT_HANDSHAKE_ERRORS are increasing just after registering RDS to target group. Looks like health checks are causing that. Could anyone indicate how to handle an such issue?

4 Answers
3

Yes, COUNT_HANDSHAKE_ERRORS increases due to NLB health checks as you recognize.
Changing the value of max_connect_errors in the RDS parameter group to a larger value allows more time before the connection becomes unavailable.
However, the best practice is to set the number of connections to slightly more than the maximum number expected to be open for each database instance.
So it may not be a good idea to set the value too large.
https://docs.aws.amazon.com/prescriptive-guidance/latest/hyperscale-aurora-mysql/manage-connections.html

The configuration variable max_connections limits the number of database connections for each MySQL instance. The best practice is to set it slightly higher than the maximum number of connections you expect to open on each database instance.

Alternatively, we could create a Lambda function or shell script that periodically executes the following MySQL commands, which would lead to a permanent solution.

FLUSH HOSTS;

Can you tell me why you are setting up NLB to target RDS?
We may be able to suggest a better configuration than what you are currently trying to do.

profile picture
EXPERT
answered a year ago
2

I have a same requirement and same problem. After some searching on internet, I have found this solution and it worked. You can change the "skip_name_resolve" flag value to 1 in DB Parameter Group and restart the DB. It will resolve the problem.

As per AWS documentation: "skip_name_resolve This parameter is set from the value of the --skip-name-resolve option. If it is OFF (0 in case of RDS), mysqld resolves host names when checking client connections. If it is ON (1 in case of RDS), mysqld uses only IP numbers. In this case, all Host column values in the grant tables must be IP addresses or localhost. Thus, this parameter is used to avoid DNS lookup on connection. By default, it’s turned off (set to 0).

If this parameter is turned off, you might find the following warning in the error log:

2017-03-03 13:10:00 [-]2017-03-03 13:07:22 3396 [Warning] IP address ‘<ip-address-rep-instance>’ could not be resolved: Temporary failure in name resolution "

answered a year ago
0

Is there any way to solve this permanently?

Andrzej
answered a year ago
  • Creating a Lambda or shell script that periodically executes the following commands would be a permanent solution.

    FLUSH HOSTS;
    
0
profile picture
EXPERT
answered a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions