RDS Proxy causes intermittent "Connection timed out" errors

Good day.

After adding RDS Proxy to our service, "SQLSTATE[HY000] [2002] Connection timed out" errors started to appear. They are infrequent (maybe 1 in 100,000 connections fails, or even fewer), but the problem quickly became annoying. Sometimes there are one or two errors per hour, sometimes none. The number of errors seems to correlate loosely with the number of connections made, though it is hard to tell.

Some info about the service:

  • Aurora MySQL 3.03.1, 3AZ, 1 writer + 2 readers.
  • PHP 8.1 + PDO extension (maybe this is a known issue that the advanced driver for Java solves?). A connection attempt is aborted after about 30 seconds.
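
A common client-side stopgap for rare connect timeouts like this is a bounded retry with backoff around the connection attempt, ideally combined with a connect timeout shorter than the roughly 30 seconds mentioned above. The following is only a minimal sketch in Python, not the PHP/PDO code from the service: `ConnectTimeout`, `connect_with_retry`, and the `connect` callable are hypothetical names, and the attempt count and delays are arbitrary assumptions. It hides the symptom and does not address the root cause.

```python
import random
import time


class ConnectTimeout(Exception):
    """Hypothetical stand-in for the driver's connect timeout error."""


def connect_with_retry(connect, attempts=3, base_delay=0.2, sleep=time.sleep):
    """Call connect() up to `attempts` times, backing off between failures.

    `connect` is any zero-argument callable that returns a connection
    or raises ConnectTimeout. All defaults here are illustrative.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectTimeout:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            # Exponential backoff (0.2 s, 0.4 s, ...) plus random jitter
            # so that many clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The same shape should carry over to PDO by catching `PDOException` around `new PDO(...)`. Retrying only the connection step (not queries) keeps the wrapper safe for non-idempotent statements.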

MySQL non-default parameters:

  • sql_mode = TRADITIONAL
  • max_connections = 500 (was useful before the proxy was introduced)
  • long_query_time = 30
  • binlog_format = MIXED (for blue-green deployments only)

RDS Metrics:

  • AbortedClients: tens of such failures per day, significantly more than the 2-5 errors I see.
  • ActiveTransactions: 0
  • AuroraSlowConnectionHandleCount: grows by 1.5k+ per day
  • AuroraSlowHandshakeCount: grows by 2-3 per day

RDS Proxy metrics are all green (no errors, no connection pinning, etc.). I tried enabling advanced logging, but there was not much to gain from it. I could only see that some connections were pinned due to query size, and that was relatively rare, a few per minute.

Thanks for any help.

If anyone else has stumbled upon the same issue, feel free to complain here.

Adam
asked 10 months ago · 955 views
2 Answers

We started seeing the exact same behaviour this morning. No infrastructure changes. When we swap from the RDS Proxy endpoint to the RDS cluster endpoint, the connection timeout errors go away.

We tried creating a new RDS Proxy. Same issue: intermittent connection timeouts.

This is a bit of a problem for us: we're using Lambda, so we need the proxy.

JasonF
answered 7 months ago
  • We also started seeing the same problem this morning.

    Our production workload is impacted, and the resolution was to stop using the proxy. However, as we run high data loads using Lambda overnight, this mitigation is very temporary.

    I have been consulting with the RDS specialists from AWS Business Support for around 7 hours today, and we have not gotten to the bottom of the problem yet.

  • Sup, if anyone is interested: I haven't seen those errors since then (no code or infrastructure changes, they just disappeared). I am not using Lambda, though. If you were able to fix the issue, please respond here; I am interested to hear what it was.

  • @jason @adam out of curiosity, what region are you seeing these issues in? I'm seeing them in ap-southeast-2.

  • @luke yes we're also ap-southeast-2, glad that it's not just us then! I haven't tried re-introducing the proxy this morning but please let us know if you get to the bottom of the issue or have any updates.

    Since we can't use the proxy for now, we've had to switch to Aurora Serverless v2 in the meantime to avoid hitting the connection limit of our regular instances.

  • I am in us-west-2 and everything is fine. Looks like you are onto something.

Those errors stopped appearing about two weeks ago. Looks like AWS engineers are looking out for us after all.

Adam
answered 9 months ago
