RDS Auth Proxy failures and connection timeouts

0

I have a number of users connecting to a database, some IaM enabled and others which use password authentication. Generally connection and authentication is fine for both types of users and our applications using the database are absolutely fine.

We have started getting intermittent auth issues (seemingly just IaM users) and connection timeouts.

For our IaM enabled users, We generate auth tokens using RdsSigner in a Lambda, which I have read relies upon the STS service to get the auth token we need, so we don't control this other than passing it along during connection.

Occasionally, We seem to be getting instances where rdsauthproxy indicates a failure and then other clients of the database experience connection terminations. I realise that the connections are effectively unrelated but it seems a consistent pattern of auth failures and then connection terminations.

I appreciate I am sending across parts of a log so this may be hard to follow, but I'll try anway,

We receive something like this first:

* We are completely uploaded and fine
2024-01-03 07:33:41 UTC:x.x.x.x(40004):[x]@[y]:[4657]:LOG:  could not accept SSL connection: EOF detected
2024-01-03 07:33:43 UTC::@:[396]:LOG:  checkpoint starting: time
< HTTP/1.1 503 Service Unavailable
< Content-Type: text/html;charset=utf-8
< HTTP/1.1 503 Service Unavailable
< Content-Type: text/html;charset=utf-8
< Content-Length: 0
< Content-Length: 0
< < 

* Connection #0 to host rdsauthproxy left intact
* Connection #0 to host rdsauthproxy left intact

And then a little later (for another request):

2024-01-03 07:34:01 UTC:172.31.83.57(47075):x@y[4606]:LOG:  pam_authenticate failed: Permission denied
2024-01-03 07:34:02 UTC:172.31.83.57(47075):x@y[4606]:FATAL:  canceling authentication due to timeout

We then also see:

2024-01-03 07:34:26 UTC:x.x.x.x(36200):[x]@[y]:[4682]:LOG:  could not accept SSL connection: EOF detected
2024-01-03 07:34:26 UTC:x.x.x.x(37276):[x]@[y]:[4673]:LOG:  could not receive data from client: Connection reset by peer

And finally a whole cluster of empty replies (I guess because these were all done at same instant)

2024-01-03 07:35:05 UTC:x.x.x.x(43842):[x]@[y]:[4893]:LOG:  connection received: host=x.x.x.x port=43842
* Empty reply from server
* Empty reply from server
* * Empty reply from server
Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
* Empty reply from server
2024-01-03 07:35:07 UTC:x.x.x.x(35529):[x]@[y]:[4894]:LOG:  connection received: host=x.x.x.x port=35529
* * Closing connection
Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Closing connection
* Empty reply from server
* Empty reply from server

Can anyone shed any light as to what is happening?

Col
asked 4 months ago370 views
1 Answer
0

RDS using IAM authentication, is generated via token and this is valid only for 15 minutes and is expected behavior of IAM. If your connection sometimes when interrupted, lambda can initiate same token afterwards to reconnect or make connection longer than 15 minutes after token generation, it will fail as the token is no longer valid. Hence the connection will fail intermittently as it might be timing out at certain occasions as it might be trying with same token which is invalid. Please ensure to generate the token again after every 10-12 minutes in the lambda before making the connection again if interrupted or connection taking longer. Note that this timeout value is only for making authentication and not for connection which is already established via IAM authentication.

AWS
SUPPORT ENGINEER
answered 4 months ago
profile picture
EXPERT
reviewed 23 days ago
  • Thank-you for taking the time to reply.
    Everything you have said is absolutely correct and can't be refuted.

    Additional information for onlookers is that each and every connection used a new token, so it being caused by token expiry is not the cause of the issues in our case.

    It turns out that the appearance of SSL connection timeouts in the logs and then further analysis of lowFreeable Memory gave a strong indication that the database SKU should be upgraded so that there was more free memory. We upgraded the SKU from db.t3.micro to medium and the issue has gone, completely.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions