RDS Proxy closes the connection unexpectedly

0

We have deployed a Django application in EKS and used RDS PostgreSQL with RDS proxy as a database backend. Over the last month, we have started noticing occasional 500 "Internal Server Error" responses from our web app with the following error coming from Django:

django.db.utils.OperationalError: connection to server at "<proxy DNS name>" (<proxy IP address>), port 5432 failed: server closed the connection unexpectedly

This suggests that RDS Proxy closed the client connection. In our Django settings, CONN_MAX_AGE is left at its default of 0, which means Django opens a new database connection for each request and closes it when the request ends. The observed failures therefore cannot be related to RDS Proxy's idle client connection timeout, which we have set to 30 minutes.
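For reference, the connection-lifetime setting mentioned above lives in the DATABASES dictionary. A minimal sketch of such a configuration follows; the host, database name, and user are hypothetical placeholders, and CONN_HEALTH_CHECKS assumes Django 4.1 or later:

```python
# settings.py (sketch): HOST, NAME, and USER are hypothetical placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "my-proxy.proxy-example.us-east-1.rds.amazonaws.com",  # placeholder
        "PORT": "5432",
        "NAME": "appdb",      # placeholder
        "USER": "appuser",    # placeholder
        # 0 (the default) closes the connection at the end of each request.
        "CONN_MAX_AGE": 0,
        # Only has an effect with persistent connections (non-zero CONN_MAX_AGE);
        # available from Django 4.1.
        "CONN_HEALTH_CHECKS": False,
    }
}
```

With CONN_MAX_AGE at 0 there are no long-lived client connections for the proxy's idle timeout to act on, which is the reasoning behind ruling that timeout out.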

To deal with this issue, we have implemented retries at the service mesh level (Istio). However, we would like to understand the root cause of the failures and why they have become more frequent during the last month, since this almost never happened previously. Looking at the proxy and database metrics in CloudWatch, there does not appear to have been increased traffic during the failures. Nevertheless, could the proxy close a client connection during a scaling operation? How can we get more insight into RDS Proxy's internal operations? Enhanced Logging stays enabled for only 24 hours after it is turned on, and there is no guarantee that the error will occur during that window. We are also a bit nervous about enabling it in production, since it can slow down performance.
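As an alternative or complement to mesh-level retries, the same idea can be sketched at the application level. This is only an illustration under stated assumptions, not what the poster deployed: `retry_on` is a hypothetical helper, and the attempt count and delays are arbitrary.

```python
import time
from functools import wraps


def retry_on(exc_type, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry the wrapped callable when it raises exc_type, backing off exponentially."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts: surface the original error
                    # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

In a Django project, `exc_type` would typically be `django.db.utils.OperationalError`. Retrying is only safe for work that is idempotent or that failed before any statement was executed, which is the case when the failure happens while the connection is being established.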

1 Answer
-1

The error message "connection to server at "<proxy DNS name>" (<proxy IP address>), port 5432 failed: server closed the connection unexpectedly" suggests that the RDS proxy is closing the client connection unexpectedly. This issue could be caused by a number of factors, including increased traffic, scaling operations, or issues with the RDS proxy or the database.

Regarding the scaling operation, the RDS proxy is not affected by the scaling of the database instances. RDS proxy does not close the client connection during a scaling operation.

To get more insight into the RDS proxy internal operations, you can enable Enhanced Logging for the RDS proxy. This will provide more detailed logs and metrics, which can help you identify the root cause of the connection failures. However, it is important to note that enabling Enhanced Logging can slow down performance, so it's best to perform this step during a maintenance window or when the traffic is low.

You can also check the CloudWatch metrics for the RDS proxy and the database around the time of each failure. Metrics such as connection counts on the proxy, and CPU and memory utilization on the database instance, may be at high levels during the failures even when overall traffic looks normal.
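The metric check can be scripted so that it is easy to repeat after each failure. The sketch below uses boto3 and makes several assumptions that should be verified against your own account: that proxy metrics are published in the AWS/RDS namespace under a ProxyName dimension, and that metric names such as ClientConnectionsClosed or DatabaseConnectionsSetupFailed are available. The function names and the proxy name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def proxy_metric_query(proxy_name, metric_name, end, hours=3, period=60):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/RDS",  # assumption: namespace used for proxy metrics
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ProxyName", "Value": proxy_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Sum"],
    }


def fetch_proxy_metric(proxy_name, metric_name, hours=3):
    """Return datapoints for one proxy metric, oldest first."""
    import boto3  # imported lazily so the query builder has no AWS dependency

    cloudwatch = boto3.client("cloudwatch")
    query = proxy_metric_query(
        proxy_name, metric_name, datetime.now(timezone.utc), hours
    )
    response = cloudwatch.get_metric_statistics(**query)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

For example, `fetch_proxy_metric("my-proxy", "ClientConnectionsClosed")` would return per-minute sums for the last three hours, which can then be lined up against the timestamps of the 500 responses.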

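Another approach is to look into the RDS proxy logs. As far as I know, these are delivered to CloudWatch Logs, in a log group named after the proxy, rather than to the 'Logs and Events' tab of a database instance in the RDS console. They may give you more information about the connection failures.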

In addition, check the RDS proxy connection pool settings and make sure that the connection pool is not too small for the number of connections you are trying to establish.
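As a rough way to reason about pool size: the proxy's connection pool limit is configured as a percentage of the database's max_connections (the MaxConnectionsPercent setting on the target group). The sketch below estimates the resulting ceiling. The helper name and the example numbers are hypothetical, and the exact rounding AWS applies is an assumption here.

```python
def proxy_connection_ceiling(db_max_connections, max_connections_percent):
    """Estimate how many database connections the proxy may open.

    Assumes the limit is db_max_connections * percent / 100, rounded down.
    """
    if not 1 <= max_connections_percent <= 100:
        raise ValueError("max_connections_percent must be between 1 and 100")
    return db_max_connections * max_connections_percent // 100


# Hypothetical example: max_connections = 500 on the instance, 75 percent on the proxy.
ceiling = proxy_connection_ceiling(500, 75)
```

If the peak number of concurrent requests across all pods approaches this ceiling, new client connections may have to wait for a database connection to be borrowed from the pool.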

Finally, you may want to contact AWS support to get more information or help with troubleshooting the problem.

answered a year ago
