Intermittent 'could not contact db' error in Go app on ECS (PostgreSQL RDS, ap-southeast-1)

We’re running a Go application on ECS that connects to an RDS PostgreSQL instance in the ap-southeast-1 region. Recently, we’ve seen intermittent "could not contact db" errors in our logs, but:

  1. Another node using the same DB continues to run normally
  2. No recent code changes
  3. RDS metrics (CPU, memory, connections) show no spikes

There’s nothing in the logs to explain it. Could this be caused by temporary network issues, DNS resolution failures within ECS, or RDS-level connection resets not reflected in metrics?

Our RDS SSL certificate has not been updated yet on our side, but we do not use SSL to communicate with our DB.

Would appreciate any advice on how to trace or debug this further. Thanks in advance!

1 Answer

Based on your description, there are several potential causes for the intermittent "could not contact db" errors in your Go application on ECS connecting to RDS PostgreSQL.

Since one node continues to work while another experiences issues, and there are no visible spikes in RDS metrics, this points more toward a network connectivity or client-side connection issue than a database performance problem. Here are some possible causes and troubleshooting steps:

  1. Network connectivity issues: Temporary network disruptions between your ECS tasks and the RDS instance could cause intermittent connection failures. This might not affect all nodes equally, explaining why some continue to work.

  2. Security group configuration: Verify that the security groups associated with your RDS instance allow traffic from your ECS tasks on the correct database port. Sometimes security group rules can be modified inadvertently.

  3. VPC configuration: Check your Network ACLs, route tables, and VPC peering connections if applicable. Misconfigured network components can cause intermittent connectivity issues.

  4. Connection pooling: Your Go application might be experiencing connection pool exhaustion or not properly handling connection resets. Review how your application manages database connections.

  5. DNS resolution: As you suggested, DNS resolution issues within ECS could cause intermittent failures. ECS tasks might occasionally fail to resolve the RDS endpoint.

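  6. RDS maintenance or internal events: Brief events such as restarts, failovers, or maintenance can interrupt connections without showing up as spikes in CPU, memory, or connection metrics. Check the RDS event history for the instance around the times the errors occurred.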
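
To tell causes 1 and 5 apart, it can help to run a small probe from inside the affected ECS task that resolves the endpoint and then opens a plain TCP connection, reporting which stage failed. The following is a minimal sketch using only the Go standard library. The names `probe`, `probeResult`, `DB_HOST`, and `DB_PORT` are assumptions made for this illustration and do not come from your application.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// probeResult records how far one connectivity check got.
type probeResult struct {
	Stage   string        // "dns" or "tcp" on failure, "ok" on success
	Addrs   []string      // addresses the resolver returned, if any
	Elapsed time.Duration // total time spent in this probe
	Err     error         // nil when Stage is "ok"
}

// probe resolves host, then opens and immediately closes a TCP connection
// to the first resolved address, so DNS and TCP failures are reported separately.
func probe(host, port string, timeout time.Duration) probeResult {
	start := time.Now()
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	addrs, err := net.DefaultResolver.LookupHost(ctx, host)
	if err == nil && len(addrs) == 0 {
		err = fmt.Errorf("resolver returned no addresses for %q", host)
	}
	if err != nil {
		return probeResult{Stage: "dns", Elapsed: time.Since(start), Err: err}
	}

	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", net.JoinHostPort(addrs[0], port))
	if err != nil {
		return probeResult{Stage: "tcp", Addrs: addrs, Elapsed: time.Since(start), Err: err}
	}
	conn.Close()
	return probeResult{Stage: "ok", Addrs: addrs, Elapsed: time.Since(start)}
}

func main() {
	// DB_HOST and DB_PORT are hypothetical variable names for this sketch.
	host, port := os.Getenv("DB_HOST"), os.Getenv("DB_PORT")
	if host == "" {
		fmt.Println("set DB_HOST (and optionally DB_PORT) to the RDS endpoint")
		return
	}
	if port == "" {
		port = "5432" // PostgreSQL default port
	}
	// Ten probes, five seconds apart; adjust to cover the window where errors occur.
	for i := 0; i < 10; i++ {
		r := probe(host, port, 3*time.Second)
		fmt.Printf("%s stage=%s addrs=%v elapsed=%s err=%v\n",
			time.Now().Format(time.RFC3339), r.Stage, r.Addrs,
			r.Elapsed.Round(time.Millisecond), r.Err)
		time.Sleep(5 * time.Second)
	}
}
```

If failures cluster in the "dns" stage, look at resolver behaviour inside the task. If they cluster in the "tcp" stage, look at security groups, NACLs, and routing. The probe only checks reachability of the port. It does not authenticate or run a query, so it will not detect pool exhaustion inside the application.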

For debugging:

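  • Enable connection logging on your RDS instance (for PostgreSQL, the log_connections and log_disconnections parameters in the DB parameter group)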
  • Implement more detailed connection error logging in your Go application
  • Check for any AWS service health notifications in the ap-southeast-1 region
  • Consider implementing connection retry logic with exponential backoff in your application
  • Monitor network traffic between your ECS tasks and RDS, for example with VPC Flow Logs
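
The logging and retry suggestions above can be combined roughly as follows. This is a sketch under stated assumptions, not a drop-in implementation: `classifyConnError`, `backoffDelay`, and `retry` are hypothetical helper names, the error categories are one possible grouping, and the classification only works if your PostgreSQL driver preserves the underlying network error in the error chain. The backoff has no jitter, which you may want to add if many tasks retry at once.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net"
	"syscall"
	"time"
)

// classifyConnError maps a connection error to a coarse category for logging.
func classifyConnError(err error) string {
	var dnsErr *net.DNSError
	var netErr net.Error
	switch {
	case err == nil:
		return "none"
	case errors.As(err, &dnsErr):
		return "dns"
	case errors.Is(err, syscall.ECONNREFUSED):
		return "refused"
	case errors.Is(err, syscall.ECONNRESET), errors.Is(err, syscall.EPIPE):
		return "reset"
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout"
	case errors.As(err, &netErr) && netErr.Timeout():
		return "timeout"
	default:
		return "other"
	}
}

// backoffDelay returns base doubled once per attempt (attempt starts at 0),
// capped at maxDelay.
func backoffDelay(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		return maxDelay
	}
	return d
}

// retry calls op up to attempts times, logging each failure with its category
// and sleeping with exponential backoff between tries.
func retry(ctx context.Context, attempts int, base, maxDelay time.Duration,
	op func(context.Context) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(ctx); err == nil {
			return nil
		}
		log.Printf("db attempt %d/%d failed: kind=%s err=%v",
			i+1, attempts, classifyConnError(err), err)
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(backoffDelay(i, base, maxDelay)):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Stand-in for a real call such as db.PingContext(ctx):
	// fails twice with a refused connection, then succeeds.
	op := func(ctx context.Context) error {
		calls++
		if calls < 3 {
			return &net.OpError{Op: "dial", Net: "tcp", Err: syscall.ECONNREFUSED}
		}
		return nil
	}
	err := retry(context.Background(), 5, 10*time.Millisecond, time.Second, op)
	fmt.Println("calls:", calls, "err:", err)
}
```

With the stand-in operation, the program logs two failures tagged kind=refused and then prints `calls: 3 err: <nil>`. In the application, the logged category for each real failure should indicate whether to investigate DNS, security groups and routing, or connection resets.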

Regarding SSL certificates: since you're not using SSL to communicate with your database, the pending certificate update is unlikely to be the cause of these errors.

Sources

  • Troubleshooting connection issues to your Amazon RDS DB instance - Amazon Relational Database Service
  • RDS connections dropping and then self-recover | AWS re:Post

answered a year ago
AWS
EXPERT
reviewed a year ago
