Based on your description, there are several potential causes for the intermittent "could not contact db" errors in your Go application on ECS connecting to RDS PostgreSQL.
Because one node keeps working while another fails, and the RDS metrics show no spikes, the problem is more likely in network connectivity or on the client side than in database performance. Here are some possible causes and troubleshooting steps:
- Network connectivity issues: Temporary network disruptions between your ECS tasks and the RDS instance could cause intermittent connection failures. This might not affect all nodes equally, explaining why some continue to work.
- Security group configuration: Verify that the security groups associated with your RDS instance allow traffic from your ECS tasks on the correct database port. Sometimes security group rules can be modified inadvertently.
- VPC configuration: Check your Network ACLs, route tables, and VPC peering connections if applicable. Misconfigured network components can cause intermittent connectivity issues.
- Connection pooling: Your Go application might be experiencing connection pool exhaustion or not properly handling connection resets. Review how your application manages database connections.
- DNS resolution: As you suggested, DNS resolution issues within ECS could cause intermittent failures. ECS tasks might occasionally fail to resolve the RDS endpoint.
- RDS maintenance or internal events: Even if not visible in metrics, RDS might be experiencing brief internal events that affect connectivity.
For debugging:
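- Enable connection logging on your RDS instance (for PostgreSQL, the log_connections and log_disconnections parameters in the DB parameter group)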
- Implement more detailed connection error logging in your Go application
- Check for any AWS service health notifications in the ap-southeast-1 region
- Consider implementing connection retry logic with exponential backoff in your application
- Monitor network metrics between your ECS tasks and RDS
Regarding SSL certificates, since you mentioned you're not using SSL to communicate with your database, the certificate expiration wouldn't be causing these issues.
Sources
Troubleshooting connection issues to your Amazon RDS DB instance - Amazon Relational Database Service
RDS connections dropping and then self-recover | AWS re:Post
