RDS serverless timeouts


Hi all,

We have been using RDS Aurora Serverless (5.6.10a) for a fair amount of time. We have suddenly started receiving timeouts from the DB when we try to run queries. It looks a bit random: we might get successes for one endpoint and failures for another. We can get timeouts even for very simple queries, e.g.: SELECT * FROM users WHERE user_id = 1;

This is happening on our dev and staging environments, and we don't experience it on production yet. The only difference between those envs is that the DB in dev and staging can go to 0 capacity units after 10 minutes of inactivity. We didn't change the version of the ORM, so the way we connect should be exactly the same. We can't see anything in the slow query log, and the CPU usage looks normal (it doesn't go above 20% utilization). We can't see any deadlocks in the graphs, and we don't know what else might be wrong.

Is it possible that it's something with the networking? The lambdas and the RDS are in the same VPC within private subnets, and we never had a problem with that. Security groups look good, and in any case the same lambdas do get successful responses, just quite randomly: sometimes they succeed and sometimes they fail.

Any ideas where to check or what kind of debugging to do? Any direction will be highly appreciated.

Stavros
Asked 2 years ago · 1850 views
1 Answer

Hi Stavros,

"This is happening on our dev and staging environment and we don't experience that yet on production. The only difference between those envs is that the DB can go to 0 capacity units after 10 minutes of inactivity."

...without looking at your instance, I can only guess that it is a Serverless v1 scaling/cold start issue. Have you tried adding a baseline ACU capacity to Dev and Staging to see if the issue resolves?
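
In case it helps, here is a minimal sketch of how a baseline capacity could be set with boto3, assuming a Serverless v1 cluster. The function names, the cluster identifier you pass in, and the default capacity values are placeholders I made up for illustration, not taken from your setup. `modify_db_cluster` with a `ScalingConfiguration` is the boto3 RDS call for v1 scaling settings; please check the allowed capacity values for your engine before applying anything.

```python
def build_scaling_config(min_capacity=1, max_capacity=4, auto_pause=False):
    """Build a ScalingConfiguration dict for an Aurora Serverless v1 cluster.

    The defaults are illustrative assumptions, not recommended values.
    """
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return {
        "MinCapacity": min_capacity,   # baseline ACUs the cluster keeps
        "MaxCapacity": max_capacity,   # upper bound for scaling
        "AutoPause": auto_pause,       # False: never scale down to 0 ACUs
    }


def apply_scaling_config(cluster_id, **config_kwargs):
    """Apply the scaling configuration to the given cluster (hypothetical helper)."""
    import boto3  # imported lazily so the builder above works without the SDK

    rds = boto3.client("rds")
    return rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        ScalingConfiguration=build_scaling_config(**config_kwargs),
    )
```

If the timeouts stop on dev and staging once auto-pause is off, that points fairly strongly at the pause/resume behaviour rather than networking.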

See this post for a discussion of Aurora Serverless cold starts: https://aws.amazon.com/blogs/database/best-practices-for-working-with-amazon-aurora-serverless/
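
If you want to keep auto-pause on for cost reasons, one common client-side workaround is to retry the first query with a backoff so the cluster has time to resume. The sketch below is only an illustration under that assumption: `call_with_retry` is a hypothetical helper, the attempt count and delays are arbitrary, and in practice you would narrow `retry_on` to the timeout exception your driver or ORM actually raises.

```python
import time


def call_with_retry(operation, attempts=4, base_delay=1.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on failure.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
```

Usage would look something like `call_with_retry(lambda: run_query("SELECT 1"))`, where `run_query` stands in for whatever your ORM exposes. Keep in mind that the total wait has to fit inside the Lambda timeout.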

Additionally, Serverless v2 (preview) has an entirely different scaling architecture with instant autoscaling, so I recommend testing it and working Serverless v2 into your future upgrade plans when it becomes generally available (GA).

AWS
Answered 2 years ago
