RDS serverless timeouts


Hi all,

We have been using RDS Aurora Serverless (5.6.10a) for a fair amount of time. We have suddenly started receiving timeouts from the DB when we try to run queries. It looks somewhat random: one endpoint may succeed while another fails. We can get timeouts even for very simple queries, e.g.: SELECT * FROM users WHERE user_id = 1;

This is happening on our dev and staging environments; we don't experience it on production yet. The only difference between those environments is that the dev and staging DBs can go to 0 capacity units after 10 minutes of inactivity. We didn't change the version of the ORM, so the way we connect should be exactly the same. We can't see anything in the slow query log, and CPU usage looks normal (it doesn't go above 20% utilization). We can't see any deadlocks in the graphs, and we don't know what else might be wrong.

Is it possible that it's something with the networking? The Lambdas and the RDS cluster are in the same VPC, in private subnets, and we have never had a problem with that. Security groups look good, and in any case the same Lambdas do return successful responses; it's just quite random. Sometimes a call succeeds, sometimes it fails.

Any ideas where to check or what kind of debugging to do? Any direction will be highly appreciated.

Stavros
Asked 2 years ago, viewed 1850 times
1 answer

Hi Stavros,

"This is happening on our dev and staging environment and we don't experience that yet on production. The only difference between those envs is that the DB that can go to 0 capacity units after 10 minutes of inactivity. "

Without looking at your instance, I can only guess that it is a Serverless v1 scaling/cold-start issue. Have you tried adding a baseline ACU capacity to dev and staging, so that the cluster no longer pauses to 0, to see if the issue resolves?
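As a rough sketch of what that test could look like with boto3: the snippet below builds a `ModifyDBCluster` request that turns off auto-pause and sets a minimum capacity. The cluster identifier `dev-cluster` and the 1 to 4 ACU range are placeholders I made up, not values from your setup, and the helper names are hypothetical.

```python
from typing import Any, Dict


def build_no_pause_request(
    cluster_id: str, min_acu: int = 1, max_acu: int = 4
) -> Dict[str, Any]:
    """Build keyword arguments for the RDS ModifyDBCluster API call.

    Assumption: the cluster is Aurora Serverless v1, where the scaling
    configuration carries the AutoPause flag. Capacity values must be ones
    the engine accepts (for Aurora MySQL these are powers of two).
    """
    if min_acu < 1 or max_acu < min_acu:
        raise ValueError("expected 1 <= min_acu <= max_acu")
    return {
        "DBClusterIdentifier": cluster_id,
        "ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
            "AutoPause": False,  # keep the cluster from pausing to 0 ACUs
        },
        "ApplyImmediately": True,
    }


def disable_auto_pause(cluster_id: str, min_acu: int = 1, max_acu: int = 4) -> None:
    """Send the request; needs boto3 and AWS credentials with RDS permissions."""
    import boto3  # imported lazily so the builder above works without boto3

    rds = boto3.client("rds")
    rds.modify_db_cluster(**build_no_pause_request(cluster_id, min_acu, max_acu))


if __name__ == "__main__":
    # "dev-cluster" is a placeholder identifier; this only prints the request.
    print(build_no_pause_request("dev-cluster"))
```

If the timeouts disappear on dev and staging once the cluster stays warm, that points strongly at pause/resume rather than networking.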

See this post for a discussion of Aurora Serverless cold starts: https://aws.amazon.com/blogs/database/best-practices-for-working-with-amazon-aurora-serverless/
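If you want to keep auto-pause on dev and staging, a common mitigation for cold starts is to retry the first connection with a backoff while the cluster resumes. This is only a minimal sketch: `connect_with_retry` is a hypothetical helper, the `connect` callable stands in for whatever your ORM or driver uses to open a connection, and you would pass your driver's own connection error class in `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, backing off exponentially.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, ... and the last
    failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    # Simulated connection that times out twice (cluster still resuming).
    state = {"calls": 0}

    def fake_connect() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("cluster is resuming")
        return "connected"

    print(connect_with_retry(fake_connect, base_delay=0.01))
```

In a Lambda, keep the total retry time below the function timeout, otherwise the invocation is killed before the cluster has finished resuming.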

Additionally, Serverless v2 (in preview) has an entirely different scaling architecture with instant autoscaling, so I recommend testing it and working Serverless v2 into your future upgrade plans when it becomes generally available.

AWS
Answered 2 years ago
