RDS serverless timeouts


Hi all,

We have been using RDS Aurora Serverless (5.6.10a) for a fair amount of time. We have suddenly started receiving timeouts from the DB when we try to run queries. It looks a bit random: we might get successes for one endpoint and failures for another. We can get timeouts even for very simple queries, e.g.: SELECT * FROM users WHERE user_id = 1;

This is happening on our dev and staging environments, and we don't experience it on production yet. The only difference between those envs is that the DB can go to 0 capacity units after 10 minutes of inactivity. We didn't change the version of the ORM, so the way we connect should be exactly the same. We can't see anything in the slow query log, and the CPU usage looks normal (it doesn't go above 20% utilization). We can't see any deadlocks in the graphs, and we don't know what else might be wrong.

Is it possible that it's something with the networking? The Lambdas and RDS are in the same VPC within private subnets, and we have never had a problem with that. Security groups look good, and in any case the same Lambdas do get successful responses, but it's quite random: sometimes a call succeeds and sometimes it fails.

Any ideas where to check or what kind of debugging to do? Any direction will be highly appreciated.

Stavros
Asked 2 years ago, 1850 views
1 Answer

Hi Stavros,

"This is happening on our dev and staging environment and we don't experience that yet on production. The only difference between those envs is that the DB that can go to 0 capacity units after 10 minutes of inactivity. "

Without looking at your instance, I can only guess that it is a Serverless v1 scaling/cold start issue. Have you tried setting a baseline (non-zero) ACU capacity on dev and staging to see if the issue resolves?
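
As a rough illustration of that test, here is a minimal Python sketch using boto3's `modify_db_cluster` call with a `ScalingConfiguration`. The helper names, the cluster identifier, and the capacity values are assumptions made up for this example, and the capacities must be values that Aurora Serverless v1 allows for your engine.

```python
def build_scaling_update(cluster_id, min_acu=1, max_acu=4):
    """Build kwargs for rds.modify_db_cluster that stop the cluster pausing.

    cluster_id, min_acu and max_acu are illustrative placeholders.
    """
    return {
        "DBClusterIdentifier": cluster_id,
        "ScalingConfiguration": {
            "MinCapacity": min_acu,   # baseline capacity, never 0
            "MaxCapacity": max_acu,
            "AutoPause": False,       # do not scale to 0 ACUs when idle
        },
    }


def apply_scaling_update(cluster_id, **capacity):
    """Send the update to RDS (needs AWS credentials and permissions)."""
    import boto3  # imported lazily so the builder works without boto3

    rds = boto3.client("rds")
    return rds.modify_db_cluster(**build_scaling_update(cluster_id, **capacity))
```

Calling `apply_scaling_update("dev-cluster")` (a hypothetical identifier) on dev or staging and then watching whether the timeouts disappear would help confirm or rule out the pause/resume theory.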

See this post for a discussion of Aurora Serverless cold starts: https://aws.amazon.com/blogs/database/best-practices-for-working-with-amazon-aurora-serverless/
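
If the timeouts do turn out to come from the cluster resuming from 0 capacity units, one common mitigation on the client side is to retry the first connection with exponential backoff. The following is only a sketch under that assumption: `connect_with_retry` and its parameters are hypothetical, `connect` stands for whatever callable your ORM or driver uses to open a connection, and real code should catch the driver's specific timeout error rather than `Exception`.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off between failures.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between
    attempts and re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as exc:  # assumption: narrow this to your driver's error
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

In a Lambda, keep the total wait below the function timeout, since a resume can take longer than a single default connection timeout.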

Additionally, Serverless v2 (preview) has an entirely different scaling architecture with instant autoscaling, so I recommend testing it and working Serverless v2 into your future upgrade plans when it becomes generally available.

AWS
Answered 2 years ago
