RDS serverless timeouts


Hi all,

We have been using RDS Aurora Serverless (5.6.10a) for a fair amount of time. We have suddenly started receiving timeouts from the DB when we try to run queries. The failures look somewhat random: we might get successes for one endpoint and failures for another. We can get timeouts even for very simple queries, e.g.: SELECT * FROM users WHERE user_id = 1;

This is happening on our dev and staging environments, and we haven't experienced it on production yet. The only difference between those environments is that the dev and staging DBs can go to 0 capacity units after 10 minutes of inactivity. We didn't change the version of the ORM, so the way we connect should be exactly the same. We can't see anything in the slow query log, and the CPU usage looks normal (it doesn't go above 20% utilization). We can't see any deadlocks in the graphs, and we don't know what else might be wrong.

Is it possible that it's something with the networking? The lambdas and the RDS cluster are in the same VPC within private subnets, and we have never had a problem with that. The security groups look good, and in any case the same lambdas do get successful responses, but it's quite random: sometimes a call succeeds and sometimes it fails.

Any ideas where to check or what kind of debugging to do? Any direction will be highly appreciated.

Stavros
Asked 2 years ago · 1850 views
1 Answer

Hi Stavros,

"This is happening on our dev and staging environment and we don't experience that yet on production. The only difference between those envs is that the DB that can go to 0 capacity units after 10 minutes of inactivity. "

Without looking at your instance, I can only guess that it is a Serverless v1 scaling/cold start issue. Have you tried adding a baseline ACU capacity to dev and staging to see if the issue resolves?
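One way to try this is to turn off auto-pause on the dev and staging clusters, since in Serverless v1 scaling to 0 capacity units is controlled by the auto-pause setting. Below is a minimal sketch using the boto3 `modify_db_cluster` call with a `ScalingConfiguration`. The cluster identifier and the capacity values are placeholders, and the helper names `build_scaling_update` and `apply_scaling_update` are made up for illustration:

```python
def build_scaling_update(cluster_id, min_capacity=1, max_capacity=4):
    """Build keyword arguments for the RDS ModifyDBCluster call.

    The capacity values are illustrative placeholders; pick the ones
    that are valid for your engine and workload.
    """
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return {
        "DBClusterIdentifier": cluster_id,
        "ScalingConfiguration": {
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
            # Keep the cluster from pausing (scaling to 0 ACUs) when idle.
            "AutoPause": False,
        },
    }


def apply_scaling_update(cluster_id, **capacity):
    """Send the update to RDS. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    rds = boto3.client("rds")
    return rds.modify_db_cluster(**build_scaling_update(cluster_id, **capacity))
```

For example, `apply_scaling_update("my-dev-cluster", min_capacity=1, max_capacity=2)` would keep a hypothetical cluster named `my-dev-cluster` at a minimum of 1 ACU. If the timeouts stop after this change, the cold start explanation becomes much more likely.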

See this post for a discussion of Aurora Serverless cold starts: https://aws.amazon.com/blogs/database/best-practices-for-working-with-amazon-aurora-serverless/
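If you would rather keep auto-pause on dev and staging for cost reasons, another option is to make the client tolerate the resume delay by retrying the first failed attempt with a backoff. This is only a rough sketch and not something taken from the post above: `run_with_retry` is a hypothetical helper, the exception types are assumptions (your ORM or driver will likely raise its own), and the delays are arbitrary starting points:

```python
import time


def run_with_retry(operation, attempts=5, base_delay=1.0,
                   retry_on=(TimeoutError, ConnectionError),
                   sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on failure.

    retry_on should list the exceptions your DB driver raises on a
    timeout; the defaults here are assumptions for illustration.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait 1x, 2x, 4x, ... base_delay before the next try.
            sleep(base_delay * 2 ** attempt)
```

You would wrap the query call, e.g. `run_with_retry(lambda: run_query(...))`, where `run_query` stands in for whatever your ORM exposes. Make sure the total wait fits inside the Lambda timeout.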

Additionally, Serverless v2 (preview) has an entirely different scaling architecture with instant autoscaling, so I recommend testing it and working Serverless v2 into your future upgrade plans when it reaches GA.

AWS
Answered 2 years ago
