RDS serverless timeouts


Hi all,

We have been using RDS Aurora Serverless (5.6.10a) for a fair amount of time. We have suddenly started receiving timeouts from the DB when we run queries. It looks somewhat random: one endpoint may succeed while another fails. We get timeouts even for very simple queries, e.g.: SELECT * FROM users WHERE user_id = 1;

This is happening in our dev and staging environments; we have not seen it in production yet. The only difference between those environments is that the dev and staging DBs can scale down to 0 capacity units after 10 minutes of inactivity. We didn't change the version of the ORM, so the way we connect should be exactly the same. We can't see anything in the slow query log, and CPU usage looks normal (it doesn't go above 20% utilization). We can't see any deadlocks in the graphs, and we don't know what else might be wrong.

Could it be a networking issue? The Lambdas and the RDS database are in the same VPC, in private subnets, and we have never had a problem with that. Security groups look good, and in any case the same Lambdas sometimes succeed and sometimes fail, seemingly at random.

Any ideas on where to look or what kind of debugging to do? Any direction would be highly appreciated.

Stavros
asked 2 years ago · 1850 views
1 Answer

Hi Stavros,

"This is happening on our dev and staging environment and we don't experience that yet on production. The only difference between those envs is that the DB that can go to 0 capacity units after 10 minutes of inactivity. "

...without looking at your instance, I can only guess, but this sounds like an Aurora Serverless v1 scaling/cold-start issue. Have you tried setting a baseline ACU capacity on dev and staging (so the DB no longer drops to 0) to see if the issue resolves?
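
As a rough sketch of what a baseline capacity could look like, the snippet below builds a request for the RDS ModifyDBCluster API via boto3, keeping a minimum capacity and turning off auto-pause. The cluster identifier `dev-cluster`, the helper names, and the capacity values are assumptions for illustration only; check the valid ACU values for your engine before applying anything.

```python
def build_scaling_request(cluster_id, min_acu=1, max_acu=4, auto_pause=False):
    """Build keyword arguments for boto3's rds.modify_db_cluster (Serverless v1).

    cluster_id, min_acu and max_acu are placeholder values, not taken from
    the original post.
    """
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return {
        "DBClusterIdentifier": cluster_id,
        "ScalingConfiguration": {
            "MinCapacity": min_acu,    # baseline capacity the cluster keeps
            "MaxCapacity": max_acu,
            "AutoPause": auto_pause,   # False: do not pause to 0 ACUs when idle
        },
    }


def apply_scaling(cluster_id, **options):
    """Send the request to AWS; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    rds = boto3.client("rds")
    return rds.modify_db_cluster(**build_scaling_request(cluster_id, **options))
```

Calling `apply_scaling("dev-cluster")` would then keep the dev cluster warm. If the timeouts disappear afterwards, that points toward the pause/resume behaviour rather than networking.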

See this post for a discussion of Aurora Serverless cold starts: https://aws.amazon.com/blogs/database/best-practices-for-working-with-amazon-aurora-serverless/
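
If the timeouts do come from the cluster resuming after a pause, one common mitigation is to retry the first query with a backoff so the request outlasts the resume. The helper below is only a sketch: `run_with_retry` is a made-up name, and `TimeoutError`/`ConnectionError` stand in for whatever exception your ORM or driver actually raises on a timeout.

```python
import time


def run_with_retry(operation, attempts=4, base_delay=1.0,
                   retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on timeout errors.

    retry_on is an assumption: replace it with the timeout exception raised
    by your own database driver or ORM.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
```

Wrapping the simple `SELECT` from the question in `run_with_retry` is also a cheap diagnostic: if the query fails once and then succeeds a few seconds later, a cold start is the likely cause.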

Additionally, Serverless v2 (in preview) has an entirely different scaling architecture with instant autoscaling, so I recommend testing it and working Serverless v2 into your future upgrade plans once it becomes generally available.

AWS
answered 2 years ago
