Aurora serverless v2 instance doesn't scale down to minimum 0.5 ACU + random spikes


We configured an Aurora Serverless v2 PostgreSQL instance in our cluster, but it doesn't scale down to 0.5 ACU. Our database size is around 700-800 GB. Database version: Aurora PostgreSQL 13.11 (we just upgraded from 13.10, but the problem persists).

Info about our setup: our app has constant load and usage, so we only route special requests to the serverless instance (from 7am to 5pm). During off-peak hours these requests are directed to the writer instance (which is not serverless).

Our instances:

  1. A writer instance, which is not serverless. It is a db.r6g.xlarge instance that is constantly under load.
  2. A reader instance (db.r6g.large) that is responsible for a specific sync task. It is also under constant load.
  3. A third instance, which is the Serverless v2 instance. We use it only for specific read queries during peak time, to serve the unpredictable and spiky load.

A screenshot of our database instances

So we have a mixed configuration, and our issue is that when we don't send requests to the serverless instance, it doesn't scale down to 0.5 ACU but stays at around 2.5-3.5 ACUs.

What we did:

  • We checked and are sure that we do not query the Serverless v2 instance during off-peak hours; all queries are redirected to the writer instance.
  • We turned off Performance Insights monitoring on the serverless instance.
  • Log export is also turned off on the serverless instance.
  • max_connections and max_locks_per_transaction are set to their defaults.
  • Failover priority is set to tier-2 on the serverless instance.
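The checks in the list above could also be verified programmatically. Below is a minimal sketch in Python, assuming the instance description is a dict shaped like one entry of `DBInstances` in the boto3 `describe_db_instances` response. The function name `capacity_blockers` and the sample values are hypothetical and only mirror the setup described in this post.

```python
def capacity_blockers(instance: dict) -> list[str]:
    """List settings that may keep a Serverless v2 reader above its minimum ACU.

    `instance` is assumed to look like one `DBInstances` entry returned by
    boto3's rds.describe_db_instances().
    """
    reasons = []
    if instance.get("PerformanceInsightsEnabled"):
        reasons.append("Performance Insights is enabled")
    if instance.get("EnabledCloudwatchLogsExports"):
        reasons.append("log export to CloudWatch is enabled")
    # Readers in promotion tiers 0 and 1 follow the writer's capacity;
    # tiers 2-15 scale independently. RDS uses tier 1 when none is set.
    if instance.get("PromotionTier", 1) <= 1:
        reasons.append("failover priority is tier 0 or 1")
    return reasons


# Hypothetical values matching the configuration described in the list above.
serverless_reader = {
    "DBInstanceIdentifier": "serverless-reader",
    "PerformanceInsightsEnabled": False,
    "EnabledCloudwatchLogsExports": [],
    "PromotionTier": 2,
}
print(capacity_blockers(serverless_reader))
```

An empty result would only mean that these three settings are not the cause; it says nothing about background activity on the instance itself.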

Here is an image of the ACU usage from our CloudWatch dashboard: ACU usage

Here is a zoomed-in version of the chart. You can see regular spikes and random high ACU usage (and the baseline should be 0.5 ACU): Zoomed in ACU usage

I marked the off-peak hours in red in the screenshot. As you can see, there are random spikes and constant usage. It is not as high as when we send queries, but something is definitely going on there.
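To put numbers on what the charts show, the off-peak datapoints could be summarized as follows. This is a minimal sketch in Python, assuming the ACU readings have been exported as `(timestamp, value)` pairs in local time. The function name `off_peak_summary`, the `spike_factor` threshold, and the sample readings are hypothetical; only the 7am-5pm window and the 0.5 ACU minimum come from the description above.

```python
from datetime import datetime
from statistics import median

PEAK_START_HOUR = 7   # the serverless instance is queried from 7am...
PEAK_END_HOUR = 17    # ...to 5pm
MIN_ACU = 0.5         # configured minimum capacity


def off_peak_summary(points, spike_factor=1.5):
    """Summarize ACU readings taken outside the peak window.

    points: iterable of (datetime, acu) pairs, timestamps in local time.
    spike_factor: a reading counts as a spike when it exceeds
                  baseline * spike_factor (an arbitrary illustrative choice).
    """
    off_peak = [acu for ts, acu in points
                if not (PEAK_START_HOUR <= ts.hour < PEAK_END_HOUR)]
    if not off_peak:
        return None
    baseline = median(off_peak)
    spikes = [acu for acu in off_peak if acu > baseline * spike_factor]
    return {
        "baseline_acu": baseline,
        "above_minimum": baseline > MIN_ACU,
        "spike_count": len(spikes),
        "max_acu": max(off_peak),
    }


# Made-up readings for illustration only.
sample = [
    (datetime(2023, 1, 1, 6, 0), 3.0),
    (datetime(2023, 1, 1, 12, 0), 8.0),   # peak hours, ignored
    (datetime(2023, 1, 1, 18, 0), 2.5),
    (datetime(2023, 1, 1, 20, 0), 3.0),
    (datetime(2023, 1, 1, 22, 0), 6.0),
    (datetime(2023, 1, 1, 23, 0), 3.5),
]
summary = off_peak_summary(sample)
print(summary)
```

The median is used as the baseline so that the occasional spike does not pull it upward.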

Can you please help us with this issue?

1 Answer

Hello there,

I understand that you have configured an Aurora Serverless v2 PostgreSQL instance in your cluster, but it doesn't scale down to the minimum ACU you set, i.e. 0.5. You also upgraded the cluster from version 13.10 to 13.11, but the issue persists.

Your cluster has the following instances:

1) 1 provisioned writer instance which is under constant load.
2) 1 provisioned reader instance which is also under constant load.
3) 1 serverless reader instance which you use only for specific reading queries during peak time to serve the unpredictable and spiky load.

Issue description: when you don't send requests to the serverless instance, it doesn't scale down to 0.5 ACU but stays at around 2.5-3.5 ACUs.

Steps you have taken to mitigate the issue:

-- You checked and are sure that you do not query the Serverless v2 instance during off-peak hours; all queries are redirected to the writer instance.

-- You turned off Performance Insights monitoring on the serverless instance.

-- Log export is also turned off on the serverless instance.

-- max_connections and max_locks_per_transaction are set to their default values.

-- Failover priority is set to tier-2 on the serverless instance.

Looking at the mitigation steps and the screenshots you shared, I understand that you have followed the troubleshooting steps in the documentation below, but the ACU still didn't scale down to 0.5.

[+] Performance and scaling for Aurora Serverless v2 - Troubleshooting Aurora Serverless v2 capacity issues - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.setting-capacity.html#aurora-serverless-v2.troubleshooting

Since the steps in the public documentation have already been covered, further troubleshooting requires looking at the resources themselves. AWS support engineers have the tools and permissions needed to dive deep and determine exactly why the ACU is not scaling down.

Because this may need a deeper investigation, you can open a support case with the AWS Support Engineering team. The investigation may require non-public information, such as your RDS instance and metric details and information about your database usage, so a support case is the right channel. You can open one using the following link:

[+] : AWS Support team - https://console.aws.amazon.com/support/
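If it helps when gathering the metric details for the case, the ACU history could be pulled from CloudWatch. The following is a minimal sketch in Python, assuming boto3 is available and that the `ServerlessDatabaseCapacity` metric in the `AWS/RDS` namespace is the one of interest. The helper name `acu_metric_request` and the instance identifier are hypothetical; the helper only builds the request parameters, so it runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def acu_metric_request(instance_id, hours=24, period_seconds=300, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    instance_id: DB instance identifier of the Serverless v2 reader.
    now: end of the time window; defaults to the current UTC time.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "ServerlessDatabaseCapacity",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average", "Maximum"],
    }


# "my-serverless-reader" is a placeholder identifier.
request = acu_metric_request("my-serverless-reader")

# With credentials configured, the request could be sent like this:
# import boto3
# response = boto3.client("cloudwatch").get_metric_statistics(**request)
# datapoints = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
```

A 24-hour window at a 5-minute period covers both the peak and the off-peak hours in a single request.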

I sincerely hope the above information is helpful. Have a great rest of your week!

AWS
answered 7 months ago
