How To Increase RDS Performance With Backups and Standby

I'm trying to duplicate (or at least approach) the performance of my bare metal machines on RDS.

I started with db.m5.xlarge and gp2 storage, which I understand provides baseline IOPS of 3x the allocated storage in GiB, with bursts to 3,000 IOPS if storage is less than 1 TB (provided burst credits are available). I understand that the db.m5.xlarge instance itself supports a baseline of 6,000 IOPS and a baseline throughput of 143.75 MB/s.

These are the references I used to determine that information:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html
https://aws.amazon.com/ebs/volume-types/

My initial configuration had less than 1 TB of storage allocated and did not give me the performance I wanted, so after investigating I raised the baseline IOPS of the storage to 6,000 by increasing the storage allocation to 2 TB. My understanding is that burst credits are irrelevant in this configuration, since my baseline IOPS are already over 3,000. This setup did give me the performance I wanted, comparable to my bare metal machines. Yay! I consider this my baseline configuration.
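The sizing arithmetic above can be sketched as follows. This is only an illustration of the 3 IOPS per GiB rule as I understand it; the 100 IOPS floor and 16,000 IOPS ceiling are assumptions taken from the general gp2 documentation, the function names are made up for this example, and any RDS-specific volume striping is ignored.

```python
GP2_IOPS_PER_GIB = 3      # from the gp2 rule quoted in the post
GP2_MIN_IOPS = 100        # assumption: gp2 floor per the EBS docs
GP2_MAX_IOPS = 16_000     # assumption: gp2 ceiling per the EBS docs
GP2_BURST_IOPS = 3_000    # burst level mentioned in the post


def gp2_baseline_iops(size_gib: int) -> int:
    """Return the estimated gp2 baseline IOPS for a given allocation."""
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GIB * size_gib, GP2_MAX_IOPS))


def burst_credits_matter(size_gib: int) -> bool:
    """Burst only helps while the baseline is below the burst level."""
    return gp2_baseline_iops(size_gib) < GP2_BURST_IOPS


# Sizes in GiB; 2000 and 3000 correspond to the "2TB" and "3TB" configs.
for size in (500, 2000, 3000):
    print(size, gp2_baseline_iops(size), burst_credits_matter(size))
```

With these assumptions, a 2,000 GiB allocation works out to 6,000 baseline IOPS, which matches the instance-level baseline quoted for db.m5.xlarge.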

However, on this RDS instance I will also need Backups and a Multi-AZ Standby. Enabling Backups caused the same test load to take twice as long; enabling both Backups and Standby caused it to take anywhere from 3.5x to 6x as long to complete.

So the main question is, what do I need to do with this configuration to at least get it back somewhere close to my baseline?

Looking at the graphs, I'm having a hard time finding any smoking guns. Running the test with Backups and Standby, my CloudWatch Total IOPS never exceeded 2,000, and Write Throughput was never higher than 73 MB/s. Read Throughput did spike over 100 MB/s (max spike 154 MB/s) a handful of times, but only for 1 minute each (the total test took nearly 90 minutes to complete). Enhanced Monitoring TPS (1-second granularity) never exceeded 5,000, Write Kb/s barely exceeded 100,000, and Read Kb/s peaked somewhere between 7k and 8k. So with the exception of the 5 Read Throughput spikes in the CloudWatch graphs, all of those metrics seem well within the baseline performance of the instance and storage. I also tried the next instance class up (baseline 12,000 IOPS and 287.5 MB/s baseline throughput) with similar results (see below).
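To make that comparison concrete, here is a rough sketch that expresses the CloudWatch peaks quoted above as a fraction of the db.m5.xlarge baselines. The numbers are the ones from this post; the dictionary layout and function name are just for illustration.

```python
# Instance-level baselines quoted in the post for db.m5.xlarge.
BASELINE_IOPS = 6000
BASELINE_MB_S = 143.75

# Upper bounds read off the CloudWatch graphs: (observed peak, limit).
observed = {
    "total_iops": (2000, BASELINE_IOPS),
    "write_mb_s": (73, BASELINE_MB_S),
    "read_mb_s_spike": (154, BASELINE_MB_S),
}


def utilization(peaks):
    """Map each metric name to peak / limit."""
    return {name: peak / limit for name, (peak, limit) in peaks.items()}


usage = utilization(observed)
over_baseline = [name for name, frac in usage.items() if frac > 1.0]

for name, frac in usage.items():
    print(f"{name}: {frac:.0%} of baseline")
print("over baseline:", over_baseline)
```

Only the short read spikes exceed the baseline; IOPS and write throughput stay at roughly a third and a half of it, which is why none of these graphs looks like the bottleneck.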

Other configs I've tried:

db.m5.xlarge w/gp2 3TB allocated (9,000 IOPS)
db.m5.2xlarge w/gp2 2TB allocated (6,000 IOPS)
db.m5.2xlarge w/gp2 3TB allocated (9,000 IOPS)

db.m5.xlarge w/io1 <1TB allocated, 6,000 IOPS selected
db.m5.xlarge w/io1 <1TB allocated, 9,000 IOPS selected
db.m5.2xlarge w/io1 <1TB allocated, 6,000 IOPS selected
db.m5.2xlarge w/io1 <1TB allocated, 9,000 IOPS selected

All of these configurations provided nearly identical results; the test run took at least an hour and in some cases much, much longer.

What am I missing? Or what direction can I go to improve the machine's performance with backups and standby enabled?

Thank you!

1 Answer

When you enable backups, RDS also enables logging, such as binary logging for MySQL, and logging carries a write penalty. In addition, to provide high availability with Multi-AZ, RDS creates a second copy of your data in another Availability Zone. All writes to your database are synchronously replicated to that second copy, which adds write latency compared with writing to a single copy of your data. It is expected that Single-AZ without backups will outperform the highly available and durable alternatives. hth
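One way to see why added write latency can slow a load even when IOPS are far below the provisioned limit is Little's law: with a fixed number of concurrent writers, commit throughput is bounded by writers divided by per-commit latency. The sketch below uses hypothetical latencies (1 ms and 4 ms) and a hypothetical writer count purely for illustration; they are not measurements from this workload.

```python
def max_commits_per_sec(concurrent_writers: int, commit_latency_ms: float) -> float:
    """Upper bound on commits/sec from Little's law: N / latency."""
    return concurrent_writers * 1000.0 / commit_latency_ms


WRITERS = 8                 # hypothetical number of concurrent sessions
SINGLE_AZ_LATENCY_MS = 1.0  # hypothetical commit latency, no standby
MULTI_AZ_LATENCY_MS = 4.0   # hypothetical latency with synchronous replication

single_az = max_commits_per_sec(WRITERS, SINGLE_AZ_LATENCY_MS)
multi_az = max_commits_per_sec(WRITERS, MULTI_AZ_LATENCY_MS)

print(f"single-AZ bound: {single_az:.0f} commits/s")
print(f"multi-AZ bound:  {multi_az:.0f} commits/s")
print(f"slowdown factor: {single_az / multi_az:.1f}x")
```

Under these assumed numbers the bound drops by the same factor as the latency rises, regardless of how many IOPS are provisioned. If the workload behaves this way, more concurrency or fewer, larger commits would help more than extra IOPS.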

answered 13 days ago
  • Thank you very much for your quick reply!

    I understand that, and did expect performance to take a hit with binary logging and Standby enabled. But is there nothing that can be done to get the overall performance back to the baseline? I.e., shouldn't additional IOPS or a bigger instance class be able to bring overall performance back to something close to what I'm hoping for? And if so, how can I determine from the metrics what is ultimately necessary to get there?
