SQL Server full backup to S3 is slow at times



We have SQL 2017 ENT running on EC2 instances (i3.4xlarge) in Ireland. We are using Litespeed to do full and log backups to S3 in the same region.

This is a very large Database. Through litespeed we are doing maximum compression and stripping the full backup to 7 files and each file size is ~430 GB.

In the SQL server side, full backups are scheduled to always run in AG secondary replica, so it is only doing the backup work load nothing else.

A full back usually completes in ~ 16 to 18 hours. But sometimes the full backup takes 30 to 40 hours and sometimes in worst cases it is taking 60 to 75 hrs to complete.

For the slow running backups, at the moment we dont have a clue on where things are actually slowing down. One thing I notice from the litespeed backup summary - whenever backup throughput reduces, the backup time increases proportionally. How would I go about troubleshooting this to find out the root cause and what actions should I take so that full backup always completes in less than 20 hours?

Thanks in advance.

asked a year ago552 views
1 Answer

The first step in any situation like this is your error logs. Assuming you have checked the error logs and found no errors that correlate with timing we then go on to the next step of checking wait statistics. By checking wait statistics, you can see the major symptoms of your issue. For example, if you see WRITELOG as your top wait statistic, then you know your issues are most likely related to transaction log write speed. If you were to see LCK_xx, then you know it's a locking problem.

To get started, begin with Wait Statistics advice from Paul Randal, the former team lead of the SQL Server Storage Engine - https://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/. It includes my favorite script for retrieving your SQL Server's wait statistics. Paul has numerous other resources there for you to take advantage of, such as a full library describing all of the wait stats and lots of great blog posts explaining them. Also, check out the blog post for the most worrying wait types - https://www.sqlskills.com/blogs/paul/worrying-wait-type/.

Of course, there are deeper symptoms and anti-patterns that feed the wait statistics. But to uncover those issues would require a full troubleshooting session.

Hope that helps you get started. Cheers, -Kevin

profile pictureAWS
answered a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions