VMware on AWS - AWS Backup failed with error "Backup job expired before completion"


We have implemented AWS Backup to back up the VMs in our SDDC (VMC). There are hundreds of VMs to back up, and we have 4 Backup gateway appliances installed. According to the documentation, AWS Backup gateway allows 4 concurrent VM backups per gateway. We tried increasing the backup window, but the jobs still fail.

Question: Is there a way to monitor the Backup gateway appliance and the time taken per VM and per backup task? How do we estimate the number of Backup gateway appliances needed for a given backup window?


1 Answer
Accepted Answer


Yes, you can monitor the Backup Gateway appliance and the time taken per VM and per backup task using Amazon CloudWatch metrics and logs.

AWS Backup Gateway emits various metrics to CloudWatch, including the number of bytes transferred, the number of backup jobs processed, and the duration of each job. These metrics can be used to monitor the Backup Gateway appliance and identify any bottlenecks or performance issues. You can also set up CloudWatch alarms to alert you if certain thresholds are exceeded, such as if a backup job takes longer than expected.

In addition to CloudWatch metrics, you can also view logs generated by the Backup Gateway appliance. These logs contain detailed information about each backup job, including the start time, end time, and any errors that occurred during the backup process. You can use these logs to troubleshoot issues and identify areas for improvement.
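As a complement to the metrics and logs, per-VM timing can also be derived from the AWS Backup job records themselves. The following is a minimal sketch, assuming the boto3 `backup` client, its `list_backup_jobs` call, and the `VirtualMachine` resource type filter; the helper names `fetch_vm_backup_jobs` and `summarize_durations` are hypothetical. Note that the time between `CreationDate` and `CompletionDate` includes any time a job spent queued, so it is an upper bound on the actual transfer time.

```python
from datetime import datetime, timedelta, timezone


def fetch_vm_backup_jobs(since):
    """Return AWS Backup job records for virtual machines created after `since`."""
    import boto3  # imported lazily so the summary helper works without AWS access

    client = boto3.client("backup")
    jobs, token = [], None
    while True:
        # Assumption: VMware VMs are reported under the "VirtualMachine" type.
        kwargs = {"ByResourceType": "VirtualMachine", "ByCreatedAfter": since}
        if token:
            kwargs["NextToken"] = token
        page = client.list_backup_jobs(**kwargs)
        jobs.extend(page.get("BackupJobs", []))
        token = page.get("NextToken")
        if not token:
            return jobs


def summarize_durations(jobs):
    """Map each resource ARN to a list of (state, elapsed time) for finished jobs."""
    summary = {}
    for job in jobs:
        started, finished = job.get("CreationDate"), job.get("CompletionDate")
        if started is None or finished is None:
            continue  # job is still running or has no completion timestamp
        summary.setdefault(job["ResourceArn"], []).append(
            (job["State"], finished - started)
        )
    return summary


# Example usage (requires AWS credentials):
# since = datetime.now(timezone.utc) - timedelta(days=1)
# for arn, runs in summarize_durations(fetch_vm_backup_jobs(since)).items():
#     print(arn, runs)
```

Sorting the result by elapsed time is a quick way to spot the VMs that dominate the backup window.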

To estimate the number of Backup Gateway appliances needed to size the backup window, you'll need to consider a few factors, such as the number of VMs being backed up, the size of the backups, and the available network bandwidth.

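As you mentioned, each Backup Gateway appliance can handle up to 4 concurrent VMs. Backing up all 100 VMs at the same time would therefore take 25 Backup Gateway appliances, but full concurrency is not required: with your current 4 appliances, up to 16 VMs are backed up at once and the remaining jobs wait for a free slot, so 100 VMs run in roughly 7 waves. The backup window has to be long enough for all of those waves to finish, and jobs that do not finish in time may expire. These figures assume that each backup job takes the same amount of time and that network bandwidth is not a limiting factor. In reality, some backup jobs may take longer than others, and network bandwidth may limit the number of concurrent backup jobs that can be processed.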
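The concurrency arithmetic can be written out as a short sketch. It assumes the limit of 4 concurrent VMs per gateway quoted in the question and treats every job as taking the same time; the function names are hypothetical.

```python
import math


def concurrent_slots(gateways, per_gateway=4):
    """Total number of VMs that can be backed up at the same time."""
    return gateways * per_gateway


def waves_needed(vm_count, gateways, per_gateway=4):
    """Number of sequential rounds needed if every job takes the same time."""
    return math.ceil(vm_count / concurrent_slots(gateways, per_gateway))


def gateways_for_full_concurrency(vm_count, per_gateway=4):
    """Gateways required to start every VM backup at once."""
    return math.ceil(vm_count / per_gateway)


print(concurrent_slots(4))                 # 16
print(waves_needed(100, 4))                # 7
print(gateways_for_full_concurrency(100))  # 25
```

If a typical job took, say, 1 hour, 7 waves would need a window of about 7 hours with 4 gateways.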

To size the backup window, you'll need to test the backup process with a representative sample of VMs and measure the time it takes to complete each backup job. Based on this data, you can estimate the total time required to back up all VMs and determine how many Backup Gateway appliances are needed to meet your backup window requirements.
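Once sample durations have been measured, the estimate could be automated along the following lines. This is a rough sketch, not an AWS-provided method: it assumes 4 concurrent VMs per gateway, ignores network bandwidth contention, and schedules the longest jobs first, which AWS Backup does not guarantee, so the result is optimistic. The function names and the `max_gateways` cap are hypothetical.

```python
import heapq


def estimated_window_hours(durations_hours, gateways, per_gateway=4):
    """Estimate total elapsed time when jobs share gateways * per_gateway slots."""
    slots = [0.0] * (gateways * per_gateway)  # time at which each slot is free
    heapq.heapify(slots)
    for duration in sorted(durations_hours, reverse=True):
        earliest_free = heapq.heappop(slots)  # next slot to become available
        heapq.heappush(slots, earliest_free + duration)
    return max(slots)


def min_gateways_for_window(durations_hours, window_hours,
                            per_gateway=4, max_gateways=100):
    """Smallest gateway count whose estimate fits the window, or None."""
    for gateways in range(1, max_gateways + 1):
        estimate = estimated_window_hours(durations_hours, gateways, per_gateway)
        if estimate <= window_hours:
            return gateways
    return None  # e.g. a single job is longer than the window


# 100 VMs at 1 hour each, measured from a hypothetical sample run
durations = [1.0] * 100
print(estimated_window_hours(durations, gateways=4))       # 7.0
print(min_gateways_for_window(durations, window_hours=4))  # 7
```

A `None` result means adding gateways will not help, because at least one VM takes longer than the window on its own; in that case the window itself has to grow or that VM's backup has to get faster.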

Please let me know if I answered your question

answered 14 days ago
