AWS Backup VSS snapshot fails


I am backing up about 45 Windows Server EC2 instances with AWS Backup. One of the AWS Backup jobs, covering about 35 of those instances, takes a VSS snapshot as part of the backup. I get a lot of VSS failure messages. Some of them are VSS timeouts, which I understand to be a Windows issue caused by a non-configurable 10-second maximum for the snapshot to complete. Others are related to the AWS VSS provider. In AWS Backup the error is "Windows VSS Backup Job Error encountered, trying for regular backup". The job then completes, but without a VSS snapshot. In SSM, the Run Command error for this task is:

Encountered unexpected error. Please see error details below Message : The process cannot access the file 'C:\ProgramFiles\Amazon\AwsVssComponents\vsserr.log' because it is being used by another process.

I tried to rename this file (just as a test, to see whether it was in use) and Windows reports that it is in use by ec2-vss-agent.exe. So I stopped the EC2 VSS Windows service, but that did not stop the ec2-vss-agent.exe process and the error remained. I then did an 'end task' on the ec2-vss-agent.exe process and manually ran the VSS Run Command from SSM. It restarted the process, which ran for a while before timing out; that is the other (unrelated?) issue we see too. I cannot find anything online about this issue or error, and I'm at a loss as to where to look from here. I need VSS snapshots of these servers. If anyone has any ideas about how to troubleshoot this or what else to look for, please let me know!
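The manual rename test described above can be scripted so it is repeatable across many instances. The following is only a minimal sketch, assuming Python is available on the instance; the function name `is_locked_for_rename` and the `.probe` suffix are hypothetical. On Windows, renaming a file that another process holds open without delete sharing raises `PermissionError` (WinError 32). The file is briefly renamed during the probe, so this is not something to run while a backup is in progress.

```python
import os


def is_locked_for_rename(path):
    """Return True if the file cannot be renamed because it is in use."""
    probe = path + ".probe"  # hypothetical temporary name for the test
    try:
        os.rename(path, probe)
    except PermissionError:
        # On Windows a sharing violation (WinError 32) surfaces here.
        return True
    # The rename worked, so nothing was blocking it; restore the name.
    os.rename(probe, path)
    return False
```

Calling it with the path of vsserr.log before and after stopping the service would show whether the lock is still held.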

asked 2 years ago, 3597 views
1 Answer

Hello,

Here are a few items you can look at. Some may seem obvious and you have probably already checked them, but I will list them for completeness:

  1. Please validate that the latest versions of both the SSM Agent and the AWS PowerShell modules are installed on the EC2 instances (1-2).
  2. Can you confirm that no antivirus or security software is scanning these folders and thereby interfering with the process?
  3. If the system is too busy when the snapshot is taken, the VSS backup may also time out. If the backups do not already run off-hours, is there a maintenance window in which you could run them?
  4. Check the SSM logs on the instance, under "C:\ProgramData\Amazon\SSM\Logs", to see whether anything stands out around the time the backup is executed.
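For item 4, narrowing a log to the minutes around the backup job can be sketched as follows. This is an illustration, not an AWS tool: it assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp (an assumption about the log layout that should be checked against the actual files), and the function name `lines_near` is hypothetical. Lines without a timestamp are treated as continuations of the previous entry.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout at line start


def lines_near(lines, center, minutes=15):
    """Return log lines whose timestamp is within +/- minutes of center."""
    window = timedelta(minutes=minutes)
    keep = False
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TS_FORMAT)
        except ValueError:
            pass  # no timestamp: continuation line, reuse previous decision
        else:
            keep = abs(stamp - center) <= window
        if keep:
            selected.append(line.rstrip("\n"))
    return selected
```

Passing the lines of a log file from the folder above, together with the start time of the backup job, returns only the entries worth reading closely.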

I hope the above information can lead to a troubleshooting path that turns up new clues.

References

  1. Amazon SSM Agent Releases - https://github.com/aws/amazon-ssm-agent/releases
  2. Installing the AWS Tools for PowerShell on Windows - https://docs.aws.amazon.com/powershell/latest/userguide/pstools-getting-set-up-windows.html
AWS
SUPPORT ENGINEER
answered 2 years ago
  • Thanks for the suggestions.

    1. We do have up to date versions of the agents.
    2. We are checking A/V exclusions. However, the process that is locking the file is ec2-vss-agent.exe so I don't think it would be an A/V issue, but we are having the security team check the configuration to be sure.
    3. The systems are not too busy when the backups run, which is in the middle of the night.
    4. The SSM logs on the server don't show any issues other than the same error message that shows up in the SSM console. About an hour and 15 minutes before the backup job there is an SSM job that does time synchronization, and that job completes successfully.
  • The initial error I posted is the first few lines of the error. I have copied the entire error below:

    Encountered unexpected error. Please see error details below
    Message : The process cannot access the file 'C:\ProgramFiles\Amazon\AwsVssComponents\vsserr.log' because it is being used by another process.
    Data : {}
    InnerException :
    TargetSite : Void WinIOError(Int32, System.String)
    StackTrace : at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
                 at System.IO.FileInfo.Delete()
                 at Microsoft.PowerShell.Commands.FileSystemProvider.RemoveFileSystemItem(FileSystemInfo fileSystemInfo, Boolean force)
    HelpLink :
    Source : mscorlib
    HResult : -2147024864
    MyCommand : Remove-Item
    BoundParameters : {}
    UnboundArguments : {}
    ScriptLineNumber : 740
    OffsetInLine : 9
    HistoryId : 1
    ScriptName : C:\ProgramData\Amazon\SSM\InstanceData<instance-id>\document\orchestration\d8987ebd-b404-4fd3-90d8-5729d9a39426\runPowerShellScript_script.ps1
    Line : del $vssStdErr
    PositionMessage : At C:\ProgramData\Amazon\SSM\InstanceData<instance-id>\document\orchestration\d8987ebd-b404-4fd3-90d8-5729d9a39426\runPowerShellScript_script.ps1:740 char:9
                      + del $vssStdErr
    PSScriptRoot : C:\ProgramData\Amazon\SSM\InstanceData<instance-id>\document\orchestration\d8987ebd-b404-4fd3-90d8-5729d9a39426\runPowerShellScript
    PSCommandPath : C:\ProgramData\Amazon\SSM\InstanceData<instance-id>\document\orchestration\d8987ebd-b404-4fd3-90d8-5729d9a39426\runPowerShellScript_script.ps1
    Invocation

  • We've checked all of the items above. The last thing that we did was exclude the folder from A/V but that did not make a difference. It seems like the issue is as stated above, that the ec2-vss-agent.exe process is causing the issue. Any other ideas as to why that process would be keeping the vsserr.log open even after the VSS Run Command completes?
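The HResult value in the full error above can be decoded to confirm what kind of failure Windows is reporting. The sketch below (the function name `decode_hresult` is hypothetical) reinterprets the signed number as an unsigned 32-bit value and splits it into the standard HRESULT fields. For -2147024864 this gives 0x80070020: facility 7 (Win32) with code 32, which is ERROR_SHARING_VIOLATION. That is consistent with the message text: the file is held open by another process, and it is not a permissions or missing-path problem.

```python
def decode_hresult(hresult):
    """Split a signed 32-bit HRESULT into its standard fields."""
    value = hresult & 0xFFFFFFFF  # reinterpret as unsigned 32-bit
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),       # top bit set means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility code
        "code": value & 0xFFFF,             # low 16 bits: the error code
    }


# HResult copied from the SSM Run Command error output
print(decode_hresult(-2147024864))
```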
