Contacting support as the bot suggested is probably the fastest way to sort the issue out. But if you aren't on a paid support plan, or would like to try to sort the issue out yourself while waiting for support to respond, there is one thing you could check, given that only one specific cluster is having this issue: whether the cluster is encrypted at rest. If it is, check that access to the KMS key is granted in one of two ways. Either the key policy grants access directly to the role under which AWS Backup is running, or the key policy contains an allow statement deferring the authorisation decision to IAM and the IAM role used by AWS Backup contains an allow statement granting access to the key.
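As a rough illustration of that check, here is a minimal Python sketch that classifies a key policy document. The function names, the role ARN and the account ID are hypothetical placeholders, and the check is deliberately simplified: it only looks at the principals of Allow statements and ignores actions, conditions, Deny statements and grants, so treat a result as a hint rather than a verdict.

```python
import json


def key_policy_grants(policy_json, role_arn, account_id):
    """Classify how a KMS key policy might grant access to role_arn.

    Returns "direct" if an Allow statement names the role itself,
    "via_iam" if an Allow statement names the account root (which defers
    the decision to IAM policies), or "none" otherwise.
    Simplification: actions, conditions and Deny statements are ignored.
    """
    root_arn = f"arn:aws:iam::{account_id}:root"
    result = "none"
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else principal
        if isinstance(aws, str):
            aws = [aws]
        if role_arn in aws:
            return "direct"
        if root_arn in aws or account_id in aws:
            result = "via_iam"
    return result


def fetch_key_policy(key_id, region):
    """Fetch the key policy with boto3 (assumes boto3 and credentials are available)."""
    import boto3

    kms = boto3.client("kms", region_name=region)
    return kms.get_key_policy(KeyId=key_id, PolicyName="default")["Policy"]
```

If the result is "via_iam", the IAM policies attached to the AWS Backup role still need to allow the relevant KMS actions on the key; the sketch does not check that part.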
Another way to check whether KMS errors might be involved is to look for authorisation errors logged in CloudTrail, in the source or destination region, for the role AWS Backup is using. If you have your own CloudTrail logs set up for analysing with Athena, you can filter by errorcode is not null and coalesce(useridentity.sessioncontext.sessionissuer.username, '')='YourAwsBackupRoleNameHere'. To contain costs, also apply an appropriate date-based filter so that you only look at the past day or two.
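For reference, the filter above could be assembled into a full query along these lines. This is only a sketch: the table name cloudtrail_logs, the selected columns and the helper name are assumptions based on the commonly used CloudTrail table layout for Athena, so adjust them to your own table. If your table is partitioned by date, filtering on the partition columns will do more to limit the data scanned than the eventtime comparison used here.

```python
def build_error_query(table, role_name, start_iso):
    """Build an Athena SQL string listing failed calls made by one role.

    table      -- hypothetical name of your CloudTrail table in Athena
    role_name  -- name of the IAM role used by AWS Backup
    start_iso  -- ISO 8601 timestamp; eventtime is assumed to be an ISO string
    """
    safe_role = role_name.replace("'", "''")  # escape single quotes for SQL
    return (
        "SELECT eventtime, awsregion, eventsource, eventname, errorcode, errormessage\n"
        f"FROM {table}\n"
        "WHERE errorcode IS NOT NULL\n"
        "  AND coalesce(useridentity.sessioncontext.sessionissuer.username, '') "
        f"= '{safe_role}'\n"
        f"  AND eventtime >= '{start_iso}'\n"
        "ORDER BY eventtime"
    )


if __name__ == "__main__":
    print(build_error_query("cloudtrail_logs", "YourAwsBackupRoleNameHere",
                            "2024-10-30T00:00:00Z"))
```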
If you haven't got your own CloudTrail logs set up, or have no facility for analysing them, open the CloudTrail console in the source region first and go to the event history view. In the settings for the log view, expose the "Error code" column, so that it's easy to distinguish between errors and successful requests. Then filter the events by Event source matching "kms.amazonaws.com" or by Event name matching "Decrypt", "GenerateDataKey", "ReEncrypt", or "CreateGrant". Look for events that show an error in the Error code column and that are related to the role used by AWS Backup.
If there are no errors in the source region, repeat the process in the target region.
If you find errors, the details of the CloudTrail event should help to pinpoint the source of the issue.
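If you would rather script the event history check than click through the console, a sketch like the following might help. It assumes boto3 is installed and credentials are configured; the function names and the role name are placeholders. The filtering is done on the raw event JSON that CloudTrail returns in the CloudTrailEvent field, and event history lookups only cover roughly the last 90 days of management events.

```python
import json


def failed_kms_events(events, role_name):
    """Return (eventTime, eventName, errorCode) for failed KMS calls by role_name.

    events is an iterable of items shaped like the entries returned by
    CloudTrail's LookupEvents API, each carrying a JSON string under
    the "CloudTrailEvent" key.
    """
    failures = []
    for event in events:
        detail = json.loads(event["CloudTrailEvent"])
        if "errorCode" not in detail:
            continue  # successful request
        if detail.get("eventSource") != "kms.amazonaws.com":
            continue
        issuer = (detail.get("userIdentity", {})
                        .get("sessionContext", {})
                        .get("sessionIssuer", {}))
        if issuer.get("userName") != role_name:
            continue
        failures.append((detail.get("eventTime"), detail.get("eventName"),
                         detail["errorCode"]))
    return failures


def fetch_kms_events(region, start_time):
    """Yield KMS events from CloudTrail event history in one region."""
    import boto3

    client = boto3.client("cloudtrail", region_name=region)
    paginator = client.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "EventSource",
                           "AttributeValue": "kms.amazonaws.com"}],
        StartTime=start_time,
    )
    for page in pages:
        yield from page["Events"]
```

Run it once against the source region and, if nothing turns up, once against the target region, for example failed_kms_events(fetch_kms_events(region, start), "YourAwsBackupRoleNameHere").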
In the situation you've described with an AWS Backup Copy Job stuck in the RUNNING state for an extended period, unfortunately, there isn't a direct way to cancel a Copy Job that's already in progress. AWS Backup doesn't provide a specific API call or CLI command to cancel an ongoing Copy Job.
However, you have a few potential options to address this issue:
- Contact AWS Support: Given the unusual duration of this Copy Job (running for two months), it's advisable to reach out to AWS Support. They may be able to investigate the root cause and potentially terminate the stuck job from their end.
- Wait for job completion or failure: Copy Jobs that have progressed to the RUNNING state cannot be cancelled directly. The job will eventually either complete or fail on its own, though in your case, this approach may not be practical given the extended duration.
- Review and adjust backup policies: To prevent this issue in the future, you might want to review your backup policies and consider implementing timeout mechanisms or adjusting the frequency of your copy jobs.
- Check for resource constraints: Ensure that there are no resource constraints in the destination region (ca-central-1) that might be causing the job to hang.
- Terminate the source resource: As a last resort, if possible and safe to do so, terminating the source Aurora cluster might force the copy job to fail. However, this is a drastic measure and should only be considered if absolutely necessary and after consulting with AWS Support.
For the CREATED state jobs, these are likely queued jobs that haven't started due to the ongoing RUNNING job. Once the RUNNING job completes or fails, these should start processing.
In the meantime, to prevent accumulation of more CREATED jobs, you may want to temporarily pause your daily copy job schedule until this issue is resolved.
Remember, it's crucial to work with AWS Support on this issue, as they can provide tailored advice and potentially resolve the stuck job without risking data loss or integrity.
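To get an overview of which copy jobs are stuck in RUNNING or waiting in CREATED before contacting support, a small script along these lines could be used. It is a sketch that assumes boto3 is installed and credentials are configured; the function names and the age threshold are illustrative, and the jobs are assumed to be listed from the region in which they were started.

```python
from datetime import datetime, timedelta, timezone


def long_running_jobs(copy_jobs, max_age, now=None):
    """Return the IDs of copy jobs still RUNNING after more than max_age.

    copy_jobs is an iterable of dicts shaped like the CopyJobs entries
    returned by AWS Backup's ListCopyJobs API (CopyJobId, State, CreationDate).
    """
    now = now or datetime.now(timezone.utc)
    return [
        job["CopyJobId"]
        for job in copy_jobs
        if job.get("State") == "RUNNING" and now - job["CreationDate"] > max_age
    ]


def fetch_copy_jobs(region, state):
    """Yield copy jobs in the given state ("RUNNING", "CREATED", ...)."""
    import boto3

    client = boto3.client("backup", region_name=region)
    kwargs = {"ByState": state}
    while True:
        page = client.list_copy_jobs(**kwargs)
        yield from page.get("CopyJobs", [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token
```

For example, long_running_jobs(fetch_copy_jobs(region, "RUNNING"), timedelta(days=2)) would list the IDs of jobs that have been running for more than two days, which can then be quoted in a support case.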
Sources
CopyJob - AWS Backup
CancelJob - AWS Batch
Creating backup copies across AWS accounts - AWS Backup

Hi Leo, thanks for your suggestions on my AWS Backup Copy Jobs issue, though I'm replying 2 months late. We followed your recommendation and ran Athena queries against our CloudTrail logs. While we did find some API errors, they were consistent before and after the incident, with nothing unusual around October 31st. We ultimately subscribed to an AWS support plan, and their engineers cancelled the stuck job. Afterward, the queued pending jobs were marked as FAILED, and new copy jobs are now completing successfully. Appreciate your help on this thread. Hopefully this resolution helps others with similar issues.