
Apply Backup Tagging to AWS Resources


Hi AWS, we are implementing an automated backup-tagging solution for AWS resources (S3, DynamoDB, DocumentDB, RDS, EFS) as part of our audit and compliance requirements. AWS Backup plans are already deployed in our SDLC accounts, and resources are tagged using:

  1. AWS_Backup – to attach the backup plan
  2. Backup_Exception – for resources that should not have a backup plan

Problem

During automation, we observed two issues when tagging resources directly using a Lambda function:

  1. Some services (e.g., S3) have thousands of resources, which may exceed the Lambda 15-minute timeout.
  2. High-volume tagging operations may hit API throttling limits.

Current Approach (Day-1 Design)

We created a static JSON metadata file listing each resource with the appropriate tag values, for example:

[
  { "bucket": "bucket-1", "AWS_Backup": "d7" },
  { "bucket": "bucket-2", "AWS_Backup": "Backup_Exception", "Backup_Exception_Reference": "S3-NonProd-AB-001" },
  { "bucket": "bucket-3", "AWS_Backup": "h7" },
  { "bucket": "bucket-4", "AWS_Backup": "Backup_Exception", "Backup_Exception_Reference": "S3-NonProd-AB-001" }
]

This JSON is sent to an SQS FIFO queue, and a Lambda function processes the records and applies the tags to the corresponding S3 buckets (similarly for other services).
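For illustration, a minimal sketch of such a consuming Lambda (the function and helper names are assumptions, not our actual code):

```python
import json

def record_to_tagset(item):
    """Convert one metadata record into the S3 TagSet format;
    every key except the bucket name becomes a tag."""
    return [{"Key": k, "Value": v} for k, v in item.items() if k != "bucket"]

def handler(event, context):
    """Hypothetical SQS-triggered Lambda: each record body is one JSON item."""
    import boto3  # imported lazily so the pure helper above is testable without AWS
    s3 = boto3.client("s3")
    for record in event["Records"]:
        item = json.loads(record["body"])
        # Note: PutBucketTagging replaces the bucket's entire tag set, so
        # merge with get_bucket_tagging first if existing tags must survive.
        s3.put_bucket_tagging(
            Bucket=item["bucket"],
            Tagging={"TagSet": record_to_tagset(item)},
        )
```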

Is this an appropriate and scalable design for applying backup tags across large numbers of AWS resources? Are there recommended AWS best practices or alternative architectures for bulk resource tagging at scale (e.g., Step Functions, EventBridge, Batch operations, or AWS Backup tag-based assignment)?

This needs to be done for DynamoDB, DocumentDB, and a couple of other AWS services across a set of Non-Production and Production accounts.

Please acknowledge and help.

4 Answers

I'm confused by the statement "Some services (e.g., S3) have thousands of resources, which may exceed the Lambda 15-minute timeout." Do you actually have thousands of S3 buckets or RDS databases? Note that you only have to tag the bucket, DynamoDB table, or RDS database itself; you don't need to tag individual objects.

Also, note that there is the Tag Editor in the console, where you can tag resources in bulk.

Also, in your backup plan, you don't need to apply tags to resources that do not need to be backed up. Not having the required tag means a resource simply won't get backed up.
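As a sketch of that tag-based assignment (assumption: the field names follow the AWS Backup CreateBackupSelection API; the selection name and role ARN are made up):

```python
def build_tag_selection(selection_name, iam_role_arn, tag_key, tag_value):
    """Build a BackupSelection that matches only resources carrying
    tag_key=tag_value; untagged resources are simply never backed up."""
    return {
        "SelectionName": selection_name,
        "IamRoleArn": iam_role_arn,
        "ListOfTags": [{
            "ConditionType": "STRINGEQUALS",
            "ConditionKey": tag_key,
            "ConditionValue": tag_value,
        }],
    }

def attach_selection(backup_plan_id, selection):
    """Hypothetical helper: attach the selection to an existing backup plan."""
    import boto3  # lazy import keeps the builder above testable without AWS
    client = boto3.client("backup")
    return client.create_backup_selection(
        BackupPlanId=backup_plan_id, BackupSelection=selection)
```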

EXPERT
answered 9 days ago
  • Yes we have ~1000 S3 buckets or even more in one AWS account.


Hey,

Hope you're keeping well.

For large-scale tagging across multiple services, your SQS + Lambda approach works, but you’ll likely need to add orchestration to handle timeouts and throttling. Step Functions can coordinate batches of tagging tasks, allowing you to break work into smaller chunks and retry without hitting Lambda’s 15‑minute limit. You can also use AWS Resource Groups Tagging API (tag-resources) which supports tagging multiple resources in a single call, reducing API overhead. For services supported by AWS Backup tag-based assignments, it’s often more efficient to tag at creation time via Service Catalog or IaC templates, so backup plans auto-attach without bulk updates.
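A minimal sketch of that batched call (assumption: TagResources accepts up to 20 ARNs per request; the helper names are illustrative):

```python
def chunk(items, size=20):
    """Split a list into fixed-size batches (TagResources takes up to 20 ARNs)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def tag_in_batches(arns, tags):
    """Hypothetical bulk tagger via the Resource Groups Tagging API."""
    import boto3  # lazy import keeps chunk() testable without AWS access
    client = boto3.client("resourcegroupstaggingapi")
    failed = {}
    for batch in chunk(arns):
        resp = client.tag_resources(ResourceARNList=batch, Tags=tags)
        failed.update(resp.get("FailedResourcesMap", {}))
    return failed  # per-ARN errors, suitable for feeding a retry pass
```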

Thanks and regards,
Taz

answered 7 days ago
  • Hi Taz, I am good and hope you're doing well. Can you share some working solutions emphasizing the use of the AWS Resource Groups Tagging API (tag-resources)?


AWS Backup does indeed allow you to apply backup plans to your resources through tagging, which is a great approach for implementing your backup strategy across applications and ensuring consistent protection of your AWS resources.

Your current approach using a Lambda function to process tags from an SQS FIFO queue is a reasonable starting point, but there are some considerations for scaling this solution:

  1. Lambda Timeout Concerns: The 15-minute Lambda timeout limitation is valid when dealing with thousands of resources. To address this, consider:
  • Breaking down your workload into smaller batches
  • Using the SQS message visibility timeout to ensure messages are processed completely
  • Implementing checkpointing to track progress

  2. API Throttling: For high-volume tagging operations, implement exponential backoff and retry logic in your Lambda function to handle API throttling gracefully.

  3. Alternative Approaches:
  • AWS Resource Tagger: If you're using AMS Accelerate, you could leverage AWS Resource Tagger to apply backup tags at scale. This is designed specifically for tagging resources and integrates with backup plans.
  • Step Functions: Consider using AWS Step Functions to orchestrate the tagging process, breaking it into manageable chunks and handling retries automatically.
  • AWS Batch: For very large workloads, AWS Batch might be more appropriate than Lambda due to longer runtime allowances.

  4. Tag Structure: For AWS Backup specifically, ensure your tag key-value pairs align with your backup plans. The standard format for tags in AWS Backup is [{"Key":"string","Value":"string"}].

  5. Tag Propagation: Be aware that AWS Backup copies tags from your resources to recovery points (up to 50 tags maximum). Tags assigned during backup have priority and are copied in alphabetical order.
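The exponential-backoff point above can be sketched generically (assumption: a full-jitter strategy with a caller-supplied throttle check; not AWS-specific code):

```python
import random
import time

def backoff_delays(max_attempts, base=0.5, cap=30.0):
    """Full-jitter delays: each is uniform in [0, min(cap, base * 2**n)]."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(max_attempts)]

def call_with_retry(fn, is_throttle, max_attempts=5, base=0.5):
    """Call fn(), retrying with backoff whenever is_throttle(exc) is true."""
    for delay in backoff_delays(max_attempts - 1, base=base):
        try:
            return fn()
        except Exception as exc:
            if not is_throttle(exc):
                raise
            time.sleep(delay)
    return fn()  # final attempt; any error now propagates
```

In practice, boto3's built-in retry configuration (botocore Config with retries mode "standard" or "adaptive") covers much of this without hand-rolled loops.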

For your specific use case across multiple services (S3, DynamoDB, DocumentDB, RDS, EFS) and accounts, I recommend:

  • Implementing a centralized tagging strategy using AWS Organizations and Tag Policies if possible
  • Using AWS Config to monitor tag compliance across accounts
  • Considering AWS Systems Manager Automation for cross-account tagging operations
  • For DynamoDB specifically, note that you need to enable AWS Backup's advanced features for DynamoDB in order for tags to be copied to your backups

Your JSON metadata approach is sound, but consider storing this configuration in a more manageable service like AWS AppConfig or Parameter Store for easier updates and versioning.
Sources
TagResource - AWS Backup
Tag your resources to apply AMS backup plans - AMS Accelerate User Guide
What is AWS Backup? - AWS Backup
AWS Backup: How it works - AWS Backup
Copy tags onto backups - AWS Backup

answered 9 days ago

Hey,

Hope you're keeping well.

For large-scale tagging across multiple services, your SQS + Lambda model works, but you’ll likely need to split workloads to avoid Lambda timeouts and throttling. Use SQS batch sizes with concurrency controls, or orchestrate tagging via AWS Step Functions so you can process in parallel and handle retries gracefully. For services supported by AWS Backup tag-based resource assignment, it’s more efficient to apply tags once and let AWS Backup automatically include resources in plans without direct API calls for each.

Thanks and regards,
Taz

answered 4 days ago
