Is there a way to perform an S3 batch copy operation on a huge number of objects when I don't have an inventory report available?

We have an urgent requirement to perform bulk operations on millions of S3 objects, but we can't wait 24-48 hours for the first S3 Inventory report to be generated. How can we use S3 Batch Operations immediately, without waiting for inventory files? I referred to [this thread](https://repost.aws/questions/QUwNmsKUz1RsqR1eCgBXGmrA/how-to-work-with-md5checksum-for-manually-edited-manifesto-json-file#ANlmxogDq6RFGHy91CW0PfWA), which shows how to prepare a manifest file on demand, but in my case I have to perform frequent operations on demand.

Can you please advise.

asked a month ago · 76 views
2 Answers
Accepted Answer

Amazon S3 now offers a manifest generator feature that eliminates the need to wait for inventory reports. It is available through both the AWS CLI/SDKs and the S3 Console, allowing you to create and execute batch jobs immediately with dynamic filtering. The feature was already available through the CLI and was added to the console last month (September 2025).

Using the S3 Console for On-Demand Operations

The S3 Console provides an intuitive interface for creating batch jobs with the manifest generator. Here's the detailed process:

Step 1: Job Setup and Scope Definition

  1. Navigate to the S3 service in the AWS Management Console
  2. Select Batch Operations from the left navigation pane
  3. Click Create job and choose your desired AWS Region
  4. Under Manifest, select "Generate an object list using filters" instead of using a pre-existing inventory report
  5. Specify your source bucket (e.g., s3://your-source-bucket)
  6. Apply object filters based on your criteria:
    • Prefix filters: Target specific directories (e.g., "2024/jan-24/")
    • Storage class filters: Focus on specific storage tiers
    • Size filters: Process objects within certain size ranges
    • Date filters: Target objects created within specific timeframes

Step 2: Operation Configuration

  1. Select your desired operation (Copy, Restore, Tag, Delete, etc.)
  2. Configure operation-specific settings:
    • For Copy operations: Set destination bucket, storage class, and encryption options
    • For Restore operations: Define expiration days and retrieval tier
    • For Tagging operations: Specify tag keys and values
  3. Choose between "Use API default settings" for standard operations or "Specify settings" for custom configurations

Step 3: Job Management Settings

  1. Provide a descriptive job name and description for tracking
  2. Set job priority (1-10, with higher numbers indicating higher priority)
  3. Choose execution preference:
    • "Run job automatically" for immediate execution after creation
    • "Review before running" to verify configuration before execution
  4. Configure manifest output location for audit purposes
  5. Set up completion reporting (failed tasks only or all tasks)
  6. Select an appropriate IAM role with necessary permissions

Step 4: Review and Execute

  1. Review all configuration details in the summary screen
  2. Click "Create job" to initialize the batch operation
  3. If you selected "Review before running," manually trigger execution by selecting "Run job"

Key Advantages of the Manifest Generator

  • Immediate execution: No waiting for inventory report generation
  • Dynamic filtering: Real-time object selection based on current bucket state
  • Precise targeting: Multiple filter criteria can be combined for exact object selection

Use Case Example

For a disaster recovery scenario where you need to restore database backups stored in Glacier Deep Archive:

  1. Use storage class filter to target only "GLACIER" or "DEEP_ARCHIVE" objects
  2. Apply prefix filter to focus on backup directories (e.g., "database-backups/2024/")
  3. Set restore operation with appropriate retrieval tier and expiration
  4. Execute immediately without waiting for inventory reports
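For the CLI equivalent of this scenario, the relevant pieces of the `create-job` JSON input would look roughly like the sketch below. Field names follow the S3 Control API (`S3InitiateRestoreObject`, `S3JobManifestGenerator`); the bucket name is a placeholder, and you should verify the exact shape against the skeleton produced by `aws s3control create-job --generate-cli-skeleton`:

```json
{
  "Operation": {
    "S3InitiateRestoreObject": {
      "ExpirationInDays": 7,
      "GlacierJobTier": "BULK"
    }
  },
  "ManifestGenerator": {
    "S3JobManifestGenerator": {
      "SourceBucket": "arn:aws:s3:::your-backup-bucket",
      "EnableManifestOutput": false,
      "Filter": {
        "KeyNameConstraint": { "MatchAnyPrefix": ["database-backups/2024/"] },
        "MatchAnyStorageClass": ["GLACIER", "DEEP_ARCHIVE"]
      }
    }
  }
}
```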

This approach transforms what was previously a 24-48 hour delay into an immediate operational response, making S3 Batch Operations suitable for time-critical scenarios while maintaining the same enterprise-scale processing capabilities.

Using CLI for On-Demand Operations

When creating S3 Batch Operations jobs with the manifest generator through the AWS CLI, it's best to use a JSON input file instead of typing out numerous parameters directly. Here's how to do it:

  1. First, generate a template JSON file using:
aws s3control create-job --generate-cli-skeleton
  2. The resulting JSON file contains all possible configuration fields. You'll need to:
  • Keep only the operation type you plan to use
  • Choose either a manifest file OR the manifest generator (not both)
  • Fill in required fields and any optional ones you need
  • Remove unused fields
  3. Once your JSON file is ready, create the batch job using:
aws s3control create-job --cli-input-json file://FILE_NAME
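As an illustration, a trimmed-down input file for a Copy job with a prefix filter might look like the following. Account IDs, bucket names, and the role ARN are placeholders, and the field names are taken from the `create-job` skeleton; check them against the skeleton your CLI version generates:

```json
{
  "AccountId": "111122223333",
  "ConfirmationRequired": true,
  "Operation": {
    "S3PutObjectCopy": {
      "TargetResource": "arn:aws:s3:::your-destination-bucket",
      "StorageClass": "STANDARD"
    }
  },
  "Report": {
    "Bucket": "arn:aws:s3:::your-report-bucket",
    "Format": "Report_CSV_20180820",
    "Enabled": true,
    "Prefix": "batch-reports",
    "ReportScope": "FailedTasksOnly"
  },
  "ManifestGenerator": {
    "S3JobManifestGenerator": {
      "SourceBucket": "arn:aws:s3:::your-source-bucket",
      "EnableManifestOutput": false,
      "Filter": {
        "KeyNameConstraint": { "MatchAnyPrefix": ["2024/jan-24/"] }
      }
    }
  },
  "Priority": 5,
  "RoleArn": "arn:aws:iam::111122223333:role/your-batch-ops-role"
}
```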

Important: The manifest generator doesn't support cross-Region use; the job must be created in the same Region as the source bucket.

For detailed information about each configuration field, you can reference the official S3 control create-job documentation.

Feel free to reach out if you have any additional questions; happy to help.

PS: We are publishing a blog post that will explain all the available options in detail. I'll also create a knowledge center article here on re:Post explaining this in depth.

AWS EXPERT · answered a month ago
AWS EXPERT · reviewed a month ago
  • Thank you for the detailed explanation; I should have checked this in the console. I'll explore it and comment here if I have any questions. Thanks again for your help.

For your urgent requirement to perform bulk operations on millions of S3 objects without waiting for S3 Inventory reports, you can create a user-generated manifest file in CSV format.

S3 Batch Operations supports two types of manifests:

  1. Amazon S3 Inventory reports (which take 24-48 hours to generate)
  2. CSV files that you create yourself

To perform immediate operations:

  1. Create a CSV manifest file containing the list of objects you want to process. Each row in the file must include:
  • Bucket name
  • Object key (URL-encoded)
  • Optionally, object version ID
  2. Upload this CSV manifest file to an S3 bucket.
  3. Create an S3 Batch Operations job using this CSV manifest file.
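A manifest without version IDs is simply one `bucket,key` row per object, with the key URL-encoded. The bucket and key names below are illustrative:

```csv
amzn-s3-demo-bucket,2024/jan-24/report.csv
amzn-s3-demo-bucket,2024/jan-24/data%20export.csv
```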

For frequent on-demand operations, you could:

  • Automate the generation of these CSV manifest files based on your specific criteria
  • Use scripts or applications to dynamically create the manifest files whenever needed
  • Implement a workflow that creates the manifest, uploads it to S3, and initiates the batch job
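The manifest-building step of such a workflow can be sketched in Python. This helper only handles the CSV formatting and URL-encoding the manifest format requires; it assumes you already have the bucket name and object keys (for example from `aws s3api list-objects-v2` or boto3), and uploading the result and creating the job are left to the surrounding workflow:

```python
# Sketch: build the CSV manifest text for an S3 Batch Operations job.
import csv
import io
from urllib.parse import quote

def build_manifest_csv(bucket: str, keys: list[str]) -> str:
    """Return manifest text: one `bucket,url-encoded-key` row per object."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for key in keys:
        # The manifest format expects URL-encoded object keys;
        # safe="/" keeps path separators readable.
        writer.writerow([bucket, quote(key, safe="/")])
    return buf.getvalue()

# Example: a key containing a space is percent-encoded in the manifest.
print(build_manifest_csv("my-bucket", ["2024/jan-24/report 1.csv"]))
```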

This approach allows you to bypass the waiting period for inventory reports and perform batch operations immediately whenever needed.

Remember that your manifest must either include version IDs for all objects or omit version IDs for all objects - you can't mix the two approaches in a single manifest.
Sources
Creating an S3 Batch Operations job - Amazon Simple Storage Service
Performing object operations in bulk with Batch Operations - Amazon Simple Storage Service

answered a month ago
