
How do I use Amazon S3 Batch Operations to copy objects that are larger than 5 GB?


I want to copy Amazon Simple Storage Service (Amazon S3) objects that are larger than 5 GB from one bucket to another.

Short description

To copy Amazon S3 objects that are larger than 5 GB from one bucket to another, use S3 Batch Operations with an AWS Lambda function. For more information, see Copying objects greater than 5 GB with Amazon S3 Batch Operations.

Note: You can also use the multipart upload API to copy Amazon S3 objects that are larger than 5 GB. For more information, see Copying an object using multipart upload.

Resolution

Before you begin, make sure that you created a destination S3 bucket to copy the objects to.

Create a Lambda function

Complete the following steps:

  1. Open the Lambda console.

  2. Choose Create function.

  3. Choose Author from scratch.

  4. For Function name, enter a name for your function, for example S3BatchCopy.

  5. Choose the Runtime dropdown list, and then choose Python 3.9.

  6. For Architecture, choose x86_64, and then choose Create function.

  7. In the built-in code editor, enter your Lambda function code.
    Example Lambda function code:

    import boto3
    import os
    from urllib import parse
    from botocore.client import Config
    
    target_bucket = os.environ['destination_bucket']
    new_prefix = os.environ['destination_bucket_prefix']
    metadata_copy = os.environ['copy_metadata']
    tagging_copy = os.environ['copy_tagging']
    storage_class = os.environ['copy_storage_class']  # Storage class to apply to copied objects
    
    s3Client = boto3.client('s3', config=Config(retries={'max_attempts': 3}))
    
    def lambda_handler(event, context):
        task = event['tasks'][0]
        s3Key = parse.unquote_plus(task['s3Key'], encoding='utf-8')
        s3VersionId = task['s3VersionId']
        s3Bucket = task['s3BucketArn'].split(':')[-1]
    
        try:
            copy_source = {'Bucket': s3Bucket, 'Key': s3Key}
            if s3VersionId:
                copy_source['VersionId'] = s3VersionId
    
            newKey = f"{new_prefix}/{s3Key}" if new_prefix else s3Key
            myargs = {
                'ACL': 'bucket-owner-full-control',
                'StorageClass': storage_class
            }
    
            # Reference a specific source object version only when one exists,
            # because boto3 rejects VersionId=None
            version_args = {'VersionId': s3VersionId} if s3VersionId else {}

            # Add metadata if enabled
            if metadata_copy == 'Enable':
                get_metadata = s3Client.head_object(Bucket=s3Bucket, Key=s3Key, **version_args)
                for key in ['CacheControl', 'ContentDisposition', 'ContentEncoding',
                            'ContentLanguage', 'Metadata', 'WebsiteRedirectLocation', 'Expires']:
                    if value := get_metadata.get(key):
                        myargs[key] = value

            # Add tagging if enabled
            if tagging_copy == 'Enable':
                get_obj_tag = s3Client.get_object_tagging(Bucket=s3Bucket, Key=s3Key, **version_args)
                if tag_set := get_obj_tag.get('TagSet'):
                    myargs['Tagging'] = "&".join([f"{parse.quote_plus(d['Key'])}={parse.quote_plus(d['Value'])}"
                                                  for d in tag_set])
    
            response = s3Client.copy(copy_source, target_bucket, newKey, ExtraArgs=myargs)
            result = {'resultCode': 'Succeeded', 'resultString': str(response)}
    
        except Exception as e:
            result = {'resultCode': 'PermanentFailure', 'resultString': str(e)}
    
        return {
            'invocationSchemaVersion': event['invocationSchemaVersion'],
            'treatMissingKeysAs': 'PermanentFailure',
            'invocationId': event['invocationId'],
            'results': [{'taskId': task['taskId'], **result}]
        }
  8. Choose Deploy.
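The handler above reads five environment variables (destination_bucket, destination_bucket_prefix, copy_metadata, copy_tagging, copy_storage_class), so set them on the function under Configuration, Environment variables. The following sketch, with illustrative values, shows the shape of the per-task event that S3 Batch Operations sends and the same key and bucket parsing the handler performs:

```python
import os
from urllib import parse

# The five environment variables the function reads (example values).
os.environ.update({
    "destination_bucket": "amzn-s3-demo-destination-bucket",
    "destination_bucket_prefix": "copied",
    "copy_metadata": "Enable",
    "copy_tagging": "Enable",
    "copy_storage_class": "STANDARD",
})

# Shape of the event S3 Batch Operations sends for each task (illustrative values).
event = {
    "invocationSchemaVersion": "1.0",
    "invocationId": "example-invocation-id",
    "tasks": [{
        "taskId": "example-task-id",
        "s3Key": "my%20folder/large-object.bin",   # keys arrive URL-encoded
        "s3VersionId": None,                        # null for unversioned buckets
        "s3BucketArn": "arn:aws:s3:::amzn-s3-demo-source-bucket",
    }],
}

# The same parsing the handler performs on each task.
task = event["tasks"][0]
s3_key = parse.unquote_plus(task["s3Key"], encoding="utf-8")
s3_bucket = task["s3BucketArn"].split(":")[-1]
new_key = f"{os.environ['destination_bucket_prefix']}/{s3_key}"

print(s3_bucket)  # amzn-s3-demo-source-bucket
print(new_key)    # copied/my folder/large-object.bin
```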

Create an IAM role for your Lambda function

Complete the following steps:

  1. Open the AWS Identity and Access Management (IAM) console.

  2. In the navigation pane, choose Roles, and then choose Create role.

  3. For Trusted entity type, choose AWS service.

  4. Choose the Use case dropdown list, and then choose Lambda.

  5. Choose Next.

  6. Choose the AWSLambdaBasicExecutionRole policy, and then choose Next.

  7. For Role name, enter a name, for example LambdaS3BatchRole.

  8. Attach the following policy:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectAcl",
                    "s3:GetObjectTagging",
                    "s3:GetObjectVersion",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                    "s3:ListBucket*"
                ],
                "Resource": "*",
                "Effect": "Allow"
            },
            {
                "Action": [
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                    "s3:PutObjectTagging",
                    "s3:PutObjectLegalHold",
                    "s3:PutObjectRetention",
                    "s3:GetBucketObjectLockConfiguration",
                    "s3:ListBucket*",
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:s3:::amzn-s3-demo-destination-bucket",
                    "arn:aws:s3:::amzn-s3-demo-destination-bucket/*"
                ],
                "Effect": "Allow"
            }
        ]
    }
  9. Choose Create role.

  10. On your Lambda function's Configuration tab, choose Permissions, and then set the function's execution role to LambdaS3BatchRole.

Note: If you use a customer managed AWS Key Management Service (AWS KMS) key to encrypt your bucket, then the IAM role must grant additional permissions. For more information, see My Amazon S3 bucket has default encryption using a custom AWS KMS key. How can I allow users to download from and upload to the bucket?
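For an SSE-KMS destination, the Lambda role typically needs statements along these lines in addition to the S3 permissions above; the key ARN is a placeholder:

```json
{
    "Effect": "Allow",
    "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
    ],
    "Resource": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
}
```

Here kms:Decrypt covers reading SSE-KMS-encrypted source objects and kms:GenerateDataKey covers writing encrypted objects to the destination bucket.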

Create a Batch Operations permissions policy

Complete the following steps:

  1. Open the IAM console.

  2. In the navigation pane, choose Roles, and then choose Create role.

  3. For Trusted entity type, choose AWS service.

  4. Choose the Use case dropdown list, and then choose S3.

  5. Choose S3 Batch Operations.

  6. Choose Next.

  7. For Role name, enter a name, for example S3BatchOperationsRole.

  8. Attach the following policy:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:s3:::amzn-s3-demo-manifest-bucket",
                    "arn:aws:s3:::amzn-s3-demo-manifest-bucket/*"
                ],
                "Effect": "Allow"
            },
            {
                "Action": [
                    "s3:PutObject",
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:s3:::amzn-s3-demo-completion-report-bucket",
                    "arn:aws:s3:::amzn-s3-demo-completion-report-bucket/*"
                ],
                "Effect": "Allow"
            },
            {
                "Action": [
                    "lambda:InvokeFunction"
                ],
                "Resource": "arn_of_lambda_function_created_in_step1",
                "Effect": "Allow"
            }
        ]
    }
  9. Choose Create role.

Create a Batch Operations job

Complete the following steps:

  1. Open the Amazon S3 console.
  2. In the navigation pane, choose Batch Operations, and then choose Create job.
  3. For Manifest format, select either S3 inventory report or CSV.
  4. For Manifest object, choose Browse S3.
  5. Select the bucket that contains your manifest file.
  6. Select the manifest object that lists the source objects to copy, and then choose Choose path.
  7. Choose Next.
  8. For Operation type, choose Invoke AWS Lambda function. The console's built-in Copy operation can't copy objects that are larger than 5 GB.
  9. For Invoke Lambda function, select Choose from functions in your account, and then choose the Lambda function that you created, for example S3BatchCopy.
  10. Choose Next.
  11. (Optional) For Completion report, specify a destination bucket for the report.
  12. For Permissions, choose the IAM role dropdown list, and then select your S3BatchOperationsRole IAM role.
  13. Choose Next, and then choose Create job.
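The console steps above correspond to the S3 Control CreateJob API. As a sketch only, the request parameters might look like the following; the account ID, ARNs, manifest ETag, and report prefix are all placeholders, and the finished dictionary could be passed to `boto3.client("s3control").create_job(**params)`:

```python
# Placeholder identifiers; substitute your own account, ARNs, and manifest ETag.
ACCOUNT_ID = "111122223333"

params = {
    "AccountId": ACCOUNT_ID,
    "ConfirmationRequired": True,
    "Priority": 10,
    "RoleArn": f"arn:aws:iam::{ACCOUNT_ID}:role/S3BatchOperationsRole",
    # Invoke the copy Lambda function once per object in the manifest.
    "Operation": {
        "LambdaInvoke": {
            "FunctionArn": f"arn:aws:lambda:us-east-1:{ACCOUNT_ID}:function:S3BatchCopy"
        }
    },
    # A CSV manifest with Bucket and Key columns.
    "Manifest": {
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::amzn-s3-demo-manifest-bucket/manifest.csv",
            "ETag": "EXAMPLE_MANIFEST_ETAG",
        },
    },
    # Write a completion report covering all tasks.
    "Report": {
        "Bucket": "arn:aws:s3:::amzn-s3-demo-completion-report-bucket",
        "Enabled": True,
        "Format": "Report_CSV_20180820",
        "ReportScope": "AllTasks",
        "Prefix": "batch-copy-reports",
    },
}
# params could then be passed to boto3.client("s3control").create_job(**params).
```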

Cross-account access

If the destination S3 bucket is in another AWS account, then you must attach a resource-based policy to the bucket.
Example resource-based policy:

{
    "Version": "2012-10-17",
    "Id": "Policy1541018284691",
    "Statement": [
        {
            "Sid": "Allow Cross Account Copy",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::SourceAccountNumber:role/LambdaS3BatchRole"
            },
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:PutObjectTagging"
            ],
            "Resource": "arn:aws:s3:::DESTINATION_BUCKET/*"
        }
    ]
}

For more information, see How do I copy Amazon S3 objects from another AWS account?

Related information

How do I troubleshoot Amazon S3 Batch Operations issues?

How do I copy all objects from one Amazon S3 bucket to another bucket?

What's the best way to transfer large amounts of data from one Amazon S3 bucket to another?

AWS OFFICIAL · Updated a year ago