S3 Batch Copy Setup - PUSH the data, don't PULL it
Cross-account S3 data copy from one bucket to another using Batch Operations, where the sender pushes the data instead of the receiver pulling it
Amazon S3 Batch Operations is an effective approach for copying objects in bulk when the number of objects is large or the data volume is substantial. The AWS documentation explains how to create an S3 Batch Job that copies data between S3 buckets in different AWS accounts using an inventory file. It outlines creating the inventory files in the destination account and then creating a Batch Job in the destination account that uses the S3 inventory file to pull the data from the source account.
In some scenarios the data owner does not want to grant any read access to their S3 buckets and prefers to push the data instead. In such cases the permissions setup is slightly different, and there are some key points to consider before proceeding with an S3 Batch Copy operation (push).
Prerequisites:
- The destination bucket has ACLs enabled (Bucket owner preferred/Object writer). For a batch copy in the source-to-destination direction, the destination bucket must have ACLs enabled; otherwise the batch operation tries to put an ACL on each object it copies into the destination bucket and fails with the error below. See the S3 Object Ownership and ACL documentation, [4] and [5] in the reference section. “The bucket does not allow ACLs (Service: Amazon S3; Status Code: 400; Error Code: AccessControlListNotSupported;”
- Get the canonical ID of the destination account
- Have an S3 bucket for storing the manifest/inventory files and the batch job completion reports
Assumptions:
- Both the source and destination S3 buckets are encrypted with SSE-KMS using a CMK
- No object is larger than 5 GB
- The individual who creates the batch job is logged in using an IAM role. This is not a requirement, only an assumption for the permissions setup that follows; replace it with your own AWS principal
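The 5 GB assumption can be checked before the job is created. The following is a minimal sketch, assuming you already have (key, size in bytes) pairs, for example parsed from an inventory report that includes object sizes. The function name, the sample data, and the byte value used for "5 GB" are illustrative assumptions, not part of any AWS tooling.

```python
# Assumed byte value for the 5 GB limit; adjust if your definition differs.
FIVE_GB = 5 * 1024 ** 3


def oversized_objects(objects, limit=FIVE_GB):
    """Return the keys of objects whose size in bytes exceeds the limit."""
    return [key for key, size in objects if size > limit]


# Hypothetical sample rows: (object key, size in bytes)
sample = [
    ("logs/2024/app.log.gz", 1_024),
    ("video/big.mp4", 6 * 1024 ** 3),
]

for key in oversized_objects(sample):
    print(f"larger than 5 GB, handle separately: {key}")
```

Any key reported here falls outside the assumption above and would need to be handled outside this batch copy setup.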
Variables that need to be replaced in the following examples:
- inventory_bucket_name: Bucket where the manifest file is stored
- source_account_number: Account that initiates the S3 batch copy
- source_bucket_name: Source bucket from which S3 objects are copied to the target bucket
- target_bucket_name: Target bucket into which S3 objects are copied from the source account
- optional_prefix: A specific prefix (if not the whole bucket) from which objects are copied by the S3 batch job
- source_ac_s3_batch_copy_role: Role used by the S3 batch job
- source_acnt_iam_role: Role assumed by the person who creates the S3 batch job
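Filling the placeholders by hand is error-prone, so the substitution can be scripted. Below is a minimal sketch, assuming the placeholders appear in the policy text as `<name>` in angle brackets; the helper name `fill_placeholders` and the sample values are hypothetical.

```python
import re


def fill_placeholders(template: str, values: dict) -> str:
    """Replace every <name> placeholder with its value.

    Raises KeyError if a placeholder has no value, so nothing is left unfilled.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for <{name}>")
        return values[name]

    return re.sub(r"<([a-z_]+)>", substitute, template)


# Hypothetical values for illustration only
values = {
    "target_bucket_name": "my-target-bucket",
    "optional_prefix": "incoming",
}
resource = fill_placeholders(
    "arn:aws:s3:::<target_bucket_name>/<optional_prefix>/*", values
)
print(resource)
```

Failing on an unknown placeholder is a deliberate choice here: a policy that still contains a literal `<source_account_number>` would otherwise be rejected or silently match nothing.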
Here is a quick description of the SIDs used in the following role policy and bucket policy documents, explaining the purpose of each block:
- S3BatchCopyInventory: Manifest bucket policy statement that allows writes for storing the manifest file
- S3BatchTarget: Gives the source account batch copy role write permissions on the whole target bucket or a specific prefix
- S3BatchSource: Gives the source account batch copy role read permissions on the source bucket or a specific prefix
- S3BatchManifestReport: Gives the source account batch copy role read and write permissions on the manifest bucket
- DenyAllUnlessApproved: Completely optional; use it if you need to secure the target bucket and restrict access to the source account batch copy role and the user role of the individual creating the batch job
- AllowBatchCopyRole: Gives the source account batch copy role write permissions on the target bucket
- AllowConsoleBatchJobCreation: Allows the individual on the source account side to create the batch job in the console; without it the console returns a permission error and the source account cannot create the batch job that pushes the data to the destination account
Permissions setup:
Source Side:
1. Manifest bucket policy to be updated:
{
  "Sid": "S3BatchCopyInventory",
  "Effect": "Allow",
  "Principal": {
    "Service": "s3.amazonaws.com"
  },
  "Action": "s3:PutObject",
  "Resource": "arn:aws:s3:::<inventory_bucket_name>/*",
  "Condition": {
    "StringEquals": {
      "aws:SourceAccount": "<source_account_number>",
      "s3:x-amz-acl": "bucket-owner-full-control"
    },
    "ArnLike": {
      "aws:SourceArn": "arn:aws:s3:::<source_bucket_name>"
    }
  }
}
2. Batch Operations Role to be created at source account:
2.1. Permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:PutObjectVersionAcl",
        "s3:PutObjectVersionTagging",
        "s3:PutObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::<target_bucket_name>/*"
      ],
      "Effect": "Allow",
      "Sid": "S3BatchTarget"
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObjectVersionAcl",
        "s3:GetObjectVersion",
        "s3:GetObjectVersionTagging",
        "s3:GetObjectTagging",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<source_bucket_name>",
        "arn:aws:s3:::<source_bucket_name>/<optional_prefix>/*"
      ],
      "Effect": "Allow",
      "Sid": "S3BatchSource"
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:GetObjectVersion",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<inventory_bucket_name>",
        "arn:aws:s3:::<inventory_bucket_name>/*"
      ],
      "Effect": "Allow",
      "Sid": "S3BatchManifestReport"
    },
    {
      "Action": [
        "kms:Encrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "Destination_S3_Bucket_KMS_Key_ARN",
      "Effect": "Allow",
      "Sid": "AllowUseOfExternalS3KMSKey"
    },
    {
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "Source_S3_Bucket_KMS_Key_ARN",
      "Effect": "Allow",
      "Sid": "AllowUseOfLocalS3KMSKey"
    }
  ]
}
2.2. Batch Operations Role Trust Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "batchoperations.s3.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Target Side:
1. Target bucket policy to be updated:
{
  "Version": "2012-10-17",
  "Id": "S3-Batch-Copy-Policy",
  "Statement": [
    {
      "Sid": "DenyAllUnlessApproved",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<target_bucket_name>",
        "arn:aws:s3:::<target_bucket_name>/<optional_prefix>/*"
      ],
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::<source_account_number>:role/<source_ac_s3_batch_copy_role>",
            "arn:aws:iam::<source_account_number>:role/<source_acnt_iam_role>"
          ]
        }
      }
    },
    {
      "Sid": "AllowBatchCopyRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<source_account_number>:role/<source_ac_s3_batch_copy_role>"
      },
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:PutObjectTagging"
      ],
      "Resource": "arn:aws:s3:::<target_bucket_name>/<optional_prefix>/*"
    },
    {
      "Sid": "AllowConsoleBatchJobCreation",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<source_account_number>:role/<source_acnt_iam_role>"
        ]
      },
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:::<target_bucket_name>",
        "arn:aws:s3:::<target_bucket_name>/<optional_prefix>/*"
      ]
    }
  ]
}
2. Target bucket KMS key policy to be updated:
{
  "Sid": "Cross account KMS Key access",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::<source_account_number>:role/<source_ac_s3_batch_copy_role>"
  },
  "Action": [
    "kms:Encrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "Destination_S3_Bucket_KMS_Key_ARN"
}
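Hand-edited policy JSON is easy to break with a stray quote or a missing comma, so it can be worth parsing each document locally before pasting it into the console. The following is a minimal sketch using only Python's standard library; the helper name `statement_sids` and the sample statement are illustrative and not part of any AWS tooling.

```python
import json


def statement_sids(policy_text: str) -> list:
    """Parse a policy document (or a single statement) and list its Sids.

    Raises json.JSONDecodeError if the text is not valid JSON.
    """
    policy = json.loads(policy_text)
    # A bare statement has no "Statement" key; treat the object itself as one.
    statements = policy.get("Statement", [policy])
    if isinstance(statements, dict):
        statements = [statements]
    return [statement.get("Sid", "(no Sid)") for statement in statements]


# Hypothetical, shortened statement used only to exercise the helper
sample_statement = '{"Sid": "S3BatchCopyInventory", "Effect": "Allow"}'
print(statement_sids(sample_statement))
```

This only checks JSON syntax and lists the statement IDs; it does not validate that the actions, principals, or ARNs are meaningful to IAM, S3, or KMS.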
With the permissions setup complete, the next step is to create the S3 batch copy job.
Create Batch copy job:
- Log in to the source account, open the S3 console, and click Batch Operations -> Create job
- Choose the S3 inventory report (manifest.json) option (if you are using an inventory file)
- Provide the S3 location of the manifest.json file
- Click Next
- Choose the Copy operation and provide the destination bucket location
- Choose the Storage class as required
- Assuming the destination bucket uses SSE-KMS with a CMK, under Server-side encryption select Specify an encryption key and then select Use destination bucket settings for default encryption
- Choose the other options as appropriate
- Under Access control list (ACL), click Add grantee, enter the canonical ID you received from the destination account, and tick the boxes for Objects Read and Object ACL Read and Write
- Click Next and, under Completion report, check Generate completion report and provide the completion report destination; this is the location where the S3 batch job places its results
- Provide the ARN of the IAM role you created above in Permissions setup
- Click Next, review the details, and click Create job
- Refresh the page; once the status shows Awaiting your confirmation, run the job and wait for the results to be stored in the completion report location
- Review the completion report
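The console steps above can also be expressed as a request for the S3 Control `CreateJob` API. The sketch below only builds the request dictionary and makes no AWS call. The field names follow my reading of the boto3 `s3control` `create_job` parameters and should be checked against the current boto3 documentation before use; the function name, the priority, the report prefix, and all ARNs and IDs are hypothetical.

```python
import uuid


def build_copy_job_request(account_id, role_arn, manifest_arn, manifest_etag,
                           target_bucket_arn, report_bucket_arn,
                           grantee_canonical_id):
    """Build keyword arguments for an S3 Batch Operations copy job."""
    # Mirrors the console ticks: Objects Read, Object ACL Read and Write.
    grants = [
        {
            "Grantee": {"TypeIdentifier": "id",
                        "Identifier": grantee_canonical_id},
            "Permission": permission,
        }
        for permission in ("READ", "READ_ACP", "WRITE_ACP")
    ]
    return {
        "AccountId": account_id,
        "ConfirmationRequired": True,  # job waits in "Awaiting your confirmation"
        "RoleArn": role_arn,
        "Priority": 10,  # assumed value
        "ClientRequestToken": str(uuid.uuid4()),
        "Operation": {
            "S3PutObjectCopy": {
                "TargetResource": target_bucket_arn,
                "AccessControlGrants": grants,
            }
        },
        "Manifest": {
            "Spec": {"Format": "S3InventoryReport_CSV_20161130"},
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Report": {
            "Bucket": report_bucket_arn,
            "Enabled": True,
            "Format": "Report_CSV_20180820",
            "Prefix": "batch-copy-reports",  # assumed prefix
            "ReportScope": "AllTasks",
        },
    }


# Hypothetical identifiers for illustration only
request = build_copy_job_request(
    account_id="111122223333",
    role_arn="arn:aws:iam::111122223333:role/example-batch-copy-role",
    manifest_arn="arn:aws:s3:::example-inventory-bucket/path/manifest.json",
    manifest_etag="example-etag",
    target_bucket_arn="arn:aws:s3:::example-target-bucket",
    report_bucket_arn="arn:aws:s3:::example-inventory-bucket",
    grantee_canonical_id="example-canonical-id",
)
# Assumed usage, not executed here:
# boto3.client("s3control").create_job(**request)
print(sorted(request))
```

Encryption and storage class settings are left out of this sketch; in the console flow they are chosen in the Copy options step.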
Note: Once the data has been copied to the destination bucket, the destination bucket ACLs can be disabled and the copied objects remain accessible. However, as mentioned at the beginning, the destination bucket must have ACLs enabled for the S3 batch copy from source to destination to work.
Reference AWS Documents:
- [1] https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-iam-role-policies.html
- [2] https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-create-job.html
- [3] https://aws.amazon.com/blogs/aws/new-amazon-s3-batch-operations/
- [4] https://docs.aws.amazon.com/AmazonS3/latest/userguide/about-object-ownership.html#object-ownership-overview
- [5] https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html