How can I use Data Pipeline to back up a DynamoDB table to an S3 bucket that's in a different account?

I want to use AWS Data Pipeline to back up an Amazon DynamoDB table to an Amazon Simple Storage Service (Amazon S3) bucket that's in a different AWS account.

Short description

Note: The source account is the account where the DynamoDB table exists. The destination account is the account where the Amazon S3 bucket exists.

  1. In the source account, attach an AWS Identity and Access Management (IAM) policy that grants Amazon S3 permissions to the DataPipeline service role and DataPipeline resource role.
  2. In the destination account, create a bucket policy that allows the DataPipeline service role and DataPipeline resource role in the source account to access the S3 bucket.
  3. In the source account, create a pipeline using the Export DynamoDB table to S3 Data Pipeline template.
  4. Add the BucketOwnerFullControl or AuthenticatedRead canned access control list (ACL) to the Step field of the pipeline's EmrActivity object.
  5. Activate the pipeline to back up the DynamoDB table to the S3 bucket in the destination account.
  6. Create a DynamoDB table in the destination account.
  7. To restore the source table to the destination table, create a pipeline using the Import DynamoDB backup data from S3 Data Pipeline template.

Resolution

Attach an IAM policy to the Data Pipeline default roles

1.    In the source account, open the IAM console.

2.    Choose Policies, and then choose Create policy.

3.    Choose the JSON tab, and then enter an IAM policy similar to the following. Replace awsdoc-example-bucket with the name of the S3 bucket in the destination account.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::awsdoc-example-bucket/*",
        "arn:aws:s3:::awsdoc-example-bucket"
      ]
    }
  ]
}

4.    Choose Review policy.

5.    Enter a Name for the policy, and then choose Create policy.

6.    In the list of policies, select the check box next to the name of the policy that you just created. You can use the Filter menu and the search box to filter the list of policies.

7.    Choose Policy actions, and then choose Attach.

8.    Select the DataPipeline service role and DataPipeline resource role, and then choose Attach policy.
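
If you prefer to script the policy creation and attachment instead of using the console, a minimal boto3 sketch is shown below. It assumes that the policy JSON above is saved locally as policy.json, that the default role names DataPipelineDefaultRole and DataPipelineDefaultResourceRole are used, and that the policy name is only an example.

# Sketch: create the S3 access policy and attach it to the Data Pipeline
# default roles in the source account.
import boto3

iam = boto3.client("iam")

with open("policy.json") as f:
    policy_document = f.read()

# Create the customer managed policy (the policy name is an example).
response = iam.create_policy(
    PolicyName="DataPipelineCrossAccountS3Access",
    PolicyDocument=policy_document,
)
policy_arn = response["Policy"]["Arn"]

# Attach the policy to both Data Pipeline default roles.
for role_name in ("DataPipelineDefaultRole", "DataPipelineDefaultResourceRole"):
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)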

Add a bucket policy to the S3 bucket

In the destination account, create a bucket policy similar to the following. Replace these values in the following example:

  • 111122223333: the ID of the source account (the account where Data Pipeline runs). For more information, see Finding your AWS account ID.
  • awsdoc-example-bucket: the name of the S3 bucket.

{
  "Version": "2012-10-17",
  "Id": "",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": "s3:*",
      "Principal": {
        "AWS": [
          "arn:aws:iam::111122223333:role/DataPipelineDefaultRole",
          "arn:aws:iam::111122223333:role/DataPipelineDefaultResourceRole"
        ]
      },
      "Resource": [
        "arn:aws:s3:::awsdoc-example-bucket",
        "arn:aws:s3:::awsdoc-example-bucket/*"
      ]
    }
  ]
}
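
To apply this bucket policy without using the console, you can use a short boto3 sketch like the following. It assumes that the policy above is saved locally as bucket-policy.json and uses the example bucket name from this article.

# Sketch: apply the bucket policy in the destination account.
import boto3

s3 = boto3.client("s3")

with open("bucket-policy.json") as f:
    bucket_policy = f.read()

s3.put_bucket_policy(Bucket="awsdoc-example-bucket", Policy=bucket_policy)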

Create and activate the pipeline

1.    In the source account, create a pipeline using the Export DynamoDB table to S3 Data Pipeline template:

  • In the Parameters section, enter the Source DynamoDB table name and the Output S3 folder. Use the format s3://awsdoc-example-bucket/ for the bucket.
  • In the Security/Access section, for IAM roles, choose Default.

2.    Before you Activate the pipeline, choose Edit in Architect.

3.    Open the Activities section, and then find the EmrActivity object.

4.    In the Step field, add the BucketOwnerFullControl or AuthenticatedRead canned access control list (ACL). These canned ACLs give the Amazon EMR Apache Hadoop job permission to write to the S3 bucket in the destination account. Be sure to use the format -Dfs.s3.canned.acl=BucketOwnerFullControl, and place the argument between org.apache.hadoop.dynamodb.tools.DynamoDBExport and #{output.directoryPath}. Example:

s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,-Dfs.s3.canned.acl=BucketOwnerFullControl,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}

5.    Choose Save, and then choose Activate to activate the pipeline and back up the DynamoDB table to the S3 bucket in the destination account.
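
If you prefer to activate and monitor the pipeline programmatically, a boto3 sketch along these lines can be used. The pipeline ID below is a placeholder for the ID of the pipeline that you created from the template.

# Sketch: activate the export pipeline and check its state from the source account.
import boto3

datapipeline = boto3.client("datapipeline")
pipeline_id = "df-EXAMPLE1234567"  # placeholder for your pipeline ID

# Activate the pipeline so that the EMR job runs the export.
datapipeline.activate_pipeline(pipelineId=pipeline_id)

# Print the pipeline's overall state (for example, SCHEDULED or FINISHED).
description = datapipeline.describe_pipelines(pipelineIds=[pipeline_id])
for field in description["pipelineDescriptionList"][0]["fields"]:
    if field["key"] == "@pipelineState":
        print("Pipeline state:", field.get("stringValue"))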

(Optional) Restore the backup in the destination account

  1. In the destination account, create a DynamoDB table. The table doesn't have to be empty. However, the import process replaces items that have the same keys as the items in the export file.
  2. Create a pipeline using the Import DynamoDB backup data from S3 Data Pipeline template:
    In the Parameters section, for Input S3 folder, enter the S3 bucket where the DynamoDB backup is stored.
    In the Security/Access section, for IAM roles, choose Default.
  3. Activate the pipeline to restore the backup to the destination table.
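
After both pipelines have run, you can confirm from the destination account that the export files were written to the bucket and that the restored table contains data. The following boto3 sketch uses example values for the bucket and table names.

# Sketch: verify the export files in S3 and the restored table from the
# destination account. The bucket and table names are examples.
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

# List the objects that the export pipeline wrote to the bucket.
objects = s3.list_objects_v2(Bucket="awsdoc-example-bucket")
for obj in objects.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Check the restored table's item count (DynamoDB updates this value
# periodically, so it can lag behind a recent import).
table = dynamodb.describe_table(TableName="my-restored-table")
print("ItemCount:", table["Table"]["ItemCount"])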

Related information

Bucket owner granting cross-account bucket permissions

Managing IAM policies
