How to set up cross-account Redshift auto copy


I want to set up an auto-copy job from an Amazon Simple Storage Service (Amazon S3) bucket in one account to Amazon Redshift in another account. How can I set up an auto-copy job between two different accounts?

Short description

You can use a COPY JOB to load data into your Amazon Redshift tables from files that are stored in Amazon S3. Amazon Redshift detects when new files are added to the Amazon S3 path specified in your COPY command and then runs the COPY command automatically, without you having to create an external data ingestion pipeline. Amazon Redshift keeps track of which files have been loaded and determines how many files are batched together for each COPY command. You can see the resulting COPY commands in system views.
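
For reference, an auto-ingestion COPY JOB is a regular COPY statement with a JOB CREATE clause appended. The following is a minimal sketch with placeholder values in angle brackets; the full cross-account example appears later in this article:

COPY <schema>.<table>
FROM 's3://<bucket>/<prefix>/'
IAM_ROLE '<iam-role-arn>'
CSV
JOB CREATE <job_name> AUTO ON;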

To set up an auto-copy job from an S3 bucket in Account A to Amazon Redshift in Account B, follow these steps:

Account A (Configure the source location):

  1. Choose the source location that you're copying data from (the S3 bucket in Account A).
  2. Create a role and grant the necessary permissions on this bucket to the other account (Account B).

Account B (Configure the destination location):

  1. Create a new Amazon Redshift cluster. (As of 10/4, this feature is available on preview clusters; see the Redshift auto-copy documentation for more details.)
  2. Create a role and grant it permission to assume the role created in Account A.

Architecture

(Diagram: cross-account Redshift auto-copy architecture)

Resolution

Account A (where the S3 bucket is located)

  1. Create a role named account_A_s3_access_role.
  2. Attach the inline policy below. (Update <replace-with-account-A-bucket-name>, <aws-account-A-id>, <region>, and the KMS key ARN; the KMS statement is needed only if the bucket is encrypted.)

Note: This role can also be used to set up cross-account Redshift Spectrum access.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "glue:BatchCreatePartition",
                "glue:UpdateDatabase",
                "glue:CreateTable",
                "glue:DeleteDatabase",
                "glue:GetTables",
                "glue:GetPartitions",
                "glue:BatchDeletePartition",
                "glue:UpdateTable",
                "glue:BatchGetPartition",
                "glue:DeleteTable",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetDatabase",
                "glue:GetPartition",
                "glue:CreateDatabase",
                "glue:BatchDeleteTable",
                "glue:CreatePartition",
                "glue:DeletePartition",
                "glue:UpdatePartition"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::<replace-with-account-A-bucket-name>",
                "arn:aws:s3:::<replace-with-account-A-bucket-name>/*"
            ]
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:<region>:<aws-account-A-id>:key/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx"
            ]
        }
    ]
}
  3. Edit the trust relationship. For Select type of trusted entity, choose Another AWS account, and use the following trust policy so that the redshift_access_role in Account B can assume this role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::redshift_accountA:role/redshift_access_role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Account B (where the Redshift cluster is located)

  1. Create a role named redshift_access_role, and associate it with your Amazon Redshift cluster so that the COPY command can use it.
  2. Attach the AmazonS3ReadOnlyAccess managed policy to it.
  3. Attach the AWSGlueConsoleFullAccess managed policy to it.
  4. Attach the inline policy below. (Replace the Resource with the ARN of the role created in Account A.)
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1487639602000",
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": "arn:aws:iam::<aws-account-A-id>:role/account_A_s3_access_role"
        }
    ]
}
  5. Update the trust relationship as follows:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "redshift.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Test the Redshift auto-copy (in Account B)

Table preparation

CREATE SCHEMA IF NOT EXISTS crossaccountcopy;
CREATE TABLE IF NOT EXISTS crossaccountcopy.test_table(
  queue                VARCHAR(64)
, username           VARCHAR(128)
  );

TRUNCATE TABLE crossaccountcopy.test_table;

COPY job on the cross-account S3 bucket

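-- Note: the comma-separated ARNs in iam_role chain the role in Account B
-- (associated with the cluster) to the role in Account A that grants S3 access.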
COPY crossaccountcopy.test_table
FROM 's3://<replace-with-account-A-bucket-name>/000.gz'
iam_role 'arn:aws:iam::<aws-account-B-id>:role/redshift_access_role,arn:aws:iam::<aws-account-A-id>:role/account_A_s3_access_role'
IGNOREHEADER AS 1
dateformat 'auto'
timeformat 'auto'
CSV
GZIP
JOB CREATE job_red_cab AUTO ON;

After the COPY job finishes, you can access the data transferred from the S3 bucket in Account A in the Amazon Redshift cluster in Account B.
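
For example, with the sample table created earlier, a quick check confirms that the cross-account load worked:

-- Verify that rows from the Account A bucket landed in the Account B table
SELECT COUNT(*) FROM crossaccountcopy.test_table;
SELECT * FROM crossaccountcopy.test_table LIMIT 10;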

COPY JOB system tables and views

-- Execute copy job manually
COPY JOB RUN job_red_cab;

-- Review the status of the copy job and the files it loaded
SELECT job_id, job_name, data_source, copy_query, filename, status, curtime
FROM sys_copy_job copyjob
JOIN stl_load_commits loadcommit
ON copyjob.job_id = loadcommit.copy_job_id;

-- History of data load
SELECT * FROM SYS_LOAD_HISTORY;

-- History of data commits
SELECT * FROM stl_load_commits;

-- History of errors
SELECT * FROM SYS_LOAD_ERROR_DETAIL;

-- Copy job metadata
SELECT * FROM sys_copy_job;