In this article, we discuss the steps to implement near-real-time analytics using Amazon Redshift streaming ingestion with Amazon MSK.
Complete the following steps to perform cross-account streaming ingestion for Amazon Redshift:
- Create an MSK cluster in Account-1.
- Create an AWS Identity and Access Management (IAM) role in Account-1 to read the data stream, following the AWS best practice of granting least-privilege permissions.
- Create an Amazon Redshift - Customizable IAM service role in Account-2 that assumes the IAM role in Account-1.
- Create an Amazon Redshift cluster in Account-2 and attach the IAM role.
- Modify the trust relationship of the MSK IAM role so that the Amazon Redshift IAM role in Account-2 can assume it.
- Create an external schema using IAM role chaining.
- Create a materialized view for high-speed ingestion of stream data.
- Refresh the materialized view and start querying.
Account-1 setup
Complete the following steps in Account-1:
- Create an MSK cluster. For instructions, refer to Step 1 in Get started using Amazon MSK.
- Create an IAM role msk-tutorial-role granting access to create topics on the Amazon MSK cluster by following Step 2 in the same guide.
- Create a client machine and a topic in the Amazon MSK cluster by following Steps 3 and 4, respectively.
- Set up a consumer to confirm that data is arriving in the topic.
- Next, create an IAM policy called MSKPolicy in Account-1.
- On the IAM console, choose Policies in the navigation pane.
- Choose Create policy.
- Create a policy called MSKPolicy and add the following JSON to your policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "kafka:GetBootstrapBrokers"
            ],
            "Resource": "*"
        }
    ]
}
- In the navigation pane, choose Roles.
- Choose the role msk-tutorial-role created in step 2.
- Attach the policy MSKPolicy.
Account-2 setup
Complete the following steps in Account-2:
- Sign in to the Amazon Redshift console in Account-2.
- Create an Amazon Redshift cluster.
- On the IAM console, choose Policies in the navigation pane.
- Choose Create policy.
- Create a policy RedshiftStreamPolicy and add the following JSON (provide the AWS account ID for Account-1):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "StmtStreamRole",
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": "arn:aws:iam::<Account-1>:role/msk-tutorial-role"
        }
    ]
}
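If you prefer to script this step instead of pasting JSON in the console, a minimal Python sketch can generate the same policy document, parameterized by the Account-1 AWS account ID. The helper name and the example account ID below are illustrative, not part of the walkthrough:

```python
import json

def build_assume_role_policy(msk_account_id: str) -> str:
    """Build the RedshiftStreamPolicy document that lets the Redshift
    role assume msk-tutorial-role in Account-1."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StmtStreamRole",
                "Effect": "Allow",
                "Action": ["sts:AssumeRole"],
                # Resource is the MSK-side role in Account-1.
                "Resource": f"arn:aws:iam::{msk_account_id}:role/msk-tutorial-role",
            }
        ],
    }
    return json.dumps(policy, indent=4)

# "111122223333" is a placeholder account ID.
print(build_assume_role_policy("111122223333"))
```

The generated JSON can then be supplied when creating the policy, for example via the AWS CLI's create-policy command.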
- In the navigation pane, choose Roles.
- Choose Create role.
- Select AWS service as the trusted entity type, then choose Redshift and the Redshift - Customizable use case.
- Create a role called RedshiftStreamRole.
- Attach the policy RedshiftStreamPolicy to the role.
Set up trust relationship
To set up the trust relationship, complete the following steps:
- Sign in to the IAM console in Account-1.
- In the navigation pane, choose Roles.
- Edit the IAM role msk-tutorial-role and modify the trust relationship (provide the AWS account ID for Account-2):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<Account-2>:role/RedshiftStreamRole",
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
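The trust policy has a different shape from the permissions policies above: it names a Principal (who may assume the role) rather than a Resource. As a sketch (the helper name and example account ID are illustrative), it can be generated the same way:

```python
import json

def build_trust_policy(redshift_account_id: str) -> str:
    """Build the trust policy for msk-tutorial-role so that
    RedshiftStreamRole in Account-2 (and the EC2 service, used by the
    client machine) can assume it."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    # The cross-account Redshift role in Account-2.
                    "AWS": f"arn:aws:iam::{redshift_account_id}:role/RedshiftStreamRole",
                    "Service": "ec2.amazonaws.com",
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=4)

# "444455556666" is a placeholder account ID.
print(build_trust_policy("444455556666"))
```

The resulting document can be applied with the AWS CLI's update-assume-role-policy command instead of editing the trust relationship in the console.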
Set up connectivity
To establish network connectivity between the MSK and Amazon Redshift clusters, complete the following steps:
- Enable enhanced VPC routing on the Amazon Redshift cluster in Account-2.
- Set up a VPC peering connection between the VPC of the MSK cluster and the VPC of the Redshift cluster.
- Edit the route tables of both VPCs to add routes through the peering connection.
- Update the security groups of both services to allow traffic from each other.
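VPC peering requires that the two VPCs have non-overlapping IPv4 CIDR blocks. Before creating the peering connection, you can run a quick pre-check; this is a sketch, and the CIDR values are placeholders for your actual VPC ranges:

```python
from ipaddress import ip_network

def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks overlap; overlapping
    ranges cannot be connected with VPC peering."""
    return ip_network(cidr_a).overlaps(ip_network(cidr_b))

# Placeholder CIDRs for the MSK VPC and the Redshift VPC.
print(cidrs_overlap("10.0.0.0/16", "172.31.0.0/16"))  # False, so peering is possible
```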
Set up streaming ingestion
To set up streaming ingestion, complete the following steps:
- Sign in to the Amazon Redshift console as Account-2.
- Launch the Query Editor v2 or your preferred SQL client and run the following statements to access the topic in Account-1.
- Create an external schema using role chaining (provide the IAM role ARNs, separated by a comma with no spaces around it):
CREATE EXTERNAL SCHEMA schema_stream
FROM MSK
IAM_ROLE 'arn:aws:iam::<Account-2>:role/RedshiftStreamRole,arn:aws:iam::<Account-1>:role/msk-tutorial-role'
AUTHENTICATION IAM
URI 'boot-z7n.cluster.ot.kafka.us-east-1.amazonaws.com:9098,boot-2xp.cluster.ot.kafka.us-east-1.amazonaws.com:9098,boot-x6a.cluster.ot.kafka.us-east-1.amazonaws.com:9098';
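In the IAM_ROLE clause, order matters: Amazon Redshift assumes the first role (RedshiftStreamRole in Account-2), which in turn assumes the second (msk-tutorial-role in Account-1). A small sketch (the helper name and account IDs are illustrative) shows how the chain string is assembled, with no spaces around the comma:

```python
def build_iam_role_chain(*role_arns: str) -> str:
    """Join role ARNs for Redshift's IAM_ROLE clause. The first role is
    assumed by Redshift; each subsequent role is assumed by the one
    before it. No spaces around the commas."""
    return ",".join(role_arns)

chain = build_iam_role_chain(
    "arn:aws:iam::444455556666:role/RedshiftStreamRole",
    "arn:aws:iam::111122223333:role/msk-tutorial-role",
)
print(chain)
```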
- Create a materialized view to consume data:
CREATE MATERIALIZED VIEW my_stream_vw AUTO REFRESH YES AS
SELECT *
FROM schema_stream."MSKTutorialTopic";
Kafka topic names are case sensitive and can contain both uppercase and lowercase letters. To ingest from topics with uppercase names, you can set the configuration enable_case_sensitive_identifier to true at the session or database level. To turn on auto refresh, use AUTO REFRESH YES. The default behavior is manual refresh.
- Refresh the view, which triggers Amazon Redshift to read from the topic and load data into the materialized view:
REFRESH MATERIALIZED VIEW my_stream_vw;
- Query data in the materialized view:
SELECT * FROM my_stream_vw;
This query returns the data ingested from the MSK topic.
In this article, we discussed how to set up two AWS accounts for cross-account Amazon Redshift streaming ingestion. It's simple to get started, and you can perform rich analytics on streaming data right within Amazon Redshift using familiar SQL.
For information about how to set up Amazon Redshift streaming ingestion from Apache Kafka sources in a single account, refer to Getting started with streaming ingestion from Apache Kafka sources.