Real-time upsert into Redshift tables


We have a requirement to transfer data from Snowflake to Redshift in real time. The details are as follows:

  • Snowflake will send data in real time for the required views.
  • This data needs to be processed and upserted into Redshift tables in real time.

What is the best method to implement this? It should also be cost-effective.

asked 8 months ago · 183 views
2 Answers

To implement a real-time upsert from Snowflake to Amazon Redshift in a cost-effective manner, you can consider the following approach:

  1. Use Amazon Kinesis Data Streams as an intermediary between Snowflake and Redshift. Snowflake can send data in real time to a Kinesis stream.

  2. Leverage Amazon Redshift's streaming ingestion capability to directly ingest data from the Kinesis stream. This allows for near real-time data transfer without the need for additional ETL processes.

  3. Create a streaming materialized view in Redshift that reads from the Kinesis stream. The view can be set up with auto-refresh so it updates automatically as new data arrives (see the DDL sketch after this list).

  4. Implement a change data capture (CDC) mechanism using a stored procedure in Redshift. This procedure can handle the upsert logic, comparing the incoming data from the streaming materialized view with the existing data in your target tables.

  5. Create a user-facing table that reflects the latest state of your data after the upserts.
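As a rough illustration of steps 2 and 3, the DDL could look like the sketch below. The schema name, role ARN, stream name, and view name (kinesis_schema, snowflake_orders, orders_stream_mv) are placeholders for this example; substitute your own:

```sql
-- Map a Redshift external schema onto the Kinesis data stream
-- (schema name, role ARN, and stream name are hypothetical).
CREATE EXTERNAL SCHEMA kinesis_schema
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';

-- Streaming materialized view over the stream; AUTO REFRESH YES
-- lets Redshift pull in new records as they arrive.
CREATE MATERIALIZED VIEW orders_stream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       shard_id,
       sequence_number,
       JSON_PARSE(kinesis_data) AS payload  -- kinesis_data arrives as VARBYTE
FROM kinesis_schema."snowflake_orders";
```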

To set this up:

  1. Create an IAM role with the necessary permissions for Redshift to access the Kinesis stream.

  2. Ensure your data is flowing from Snowflake into the Kinesis stream.

  3. In Redshift, create an external schema that points to your Kinesis stream.

  4. Create a streaming materialized view that reads from this external schema.

  5. Develop a stored procedure that handles the upsert logic, reading from the materialized view and updating your target tables (a sketch follows this list).

  6. Use Amazon EventBridge to schedule regular calls to this stored procedure, ensuring your target tables are consistently updated.
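A hedged sketch of step 5 follows. All table and column names (orders_target, order_id, order_status, updated_at) are assumptions for illustration; the procedure deduplicates to the newest record per key before merging, so late-arriving duplicates do not overwrite newer data:

```sql
CREATE OR REPLACE PROCEDURE sp_upsert_orders()
AS $$
BEGIN
    -- Stage the most recent record per business key from the streaming
    -- view. A production version would also track a high-water mark
    -- (e.g. the last processed arrival timestamp) instead of rescanning.
    CREATE TEMP TABLE orders_staging AS
    SELECT order_id, order_status, updated_at
    FROM (SELECT order_id, order_status, updated_at,
                 ROW_NUMBER() OVER (PARTITION BY order_id
                                    ORDER BY arrival_ts DESC) AS rn
          FROM (SELECT payload.order_id::VARCHAR(36)          AS order_id,
                       payload.order_status::VARCHAR(16)      AS order_status,
                       payload.updated_at::VARCHAR::TIMESTAMP AS updated_at,
                       approximate_arrival_timestamp          AS arrival_ts
                FROM orders_stream_mv) AS extracted) AS ranked
    WHERE rn = 1;

    -- Upsert into the user-facing target table.
    MERGE INTO orders_target
    USING orders_staging
    ON orders_target.order_id = orders_staging.order_id
    WHEN MATCHED THEN UPDATE SET
         order_status = orders_staging.order_status,
         updated_at   = orders_staging.updated_at
    WHEN NOT MATCHED THEN INSERT (order_id, order_status, updated_at)
         VALUES (orders_staging.order_id,
                 orders_staging.order_status,
                 orders_staging.updated_at);

    DROP TABLE orders_staging;
END;
$$ LANGUAGE plpgsql;
```

The EventBridge schedule from step 6 (or the Redshift query editor's scheduler, which uses EventBridge underneath) then only needs to run `CALL sp_upsert_orders();` at whatever interval meets your latency target.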

This approach is cost-effective because it minimizes data movement and leverages Redshift's built-in streaming ingestion. It provides near real-time updates without standing up a separate ETL system or additional compute resources.

Remember to monitor performance using Redshift system views like SYS_STREAM_SCAN_STATES and SYS_QUERY_HISTORY to ensure optimal operation of your streaming ingestion and upsert processes.
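For example, a query along these lines (column names drawn from the documented view; verify against the current documentation for your cluster version) summarizes recent scan activity per shard:

```sql
-- Recent stream-scan activity for streaming materialized views.
SELECT record_time, stream_name, shard_id, scanned_rows, scanned_bytes
FROM sys_stream_scan_states
ORDER BY record_time DESC
LIMIT 20;
```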
Sources

  • Best practices to implement near-real-time analytics using Amazon Redshift Streaming Ingestion with Amazon MSK | AWS Big Data Blog
  • Perform near real time analytics using Amazon Redshift on data stored in Amazon DocumentDB | AWS Database Blog

answered 8 months ago
EXPERT
reviewed 8 months ago

I would also strongly recommend reaching out to your account solutions architect, who will be happy to provide a wealth of guidance on architecture decisions like this one.

AWS
answered 8 months ago
