- Newest
- Most votes
- Most comments
To implement a real-time upsert from Snowflake to Amazon Redshift in a cost-effective manner, you can consider the following approach:
-
Use Amazon Kinesis Data Streams as an intermediary between Snowflake and Redshift. Snowflake can send data in real-time to a Kinesis stream.
-
Leverage Amazon Redshift's streaming ingestion capability to directly ingest data from the Kinesis stream. This allows for near real-time data transfer without the need for additional ETL processes.
-
Create a streaming materialized view in Redshift that reads from the Kinesis stream. This view can be set up with auto-refresh capabilities to automatically update as new data arrives.
-
Implement a change data capture (CDC) mechanism using a stored procedure in Redshift. This procedure can handle the upsert logic, comparing the incoming data from the streaming materialized view with the existing data in your target tables.
-
Create a user-facing table that reflects the latest state of your data after the upserts.
To set this up:
-
Create an IAM role with the necessary permissions for Redshift to access the Kinesis stream.
-
Ensure your data is flowing from Snowflake into the Kinesis topic.
-
In Redshift, create an external schema that points to your Kinesis stream.
-
Create a streaming materialized view that reads from this external schema.
-
Develop a stored procedure that handles the upsert logic, reading from the materialized view and updating your target tables.
-
Use Amazon EventBridge to schedule regular calls to this stored procedure, ensuring your target tables are consistently updated.
This approach is cost-effective as it minimizes data movement and leverages Redshift's built-in streaming ingestion capabilities. It also provides near real-time updates without the need for complex ETL processes or additional compute resources.
Remember to monitor performance using Redshift system views like SYS_STREAM_SCAN_STATES and SYS_QUERY_HISTORY to ensure optimal operation of your streaming ingestion and upsert processes.
Sources
Best practices to implement near-real-time analytics using Amazon Redshift Streaming Ingestion with Amazon MSK | AWS Big Data Blog
Perform near real time analytics using Amazon Redshift on data stored in Amazon DocumentDB | AWS Database Blog
I would also strongly recommend that you reach out to your account solutions architect who are always happy and able to provide a wealth of guidance on such architecture decisions.
Relevant content
- asked a year ago
- asked 8 months ago
- asked 3 years ago