Redshift Datashare for Customer Facing and Internal Workload

0

Hi Team, I am working on building a Data Architecture that will enable end customers to view the data through a frontend UI and also the internal users to query and play with the data in the warehouse.

For this purpose, I have chosen Redshift Managed Storage. My idea here is to have three different clusters(High Level Architecture Diagram at the end)

Cluster A - Only for ETL Source events from Kafka using Materialized view and transform the data using below ETL services and load the data to proper Dimension and Fact tables. For Reporting purpose further data aggregation to be done and stored in tables. Data Share will be enabled for the Aggregated tables as well as Dim and Fact tables. Source: Kafka, Salesforce, S3 ETL: Scheduled Queries, Step Function + AWS SDK Target: Redshift

Cluster B - Only for Customer Facing workload This cluster will be integrated with the Frontend UI, where the data from the aggregated tables/ SQL query /View will be displayed to the end customer (Role/User based access for tables/Rows should be implemented here). The idea here is to have low bottleneck and high performance for the queries to execute using its own compute in Cluster B whereas the data tables will still reside in Cluster A.

Cluster C - Only for Internal users This cluster will be used by the internal users to run their queries and play around the data. The idea here is to use the compute available in Cluster C and not affect the base ETL Cluster A in its ETL processes.

Cluster D,E,F - For internal Teams Individual teams can provision their own RA3 cluster and request access to the data in the ETL Cluster A, They can copy the data to their cluster or use data share to run their own queries or build their dashboards. The idea is to have one central storage of data in ETL Cluster A and use across multiple teams.

Please advise if this approach/ design looks optimal and valid. Please suggest me you valuable feedback and disadvantages that I should keep in mind. Any improvement or better solution that I can provide for such scenario. Would really appreciate your help and support.

Many Thanks!

Proposed HL Architecture

2 Respostas
1
Resposta aceita

Hi Britto, there is a lot to unpack with this question and probably best to schedule a quick touchpoint with a Redshift specialist such as myself to discuss further. Overall, love the workload isolation approach that your considering and it approximates some of our reference architectures. Some of the things you will want to keep in mind are (1) how to scale security in a manner that pleases the customer (e.g. are user credentials tied to an IdP or will they have to be recreated user credentials on each cluster; are there clearly understood user roles and access privileges, etc.)? (2) have you considered Redshift serverless instead of provisioned for sporadic users / spikey workloads? (3) are external tables to be used (if so, we have data sharing for external tables going GA soon)? Would be happy to meet with you to discuss these and other considerations.

AWS
asser_m
respondido há 4 meses
  • Thank you for the valuable insights. Please let me know how I can schedule call with you to discuss this further. Thanks!

0

Glad it helped Britto, please reach out to your Account Solutions Architect or Technical Account Manager, and they would be happy to set you up with a meeting with any specialist you need.

Also our data sharing public documentation has a wealth of information and case studies:

https://aws.amazon.com/redshift/features/data-sharing/

AWS
asser_m
respondido há 4 meses

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas