Approach to pull data (>1M) periodically from a third party application's database for BI use cases within Redshift

Hi,

I am considering AWS Glue to connect to a third-party application's database (Oracle) and bring over a data set (in excess of 1M rows) obtained by joining multiple tables at the source end. The data set will be stored in an Amazon Redshift table. Further aggregations will be performed within Redshift for BI/analytics.

I have two questions: (1) Is the above approach cost-effective in the long run? The process will run daily.
(2) Is it advisable to bring the data from the source tables as-is, without performing any joins during extraction, and then perform the joins and aggregations within Redshift?

Thanks in advance,

S Ray

ssray23
Asked 3 months ago · 346 views

1 Answer

It depends a lot on your situation; all three options are valid.
Joining and aggregating in Oracle is the most efficient if that result is all you need, as long as the database has spare capacity for it.
If you have a Zero-ETL integration to move the tables into Redshift, that avoids adding a tool, and you can then do the joins there.
And if you already have an ETL tool like Glue, or need it for something else, you could do the work there and avoid pushing the workload to Redshift.
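If you take the "join in Redshift" route, the aggregation step after loading can be a single SQL statement submitted through the Redshift Data API. A minimal sketch, assuming hypothetical schema/table names (`staging.orders`, `staging.customers`, `bi.daily_order_summary`) that are not from the question:

```python
# Sketch: run the join/aggregation inside Redshift after the raw
# tables have been loaded, so the heavy work happens in Redshift
# rather than on the third-party Oracle server.
# All schema/table/column names here are hypothetical.

AGGREGATION_SQL = """
INSERT INTO bi.daily_order_summary
SELECT o.order_date, c.region, SUM(o.amount) AS total_amount
FROM staging.orders o
JOIN staging.customers c ON c.customer_id = o.customer_id
GROUP BY o.order_date, c.region
"""

def run_aggregation(workgroup: str, database: str) -> str:
    """Submit the statement via the Redshift Data API; returns the statement id.

    The call is asynchronous; poll describe_statement() for completion.
    """
    import boto3  # assumed available in the Glue/Lambda runtime

    client = boto3.client("redshift-data")
    resp = client.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=AGGREGATION_SQL
    )
    return resp["Id"]
```

Because the Data API call is asynchronous and queued, this pattern works well as the final step of a daily pipeline without holding a long-lived database connection open.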

AWS
Expert
Answered 3 months ago
  • Thanks Gonzalo. As we already use Glue for other ETLs, I think the third option is the most feasible. Agreed, the first is the best, but there are limits on how much leverage we have to run expensive queries on the third-party-owned Oracle DB server.
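For the Glue route, one way to keep the daily pull cheap is to push a filtered query down to Oracle instead of re-extracting the full >1M-row set each run. A rough sketch of building such a daily watermark query; the table name, watermark column, and the exact Glue connection option that accepts the query are all assumptions, not details from the thread:

```python
# Sketch: build a pushdown query so Oracle only returns rows changed
# since the previous daily run. Table and column names are hypothetical.

from datetime import date, timedelta

def build_pushdown_query(table: str, watermark_col: str, run_date: date) -> str:
    """Select only rows modified during the previous day (daily schedule)."""
    since = run_date - timedelta(days=1)
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_col} >= DATE '{since.isoformat()}' "
        f"AND {watermark_col} < DATE '{run_date.isoformat()}'"
    )

query = build_pushdown_query("SALES.ORDERS", "LAST_UPDATED", date(2024, 6, 2))
# In the Glue job, this string would be supplied as the pushdown query
# in the JDBC connection options (the option name varies by connector),
# so filtering happens in Oracle before any rows cross the network.
```

This keeps the load on the third-party server to a single indexed range scan per day, which helps with the limited leverage on that database.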
