Best practices for bulk data loading in AWS Redshift - Glue or COPY

What are the pros and cons of using AWS Glue rather than Redshift's built-in commands (such as COPY and INSERT) for bulk data loading, in terms of cost, time, and adaptability? It would be much appreciated if you could provide some example use cases.

Asked 10 months ago, viewed 351 times
1 answer
Accepted answer

Hi, AWS Glue is an ETL service, and the T (transform) is the key letter. If you need to transform the source data before loading it into Redshift, Glue will be very useful.
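
For contrast, when the files in S3 already match the target table and no transformation is needed, a plain COPY is usually the simpler route. Here is a minimal sketch that only builds the COPY statement text; the table name, bucket, and IAM role ARN are hypothetical placeholders, and the helper `build_copy_statement` is my own illustration, not part of any AWS library:

```python
import re

# Accept "table" or "schema.table" made of plain identifiers only.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")

# Two common format clauses of the Redshift COPY command.
_FORMAT_CLAUSES = {
    "PARQUET": "FORMAT AS PARQUET",
    "CSV": "FORMAT AS CSV IGNOREHEADER 1",  # assumes one header row per file
}


def build_copy_statement(table, s3_uri, iam_role_arn, file_format="PARQUET"):
    """Return the text of a Redshift COPY statement (hypothetical helper)."""
    if not _TABLE_NAME.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not s3_uri.startswith("s3://") or "'" in s3_uri:
        raise ValueError(f"unexpected S3 URI: {s3_uri!r}")
    if "'" in iam_role_arn:
        raise ValueError("IAM role ARN must not contain quotes")
    try:
        format_clause = _FORMAT_CLAUSES[file_format.upper()]
    except KeyError:
        raise ValueError(f"unsupported format: {file_format!r}") from None
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"{format_clause};"
    )


if __name__ == "__main__":
    # All three values below are placeholders for illustration only.
    print(build_copy_statement(
        "analytics.orders",
        "s3://example-bucket/orders/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    ))
```

The resulting statement can be run from any SQL client connected to the cluster. COPY loads the files under the S3 prefix in parallel, but it expects the data to already be in a shape the table accepts.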

For example, Glue provides many built-in transformations, both simple and advanced, that you can integrate into your Glue-based ETL pipeline: see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-transforms.html
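
To give an idea of what that looks like, here is a rough sketch of a Glue job that renames and casts columns with `ApplyMapping`, drops bad rows with `Filter`, and then writes to Redshift. The database, table, connection, and bucket names are all made up, and `run_job` only works inside a Glue job, which is why the `awsglue` imports sit inside the function:

```python
# Column mappings in ApplyMapping's
# (source, source_type, target, target_type) form.
# The column names are hypothetical.
ORDER_MAPPINGS = [
    ("order_id", "string", "order_id", "bigint"),
    ("amount", "string", "amount", "double"),
    ("country", "string", "country_code", "string"),
]


def is_valid_order(record):
    """Keep rows that have an order id and a non-negative amount.

    Assumes both fields are present on every record.
    """
    return (
        record["order_id"] is not None
        and record["amount"] is not None
        and record["amount"] >= 0
    )


def run_job():
    # Imported here because awsglue is only available in the Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.transforms import ApplyMapping, Filter
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read the source table registered in the Glue Data Catalog.
    raw = glue_context.create_dynamic_frame.from_catalog(
        database="example_db", table_name="raw_orders"
    )

    # The "T" step: rename and cast columns, then drop invalid rows.
    typed = ApplyMapping.apply(frame=raw, mappings=ORDER_MAPPINGS)
    clean = Filter.apply(frame=typed, f=is_valid_order)

    # Write to Redshift through a Glue connection; redshift_tmp_dir is
    # the S3 staging location Glue uses for the load.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=clean,
        catalog_connection="example-redshift-connection",
        connection_options={"dbtable": "analytics.orders", "database": "dev"},
        redshift_tmp_dir="s3://example-bucket/glue-temp/",
    )
```

Keeping the mappings and the filter predicate as plain Python makes them easy to unit test outside Glue, before paying for a job run.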

You may also want to measure the quality of your data before loading it, to ensure consistent quality. In that case, AWS Glue Data Quality may be very helpful: see https://aws.amazon.com/blogs/big-data/getting-started-with-aws-glue-data-quality-from-the-aws-glue-data-catalog/

Hope it helps,

Didier

AWS Expert, answered 10 months ago
