Building on @amanbajw's answer, which is well-formed and comprehensive, here are a few additional pieces of information that address some specific aspects of the question:
- Complex transformations: AWS Glue has multiple transforms out of the box, for example the Aggregate transform, Relationalize (to unnest complex JSON), the SQL transform, which lets you code complex transformations in SQL, and the Custom code transform, which lets you code complex transformations in Python or Scala.
- Amazon S3 to Amazon Redshift: AWS Glue has a native connector to Amazon Redshift that reads and writes Redshift data natively. The connector also supports pre- and post-actions, which let you implement upserts following Redshift best practices.
- ELT in Redshift with Glue: to run ELT in Redshift, the best practice would be to use Glue Python shell jobs to start the processing in Redshift, either with a native Python library that connects to the database or with the Redshift Data API. An example can be seen in this blog post.
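To make "unnest complex JSON" concrete, here is a pure-Python sketch of the idea behind Relationalize: nested structs become dotted column names, and each array is moved into a child table linked back to its parent row by a generated id. This is an illustration under simplifying assumptions, not Glue's implementation. The function names and the `root_<field>` / `val` naming are hypothetical, and arrays nested inside array elements are not split further.

```python
from typing import Any


def flatten_record(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level using dotted keys, e.g. "address.city"."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_record(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def relationalize_like(records: list[dict[str, Any]], root: str = "root") -> dict[str, list[dict[str, Any]]]:
    """Hypothetical helper: split records into a flat root table plus one table per array field.

    In the root row each array is replaced by a generated integer id; the child
    table rows carry that id, the element's position and the flattened element
    (scalar elements are stored under a "val" column).
    """
    tables: dict[str, list[dict[str, Any]]] = {root: []}
    next_id = 1
    for record in records:
        row: dict[str, Any] = {}
        for key, value in flatten_record(record).items():
            if isinstance(value, list):
                child = tables.setdefault(f"{root}_{key}", [])
                for index, element in enumerate(value):
                    item = flatten_record(element) if isinstance(element, dict) else {"val": element}
                    child.append({"id": next_id, "index": index, **item})
                row[key] = next_id  # foreign key into the child table
                next_id += 1
            else:
                row[key] = value
        tables[root].append(row)
    return tables
```

For example, an order with a nested `customer` struct and an `items` array yields a root table with `customer.name` and `customer.address.city` columns plus a separate `root_items` table. In Glue itself you would call the built-in Relationalize transform rather than hand-rolling this.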
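A minimal sketch of the staging-table upsert pattern with the connector's pre- and post-actions, assuming a Glue job that writes a DynamicFrame through `glueContext.write_dynamic_frame.from_jdbc_conf`. The helper only builds the `connection_options` dict (`database`, `dbtable`, `preactions`, `postactions`); the helper name and the table names are hypothetical. The job writes into the staging table, and the post-actions then delete matching rows from the target and insert the staged rows inside one transaction.

```python
def build_redshift_upsert_options(target_table, staging_table, key_columns, database):
    """Hypothetical helper: connection_options for a staging-table merge into target_table.

    Table and column names are interpolated without quoting, so they must come
    from trusted configuration, never from user input.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Rows in the target that have a replacement in the staging table.
    join_condition = " AND ".join(
        f"{target_table}.{col} = {staging_table}.{col}" for col in key_columns
    )
    # Before the load: recreate an empty staging table with the target's layout.
    preactions = (
        f"DROP TABLE IF EXISTS {staging_table}; "
        f"CREATE TABLE {staging_table} (LIKE {target_table});"
    )
    # After the load: replace matching rows, add new ones, then drop the staging table.
    postactions = (
        "BEGIN; "
        f"DELETE FROM {target_table} USING {staging_table} WHERE {join_condition}; "
        f"INSERT INTO {target_table} SELECT * FROM {staging_table}; "
        f"DROP TABLE {staging_table}; "
        "END;"
    )
    return {
        "database": database,
        "dbtable": staging_table,  # Glue loads the DynamicFrame into the staging table
        "preactions": preactions,
        "postactions": postactions,
    }
```

Usage, assuming an existing `glueContext`, a DynamicFrame `dyf`, a Glue connection named `redshift-conn` and an S3 temp path (all hypothetical): `glueContext.write_dynamic_frame.from_jdbc_conf(frame=dyf, catalog_connection="redshift-conn", connection_options=build_redshift_upsert_options("public.sales", "public.sales_stage", ["order_id"], "dev"), redshift_tmp_dir="s3://my-bucket/tmp/")`.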
Hope it helps.
That's excellent info, Fabrizio.
Hi Sam,
Selecting the appropriate ETL tool depends on many factors, such as your organization's IT landscape, familiarity with specific tools, SME availability, target state, licensing costs and constraints, and the operational cost of supporting the solution (Talend or Glue). I won't go into a technical comparison between Talend and Glue, but there are a few points about the solution that you may want to consider along with the ones I mentioned above...
- AWS Glue is serverless so you don't need to spend time & money on undifferentiated heavy-lifting (managing servers, patching, upgrades etc.)
- You can use a Glue crawler to catalog the data on S3, which lets you use Redshift Spectrum to query the data in S3 instead of moving it into Redshift. You'll need to determine the use case for the solution, but you don't have to move all the data from your data lake into Redshift.
- AWS Glue is tightly integrated with other AWS services, so if your organization is planning to migrate to AWS, factor in the integration costs. From experience, enterprises tend to spend a lot on integrating their tools and solutions, and it is one of the key criteria in any tool or vendor selection. For example, if you're building a data lake on AWS following the modern data architecture, Glue is pre-integrated with Lake Formation, the Data Catalog, etc., and you can leverage the same identity and access controls (single pane of glass).
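A sketch of the crawler plus Spectrum flow from the list above, assuming boto3 clients are passed in: start a crawler with `start_crawler`, poll `get_crawler` until its state returns to `READY`, then build the `CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG` statement that exposes the crawled Data Catalog database to Redshift Spectrum. The crawler, schema, database and role names are hypothetical, and the client and `sleep` function are injected so the logic can be exercised without AWS access.

```python
import time


def run_crawler_and_wait(glue_client, crawler_name, poll_seconds=15.0, max_polls=240, sleep=time.sleep):
    """Start a Glue crawler and return the status of its run once it is READY again.

    glue_client: a boto3 "glue" client (or any object with the same methods).
    """
    glue_client.start_crawler(Name=crawler_name)
    for _ in range(max_polls):
        # Sleep before each check to give the run time to leave the READY state.
        sleep(poll_seconds)
        crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
        if crawler["State"] == "READY":
            # LastCrawl.Status is SUCCEEDED, CANCELLED or FAILED.
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
    raise TimeoutError(f"crawler {crawler_name} still running after {max_polls} polls")


def external_schema_sql(schema, catalog_database, iam_role_arn):
    """Build the Redshift DDL that maps a Glue Data Catalog database to a Spectrum schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{catalog_database}' "
        f"IAM_ROLE '{iam_role_arn}';"
    )
```

In a real job you would create the client with `boto3.client("glue")` and run the generated DDL once in Redshift; after that, the crawled tables can be queried as `schema.table` directly from S3, without loading them into Redshift.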
In short, I encourage you to look at the business outcome and target state as part of your tool/technology/solution selection instead of a purely technical comparison.
Thanks, Aman. We will be looking at multiple use cases with data at the enterprise level. Since we use so many AWS services for data analytics, we want to find tools that can replace older technology to save time and money. Hence, I wanted to gather information on what can be achieved with Glue. As AWS ships new features almost daily, I expect that Glue will soon become the de facto ETL tool for most transformation projects.
Hey Sam,
I've delved into the realm of AWS Glue and Talend for ETL jobs, and I must say, it's an intriguing journey. AWS Glue is a potent tool when it comes to handling complex transformation jobs, much like Talend. The crux of the matter lies in your specific use case and preferences.
From my experience, AWS Glue offers the convenience of serverless architecture, auto-scaling, and seamless integration with other AWS services. However, Talend brings a robust, open-source approach to the table, allowing for greater customization and flexibility.
For your data migration from S3 to Redshift with transformations, both options can perform admirably. Glue simplifies the process with DataBrew and the Data Catalog, while Talend's ETL jobs can be highly tailored to meet intricate business requirements.
My advice? Consider your team's skill set, project complexity, and cost constraints. If you value ease of use and AWS synergy, Glue might be your ticket. But if you crave customization and a community-driven approach, Talend could be your muse.
I am still new to AWS Glue. I think you have a lot of flexibility for transformations if you author your own job, which basically means writing your own code. However, for data warehousing solutions, Glue does not appear to be very useful for ELT-type transformations, e.g. updating a dimension table in Redshift.
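For the dimension-table case, one option is the pattern described in another answer here: a Glue Python shell job that drives the SQL inside Redshift through the Redshift Data API. The sketch below assumes a Type 2 slowly changing dimension with hypothetical tables `dim_customer` (with `valid_from`, `valid_to` and `is_current` columns) and `stage_customer`, loaded beforehand. It submits both statements with `batch_execute_statement`, which the Data API documentation describes as running the statements serially in a single transaction, then polls `describe_statement` until a terminal status.

```python
import time

# Hypothetical tables: stage_customer holds the latest extract; dim_customer is
# a Type 2 dimension with valid_from / valid_to / is_current columns.
SCD2_SQLS = [
    # 1. Close the current version of every customer whose attributes changed.
    "UPDATE dim_customer SET is_current = FALSE, valid_to = GETDATE() "
    "FROM stage_customer s "
    "WHERE dim_customer.customer_id = s.customer_id AND dim_customer.is_current "
    "AND (dim_customer.name <> s.name OR dim_customer.city <> s.city);",
    # 2. Insert a new current version for changed and brand-new customers
    #    (after step 1, neither has a current row any more).
    "INSERT INTO dim_customer (customer_id, name, city, valid_from, valid_to, is_current) "
    "SELECT s.customer_id, s.name, s.city, GETDATE(), NULL, TRUE "
    "FROM stage_customer s "
    "LEFT JOIN dim_customer d ON d.customer_id = s.customer_id AND d.is_current "
    "WHERE d.customer_id IS NULL;",
]

TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}


def run_batch_and_wait(client, sqls, connection, poll_seconds=2.0, max_polls=900, sleep=time.sleep):
    """Submit sqls as one Data API batch and block until it reaches a terminal state.

    client: a boto3 "redshift-data" client (or any object with the same methods).
    connection: Data API target kwargs, e.g. {"WorkgroupName": ..., "Database": ...}.
    """
    statement_id = client.batch_execute_statement(Sqls=sqls, **connection)["Id"]
    for _ in range(max_polls):
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status in TERMINAL_STATES:
            if status != "FINISHED":
                raise RuntimeError(f"statement {statement_id} {status}: {desc.get('Error', '')}")
            return statement_id
        sleep(poll_seconds)
    raise TimeoutError(f"statement {statement_id} did not finish after {max_polls} polls")
```

Usage, with hypothetical names: `run_batch_and_wait(boto3.client("redshift-data"), SCD2_SQLS, {"WorkgroupName": "my-workgroup", "Database": "dev"})` for Redshift Serverless, or pass `ClusterIdentifier`, `Database` and `DbUser` for a provisioned cluster. Note that the `<>` comparisons treat NULL-to-value changes as "unchanged"; a production version would need NULL-safe comparisons.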