1 Answer
Hi, AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.
The two services overlap because AWS Glue supports Apache Spark and Amazon EMR is also available as a serverless option (Amazon EMR Serverless). Always remember that what you recommend should depend on the user persona and the use case.
From a recommendation point of view:
- AWS Glue is our recommended service for data integration workloads and for ETL migrations from legacy platforms such as Informatica and Talend.
- Amazon EMR is our recommended service for big data workloads that are traditionally run on Hadoop.
Use Amazon EMR:
- Hadoop migration from on-premises or from other cloud providers, including Databricks migration.
- Customer has expertise beyond just Spark, for example Hive, Presto, or Trino.
- Customer is skilled in loading their own data source connector libraries for their jobs.
Use AWS Glue:
- Customer prefers built-in capabilities: connectors, transformations, incremental load, job monitoring, orchestration.
- Customer wants both visual and code-based ETL development tools.
- Migration from ETL providers such as Informatica, Talend, or Matillion.
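To make the criteria above easier to apply, here is a rough sketch of them as a small Python helper. This is only an illustration, not an official AWS tool: the `Workload` class, its field names, and the simple "count the matching criteria" rule are all my own assumptions, and a real recommendation should still weigh the user persona and use case.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical flags, one per criterion listed in the answer above.
    hadoop_or_databricks_migration: bool = False
    uses_hive_presto_trino: bool = False
    loads_own_connector_libraries: bool = False
    prefers_builtin_capabilities: bool = False
    wants_visual_and_code_etl_tools: bool = False
    etl_vendor_migration: bool = False  # e.g. Informatica, Talend, Matillion


def recommend(w: Workload) -> str:
    """Count how many EMR and Glue criteria match and return the leading service.

    The equal-weight counting rule is an assumption made for illustration.
    """
    emr_score = sum([
        w.hadoop_or_databricks_migration,
        w.uses_hive_presto_trino,
        w.loads_own_connector_libraries,
    ])
    glue_score = sum([
        w.prefers_builtin_capabilities,
        w.wants_visual_and_code_etl_tools,
        w.etl_vendor_migration,
    ])
    if emr_score > glue_score:
        return "Amazon EMR"
    if glue_score > emr_score:
        return "AWS Glue"
    # Tie or no criteria matched: fall back to persona and use case.
    return "Either: decide by user persona and use case"


if __name__ == "__main__":
    print(recommend(Workload(etl_vendor_migration=True,
                             wants_visual_and_code_etl_tools=True)))
```

For example, a team migrating from Talend that wants visual tooling matches only Glue criteria, while a team moving an on-premises Hadoop cluster with Hive jobs matches only EMR criteria.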
answered a year ago
Thank you!!