Best AWS tool to move/transform data from Redshift to an API

0

Hi, I'm looking for an AWS tool to help me move data from a database in Redshift to a target database that is only accessible via an API. I also need to apply some transformations to the data before loading it into the target database.

asked 2 years ago
258 views
2 Answers
1

If you're looking to stay with native AWS services, your options would be AWS Glue or AWS Data Pipeline.

AWS Glue is fully serverless, so you won't have to manage servers, but it runs Apache Spark on the backend, which is something to be aware of from a compatibility standpoint. AWS Data Pipeline is not restricted to Apache Spark and lets you use other engines such as Pig and Hive. That makes it a good choice for your organization if your ETL jobs do not require Apache Spark or need multiple engines.

As for specific use cases, AWS Data Pipeline transforms and moves data across AWS components. It also gives you control over the compute resources that run your code and lets you access the underlying Amazon EMR clusters or EC2 instances. AWS Glue, by contrast, is best used to transform data from its supported sources (JDBC platforms, Redshift, S3, RDS) and store it in its supported targets (JDBC platforms, S3, Redshift). Again, because AWS Glue is serverless, you won't have to manage compute resources, so you can focus on your ETL jobs specifically.

Both have different pricing options, so depending on your specific use case you can run the numbers in the AWS Pricing Calculator.

If you have any more questions or information to share, feel free to post in the comments and I'll try to guide you to what would suit your needs best. Thanks!

AWS
AWSJoe
answered 2 years ago
0

Thank you very much for your answer. I have used Glue a bit, and my main question is whether it allows an API as the destination of the flow; so far I have only written to another database. I don't know Data Pipeline, so maybe it allows using an API as a target.

answered 2 years ago
  • Understood. Unfortunately, neither service offers that ability by default. With AWS Glue, you can only select AWS Glue Data Catalog, Amazon S3, Amazon Redshift, MySQL, PostgreSQL, Oracle SQL, and Microsoft SQL Server as targets. With Data Pipeline, your node options are DynamoDB, Redshift, SQL, MySQL, and S3.
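
Since neither service offers an API target out of the box, one possible workaround (not something proposed in this thread) is to run custom code that reads the rows, transforms them, and sends them to the API in batches. The following is only a minimal sketch under several assumptions: the rows have already been fetched from Redshift by some means not shown here (a database driver, or files produced by UNLOAD), the target API accepts a JSON POST with a `records` array and a bearer token, and the column names `name` and `id` are invented for illustration. The functions `transform`, `batches`, `build_request`, and `push` are hypothetical helpers, not part of any AWS SDK.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def transform(row: Row) -> Row:
    """Hypothetical transformation: strip whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows so each API call stays small."""
    chunk: List[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def build_request(url: str, records: List[Row], token: str) -> urllib.request.Request:
    """Build a JSON POST request; the payload shape is an assumption."""
    body = json.dumps({"records": records}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def http_send(request: urllib.request.Request) -> int:
    """Default sender: perform the HTTP call and return the status code."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def push(
    rows: Iterable[Row],
    url: str,
    token: str,
    size: int = 500,
    send: Callable[[urllib.request.Request], int] = http_send,
) -> int:
    """Transform rows, POST them in batches, and return the number of rows sent."""
    sent = 0
    for chunk in batches((transform(r) for r in rows), size):
        send(build_request(url, chunk, token))
        sent += len(chunk)
    return sent
```

Code like this could run wherever Python is available to you, for example in a Glue Python shell job, a Lambda function, or on an EC2 instance. The `send` parameter is injectable so the batching and transformation logic can be tried out without calling a real endpoint; retries, rate limiting, and error handling are left out and would depend on the target API.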
