Best AWS tool to move/transform data from Redshift to an API

0

Hi, I'm looking for an AWS tool to help me move data from a database in Redshift to a target database, but the target database is only accessible via an API. I need to apply some transformations to the data before moving it to the target database.

2 answers
1

If you're looking to stay with native AWS services, your options would be AWS Glue or AWS Data Pipeline.

AWS Glue is fully serverless, so you won't have to manage servers, but it runs Apache Spark on the backend, which is something to be aware of from a compatibility standpoint. AWS Data Pipeline is not restricted to Apache Spark and lets you use other engines such as Pig and Hive. That makes it a good choice for your organization if your ETL jobs do not require Apache Spark or need multiple engines.

As for specific use cases, AWS Data Pipeline transforms and moves data across AWS components. It also gives you control over the compute resources that run your code, including access to the underlying Amazon EMR clusters or EC2 instances. AWS Glue, by contrast, is best used to transform data from its supported sources (JDBC platforms, Redshift, S3, RDS) and store it in its supported targets (JDBC platforms, S3, Redshift). Again, because AWS Glue is serverless, you won't have to manage compute resources, so you can focus on the ETL jobs themselves.

Both have different pricing options, so depending on your specific use case you can kick around the numbers in the AWS Pricing Calculator.

If you have any more questions or details to share, feel free to post them in the comments and I'll try to guide you to what would suit your needs best. Thanks!

AWS
AWSJoe
answered 2 years ago
0

Thank you very much for your answer. I have used Glue a bit, and my main question is whether it allows an API as the destination of the flow; so far I have always written to another database. I don't know Data Pipeline; maybe it allows using an API as a target.

answered 2 years ago
  • Understood. Unfortunately, neither service offers that ability by default. With AWS Glue, you can only select AWS Glue Data Catalog, Amazon S3, Amazon Redshift, MySQL, PostgreSQL, Oracle SQL, and Microsoft SQL Server as targets. With Data Pipeline, your node options are DynamoDB, Redshift, SQL, MySQL, and S3.
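
Since an API is not a built-in target in either service, one possible workaround is to do the transformation and the API calls in your own code, wherever you choose to run it. The following is only a rough sketch under assumptions, not a tested integration: the endpoint URL, the JSON batch payload, and the example transformation are all hypothetical, and reading the rows out of Redshift is left out.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator

# Hypothetical endpoint; replace with the real target API.
API_URL = "https://example.com/api/records"


def transform(row: dict) -> dict:
    """Example transformation (assumed): lowercase keys, trim string values."""
    return {
        key.lower(): value.strip() if isinstance(value, str) else value
        for key, value in row.items()
    }


def batches(rows: Iterable[dict], size: int) -> Iterator[list]:
    """Yield lists of at most `size` rows so each request stays small."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_json(payload: list) -> int:
    """POST one batch as JSON and return the HTTP status code.

    Assumes the API accepts a JSON array in the request body.
    """
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def load(
    rows: Iterable[dict],
    send: Callable[[list], int] = post_json,
    batch_size: int = 100,
) -> int:
    """Transform rows, send them in batches, and return the number of rows sent."""
    sent = 0
    for batch in batches((transform(row) for row in rows), batch_size):
        status = send(batch)
        if status >= 300:
            raise RuntimeError(f"API returned status {status}")
        sent += len(batch)
    return sent
```

The `send` parameter is injectable so the batching and transformation logic can be checked without calling a real API. Authentication, retries, and rate limiting depend on the target API and are not covered here.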
