Customer is looking for ways to get data into AWS


We use a fairly complex solution to ETL data from our customers' on-prem systems to our API/DBs and vice versa. I wonder whether something like AWS Glue could help with that. We don't get direct DB access and usually opt for some sort of nightly data file in CSV format.

1 Answer
Accepted Answer

Yes. Let's break it down into steps, assuming significant data size and processing needs. This is just one way of doing things.

  1. Extract - In your customer's case, as you describe it, this part is only a fetch. If their transfer protocol is SFTP, they can use AWS Transfer for SFTP. In Python, they can also use the requests library to make RESTful API calls to fetch the data. That Python code can run as part of a Glue job, since Glue supports Python Shell jobs.
  2. Once the data is in S3 or in your customer's environment, they need to either transform it and then load it (TL) or load it and then transform it (LT).
  3. If the desired end state is a data warehouse where users and executives query, aggregate, and analyze this data to make business decisions, then Amazon Redshift makes sense as the destination.
  4. They can use AWS Glue, which is essentially managed Spark, to transform the data and load it into Redshift, or simply load it into Redshift and transform it there.
  5. Last but not least, they need a scheduler to tie the pipeline together end to end; they can use Apache Airflow or AWS Step Functions.
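
To make the extract and load steps a bit more concrete, here is a minimal Python sketch of what a Glue Python Shell job might do: fetch the nightly CSV over HTTPS, land it in S3 under a date-partitioned key, and produce the Redshift COPY statement for the "load first, transform later" path. The key layout, table name, bucket name, role ARN, and all function names are made up for illustration, and the COPY options assume the file has a single header row. It also assumes requests and boto3 are available to the job.

```python
from datetime import date


def nightly_s3_key(customer_id: str, run_date: date) -> str:
    """Build a date-partitioned S3 key for one customer's nightly CSV.

    The raw/<customer>/<yyyy>/<mm>/<dd>/ layout is an assumption, not a requirement.
    """
    return f"raw/{customer_id}/{run_date:%Y/%m/%d}/export.csv"


def redshift_copy_sql(table: str, bucket: str, key: str, iam_role_arn: str) -> str:
    """Render a Redshift COPY statement that loads one CSV file from S3.

    IGNOREHEADER 1 assumes the nightly file starts with a header row.
    """
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


def extract_to_s3(export_url: str, bucket: str, key: str) -> None:
    """Fetch the CSV over HTTPS and write it to S3 unchanged."""
    # Imported here so the pure helpers above work without these packages.
    import boto3
    import requests

    response = requests.get(export_url, timeout=60)
    response.raise_for_status()  # fail the job loudly on a bad HTTP status
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=response.content)


if __name__ == "__main__":
    # Hypothetical names throughout; nothing here touches AWS.
    key = nightly_s3_key("acme", date(2024, 1, 31))
    print(key)
    print(
        redshift_copy_sql(
            "staging.orders",
            "example-landing-bucket",
            key,
            "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
        )
    )
```

Keeping the raw file unchanged in S3 means either path from step 2 stays open: a Glue Spark job can transform it before loading, or the COPY statement can load it into a staging table for transformation inside Redshift.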

Note: An alternative to AWS Glue could be the offerings from the Amazon EMR ecosystem, with its own set of tools and choices. But Glue is a managed environment, so unless there is a good reason that makes it absolutely necessary, they should not sign up for the extra management overhead.
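
For the scheduling step, if they go with AWS Step Functions, the pipeline could be described roughly as below. This is only a sketch: it builds an Amazon States Language definition as a Python dict that runs two Glue jobs in sequence, using the Step Functions Glue integration that waits for each job to finish. The state names, job names, and retry settings are placeholders, and the nightly trigger (for example an Amazon EventBridge schedule) is not shown.

```python
import json


def glue_task(job_name, next_state=None):
    """Return a Task state that starts a Glue job and waits for it to complete."""
    task = {
        "Type": "Task",
        # The .sync suffix makes Step Functions wait for the Glue job run to end.
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        # Retry values are illustrative, not a recommendation.
        "Retry": [
            {
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 60,
                "MaxAttempts": 2,
                "BackoffRate": 2.0,
            }
        ],
    }
    if next_state:
        task["Next"] = next_state
    else:
        task["End"] = True
    return task


def build_pipeline_definition(extract_job, transform_job):
    """Chain the extract job and the transform-and-load job into one state machine."""
    return {
        "Comment": "Nightly CSV pipeline (sketch)",
        "StartAt": "ExtractToS3",
        "States": {
            "ExtractToS3": glue_task(extract_job, next_state="TransformAndLoad"),
            "TransformAndLoad": glue_task(transform_job),
        },
    }


if __name__ == "__main__":
    # Hypothetical Glue job names.
    definition = build_pipeline_definition("nightly-extract", "nightly-transform-load")
    print(json.dumps(definition, indent=2))
```

The printed JSON is what would be supplied as the state machine definition. With Apache Airflow instead, the same two steps would become two tasks in a DAG with a nightly schedule.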

AWS
Kunal_G
answered 4 years ago
EXPERT
reviewed 19 days ago
