Customer is looking for ways to get data into AWS


We are using a fairly complex solution to ETL data from our customers' on-prem systems to our APIs/DBs and vice versa. I wonder whether something like AWS Glue could help with that. We don't get direct DB access, so we usually opt for some sort of nightly data file in CSV format.

AWS
Moderator
Asked 4 years ago, viewed 242 times
1 Answer
Accepted Answer

Yes. Let's break it down into steps, assuming significant data size and processing needs. This is just one way of doing it.

  1. Extract - This part, as you described in your customer's case, is only a fetch. If their transfer protocol is SFTP, they can use AWS Transfer for SFTP. In Python, they can also use the requests library to make RESTful API calls to fetch the data. This Python code can be part of a Glue job, since Glue supports Python Shell jobs.
  2. Once the data is in S3 or in your customer's environment, they need to either transform and then load it (TL), or load it and then transform it (LT).
  3. If the desired end state is a data warehouse where this data will be queried, aggregated, and analyzed by their users and executives to make business decisions, then it makes sense to choose Amazon Redshift as the destination.
  4. They can use AWS Glue, which is essentially a managed Spark environment, to transform the data and load it into Redshift, or simply load the data into Redshift and transform it there.
  5. Last but not least, they need a scheduler to tie everything together end to end; for that they can use Apache Airflow or AWS Step Functions.
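
As a rough illustration of step 5, here is a minimal sketch of a Step Functions definition that runs the extract job and then the transform-and-load job in sequence. It only builds the Amazon States Language JSON and does not call AWS. The job names `nightly-csv-extract` and `nightly-csv-to-redshift`, the state names, and the retry settings are assumptions made up for this example; the jobs themselves would be Glue jobs you have already created.

```python
import json


def build_state_machine(extract_job: str, load_job: str) -> dict:
    """Return a state machine definition that runs two Glue jobs in order.

    Both job names are hypothetical placeholders for existing Glue jobs.
    """

    def glue_task(job_name, next_state=None):
        state = {
            "Type": "Task",
            # The ".sync" suffix makes Step Functions wait until the
            # Glue job run finishes before moving to the next state.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": job_name},
            # Assumed retry policy: retry any error twice, with backoff.
            "Retry": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 60,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
        }
        if next_state is not None:
            state["Next"] = next_state
        else:
            state["End"] = True
        return state

    return {
        "Comment": "Nightly CSV pipeline: extract, then transform and load",
        "StartAt": "Extract",
        "States": {
            "Extract": glue_task(extract_job, "TransformAndLoad"),
            "TransformAndLoad": glue_task(load_job),
        },
    }


if __name__ == "__main__":
    definition = build_state_machine(
        "nightly-csv-extract", "nightly-csv-to-redshift"
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON could be used as the definition when creating a state machine, and a nightly schedule (for example, an Amazon EventBridge schedule) could then start it. If the customer prefers Airflow, the same two-step ordering would be expressed as a DAG instead.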

Note: An alternative to AWS Glue is the Amazon EMR ecosystem, with its own set of tools and choices. However, Glue is a managed environment, so unless there is a good reason that makes EMR absolutely necessary, they should not sign up for the extra management overhead.

AWS
Kunal_G
Answered 4 years ago
Expert
Reviewed 1 month ago
