Customer is looking for ways to get data into AWS


We are using a fairly complex solution to ETL data from our customers' on-prem systems to our APIs/DBs and vice versa. I wonder whether something like AWS Glue could help with that. We don't get direct DB access, and usually opt for some sort of nightly data file in CSV format.

1 Answer
Accepted Answer

Yes. Let's break it down into steps, assuming significant data size and processing needs. This is just one way of doing it.

  1. Extract - As you describe it, in your customer's case this part is only a fetch. If their transfer protocol is SFTP, they can use AWS Transfer for SFTP. In Python, they can also use the requests library to make RESTful API calls to fetch the data. This Python code can run as part of a Glue job, since Glue supports Python Shell jobs.
  2. Once the data is in S3 or in your customer's environment, they need to either transform and then load it (TL), or load it and then transform it (LT).
  3. If the desired end state is a data warehouse where users and executives query, aggregate, and analyze this data to make business decisions, then it makes sense to choose Amazon Redshift as the destination.
  4. They can use AWS Glue, which is essentially managed Apache Spark, to transform the data and load it into Redshift, or simply load it into Redshift and transform it there.
  5. Last but not least, they need a scheduler to tie the pipeline together end to end. For that they can use Apache Airflow or AWS Step Functions.
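
As a rough illustration of step 1, here is a minimal sketch of how a Glue Python Shell job might check a nightly CSV file and land it in S3. The bucket name, the key layout, and the expected columns are all hypothetical, chosen only for the example. The upload uses boto3's `put_object` and needs AWS credentials to actually run.

```python
import csv
import io
from datetime import date

# Hypothetical names: the bucket, the key layout, and the expected columns
# are assumptions for illustration only.
BUCKET = "example-landing-bucket"
EXPECTED_COLUMNS = ["customer_id", "order_id", "amount"]


def landing_key(customer: str, run_date: date, filename: str) -> str:
    """Build a date-partitioned S3 key so each nightly file gets its own prefix."""
    return f"raw/{customer}/dt={run_date.isoformat()}/{filename}"


def check_header(csv_bytes: bytes) -> int:
    """Verify the header row and return the number of data rows."""
    reader = csv.reader(io.StringIO(csv_bytes.decode("utf-8")))
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    return sum(1 for _ in reader)


def upload(csv_bytes: bytes, customer: str, run_date: date, filename: str) -> str:
    """Validate the file, then write it to S3 and return the key used."""
    import boto3  # imported here so the helpers above work without boto3 installed

    check_header(csv_bytes)
    key = landing_key(customer, run_date, filename)
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=csv_bytes)
    return key
```

Partitioning the key by date is one common choice because it lets the later transform step pick up only the newest file. Other layouts work just as well.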

Note: An alternative to AWS Glue is the Amazon EMR ecosystem, which comes with its own set of tools and choices. Glue is a managed environment, though, so unless there is a compelling reason, they should not sign up for the extra management overhead.
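
If Step Functions is chosen as the scheduler, the end-to-end flow could be sketched as a two-state machine that runs one Glue job after another. The snippet below only builds and prints the state machine definition as JSON. The two job names are hypothetical, and the retry settings are illustrative rather than recommended values. The `glue:startJobRun.sync` resource makes each state wait for its Glue job to finish before moving on.

```python
import json

# Hypothetical Glue job names; replace them with jobs that exist in your account.
EXTRACT_JOB = "nightly-extract"
LOAD_JOB = "transform-and-load-redshift"


def glue_task(job_name, next_state=None):
    """Return a Task state that starts a Glue job and waits for it to complete."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        # Illustrative retry policy: retry any error twice with backoff.
        "Retry": [
            {
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 60,
                "MaxAttempts": 2,
                "BackoffRate": 2.0,
            }
        ],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


definition = {
    "Comment": "Nightly CSV pipeline: extract, then transform and load",
    "StartAt": "Extract",
    "States": {
        "Extract": glue_task(EXTRACT_JOB, "TransformAndLoad"),
        "TransformAndLoad": glue_task(LOAD_JOB),
    },
}

print(json.dumps(definition, indent=2))
```

The printed JSON can be used as the definition when creating the state machine. A nightly trigger, for example an Amazon EventBridge schedule, would still have to be set up separately.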

AWS
Kunal_G
Answered 4 years ago
Expert
Reviewed a month ago
