Glue ETLs using Business Object


We run our ETLs using the following architecture to populate the data lake: MySQL -> DMS -> S3 -> Glue -> S3

Though this architecture works fine, it is heavily dependent on the database. Also, the object data is scattered across multiple tables. An ETL based on object data could be another way to retain the object structure and extract information from it. Below is what I am thinking:

Application -> Kinesis Firehose -> S3 -> Glue -> S3

Has anyone tried this? Any pros/cons or architecture documentation would be helpful.

Note: At this point we do not have any real-time data requirement, but we might need one in the future. Let me know if any other information is required.

1 Answer

Hello KoustavC,

From reading your re:Post question, I understand that your team currently runs its ETLs using the following architecture to populate your data lake: MySQL -> DMS -> S3 -> Glue -> S3. Although this architecture works fine, you have stated that it is heavily dependent on the database and that the object data is scattered across multiple tables.

You are wondering whether the following architecture has been tried: Application -> Kinesis Firehose -> S3 -> Glue -> S3. You are also asking whether any documentation is available on it, and on an ETL based on object data as another way to retain the object structure and extract information from it.

Please let me know if I have misunderstood your concern in any way.


GUIDANCE:

  1. For the proposed architecture (Application -> Kinesis Firehose -> S3 -> Glue -> S3), the application will need to send the data to Kinesis Data Firehose, Firehose will deliver the data to S3, and a Glue crawler will crawl the data in S3 and create tables in the Glue Data Catalog. Glue jobs can then transform the data and write the results back to an S3 bucket.
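To make the first hop (Application -> Firehose) more concrete, here is a minimal sketch of how an application might serialize a business object with its nested structure intact and send it in batches. The `Order` and `OrderLine` classes and the stream name are hypothetical, chosen only for illustration. The `client` argument is assumed to be a boto3 Firehose client, whose `put_record_batch` call accepts up to 500 records per request. Retrying the individual failed records is left out for brevity.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Iterable, List

# PutRecordBatch accepts at most 500 records per call.
MAX_BATCH = 500


@dataclass
class OrderLine:  # hypothetical nested part of the business object
    sku: str
    quantity: int


@dataclass
class Order:  # hypothetical business object
    order_id: str
    customer_id: str
    lines: List[OrderLine] = field(default_factory=list)


def to_record(obj: Any) -> dict:
    """Serialize a dataclass instance as one newline-terminated JSON document.

    The trailing newline keeps objects on separate lines once Firehose
    concatenates records into a single S3 object.
    """
    payload = json.dumps(asdict(obj), separators=(",", ":")) + "\n"
    return {"Data": payload.encode("utf-8")}


def send_objects(client: Any, stream_name: str, objects: Iterable[Any]) -> int:
    """Send objects in batches and return the total count of failed records."""
    records = [to_record(o) for o in objects]
    failed = 0
    for start in range(0, len(records), MAX_BATCH):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + MAX_BATCH],
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

In an application this could be called as `send_objects(boto3.client("firehose"), "orders-stream", orders)`, where `orders-stream` is a placeholder delivery stream name. Because each object lands in S3 as a single JSON line, a Glue crawler can infer the nested `lines` field as an array of structs instead of the object being spread across multiple tables.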

Note: To the best of my knowledge, I am not aware of this proposed architecture having been implemented, but I believe it should be possible.

Please see below some useful documentation and guides to help you get started:

[1]: Streaming ETL for Data Lakes using Amazon Kinesis Firehose - 2017 AWS Online Tech Talks https://www.youtube.com/watch?v=0AGNcZfYkzw

[2]: Amazon Kinesis Data Firehose Data Transformation: https://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html

[3]: Real-time Data Streaming into Data Lake: https://catalog.us-east-1.prod.workshops.aws/workshops/ea7ddf16-5e0a-4ec7-b54e-5cadf3028b78/en-US/lab1-ingestion-storage/real-time

I hope this information helps with your use case. Feel free to ask any additional questions or share any comments or concerns, and I'd be glad to address them.

Thanks and have a great day!

Best Regards, Chibby

AWS
iChibby
answered a year ago
