Can I use AWS Glue crawlers to create master data in Delta Lake tables?
I am setting up a new data lake and have been tasked with creating the master data tables in the Databricks Delta Lake component. I'm trying to do this in a use-case agnostic way (or as agnostic as possible), and I need to automate the process where possible. I have researched AWS Glue crawlers, and they seem to be a good way to automatically infer a schema and catalog the data.
However, I'm not sure how to proceed. I'm assuming that creating the master data means identifying the fields common to all the data sources, building a schema for all the data with a single crawler, and then dividing this schema into facts and dimensions. After that, I could use Spark jobs on Databricks to extract what I need from the raw data and populate the master data, checking for duplicates and applying any other transformations that are needed.
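The duplicate check in that last step is essentially an upsert on a business key. Below is a minimal, in-memory Python sketch of that logic, assuming each source row carries a key column (the hypothetical `customer_id`) and a comparable update timestamp (the hypothetical `updated_at`). On Databricks the same idea would typically be expressed as a Delta Lake `MERGE INTO`; this sketch only imitates that behaviour with plain dicts so the rule is easy to see.

```python
from typing import Dict, Iterable

# Hypothetical record shape: one row as a plain dict of column -> value.
Record = Dict[str, object]


def upsert_master(
    master: Dict[object, Record],
    incoming: Iterable[Record],
    key: str,
    ts_col: str,
) -> Dict[object, Record]:
    """Merge incoming rows into a master table keyed on `key`.

    A row replaces the existing master row only if its `ts_col` value is newer,
    so duplicates collapse to the most recent version of each entity.
    """
    result = dict(master)  # do not mutate the caller's master table
    for row in incoming:
        k = row[key]
        current = result.get(k)
        if current is None or row[ts_col] > current[ts_col]:
            result[k] = row
    return result


if __name__ == "__main__":
    rows = [
        {"customer_id": "c1", "name": "Ann", "updated_at": 1},
        {"customer_id": "c1", "name": "Ann B.", "updated_at": 3},
        {"customer_id": "c2", "name": "Bo", "updated_at": 2},
    ]
    for k, v in upsert_master({}, rows, key="customer_id", ts_col="updated_at").items():
        print(k, v)
```

Keeping the "latest wins" rule in one function is deliberate: the choice of key and tie-breaking column is the use-case-specific part, and everything else can stay generic.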
This plan seems to require a lot of manual work, though, and it isn't use-case agnostic in any way. Does anyone know how it could be automated further?
Any help would be much appreciated.
1) A Glue crawler can crawl the data source and create tables based on the schema it infers from the files in each data set.
2) A Glue crawler does not validate the common columns across different data sets.
How a Glue crawler works: https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html
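Because the crawler does not compare columns across data sets, that check has to be a separate step. Below is a minimal Python sketch, assuming you have already fetched each crawled table's column list (for example with boto3's `glue.get_table(DatabaseName=..., Name=...)`, whose response holds the columns under `Table["StorageDescriptor"]["Columns"]` as `{"Name": ..., "Type": ...}` dicts). The table names `orders` and `customers` are hypothetical, and the catalog lookup itself is left out so the sketch runs without AWS access.

```python
from typing import Dict, List, Set, Tuple

# Column entries in the {"Name": ..., "Type": ...} shape used by the Glue Data Catalog.
Columns = List[Dict[str, str]]


def common_columns(tables: Dict[str, Columns]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (column names present in every table, {name: types} where the types disagree)."""
    if not tables:
        return set(), {}
    # Compare names case-insensitively so "Customer_ID" and "customer_id" match.
    name_sets = [{c["Name"].lower() for c in cols} for cols in tables.values()]
    shared = set.intersection(*name_sets)
    conflicts: Dict[str, Set[str]] = {}
    for name in shared:
        types = {
            c["Type"]
            for cols in tables.values()
            for c in cols
            if c["Name"].lower() == name
        }
        if len(types) > 1:  # same column name, different inferred types
            conflicts[name] = types
    return shared, conflicts


if __name__ == "__main__":
    crawled = {
        "orders": [{"Name": "customer_id", "Type": "string"}, {"Name": "amount", "Type": "double"}],
        "customers": [{"Name": "customer_id", "Type": "bigint"}, {"Name": "name", "Type": "string"}],
    }
    print(common_columns(crawled))
```

Shared columns are candidate join keys for the master data, and the type conflicts are exactly the cases the crawler leaves unresolved and that need a manual or rule-based decision.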