AWS Glue Catalog Data Tables


Hi all, I have a quick question about AWS Glue Data Catalog tables. I currently have a crawler that crawls one folder containing 3 files with the same schema. If I reduce the folder to just 1 file, do I need to run the crawler again for the change to be reflected in the Glue job that uses the catalog table as its input? Or will the Glue job pick up the change automatically, without re-running the crawler?

sg03
asked 4 months ago, 342 views
1 Answer
Accepted Answer

Yes, you would need to run the crawler again for the changes to be picked up and reflected in the Glue Data Catalog tables. When a Glue crawler runs, it analyzes the specified data sources (folders, files, and so on) and generates tables in the Glue Data Catalog based on the schemas and data it detects. These catalog tables then serve as metadata that ETL jobs, Athena queries, and other consumers can use.

The crawler only analyzes data and creates or updates tables while it is actually running. Changes made to the underlying data sources are not reflected in the Glue Data Catalog until the crawler runs again. So in your case, re-run the crawler so that the existing Glue job sees the updated file and schema structure.
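
As a rough illustration of the re-run step, the sketch below uses boto3 to start a crawler and wait until it returns to the READY state before a dependent job is started. The crawler name `my-crawler`, the job name `my-etl-job`, and the helper `run_crawler_and_wait` are hypothetical placeholders, and the polling interval and timeout are arbitrary choices, not recommended values.

```python
import time


def run_crawler_and_wait(glue, crawler_name, poll_seconds=15, timeout_seconds=1800):
    """Start a Glue crawler and block until it is back in the READY state.

    `glue` is expected to be a boto3 Glue client (boto3.client("glue")).
    Returns the status of the last crawl, e.g. "SUCCEEDED" or "FAILED".
    """
    glue.start_crawler(Name=crawler_name)

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        # State is RUNNING or STOPPING while the crawl is in progress.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status")
        time.sleep(poll_seconds)

    raise TimeoutError(f"Crawler {crawler_name!r} did not finish in time")


# Example usage (assumes AWS credentials and the hypothetical names below):
#
#   import boto3
#   glue = boto3.client("glue")
#   status = run_crawler_and_wait(glue, "my-crawler")
#   if status == "SUCCEEDED":
#       glue.start_job_run(JobName="my-etl-job")
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real account.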

AWS
EXPERT
Ben Lee
answered 4 months ago
EXPERT
reviewed 4 months ago
