AWS Glue Catalog Data Tables


Hi all, I have a quick question about AWS Glue Data Catalog tables. I currently have a crawler that crawls one folder containing 3 files with the same schema. If I reduce this to just 1 file, do I need to run the crawler again for the change to reach the Glue job that uses the catalog table as its input? Or will the Glue job pick it up automatically without another crawler run?

sg03
asked 5 months ago, 351 views
1 Answer
Accepted Answer

Yes, you need to run the crawler again for the change to be picked up in the Glue Data Catalog table. When a Glue crawler runs, it analyzes the data sources you specified (folders, files, and so on) and creates tables in the Glue Data Catalog based on the schemas and data it detects. These catalog tables are metadata that ETL jobs, Athena queries, and other consumers can then use.
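To see what the catalog currently records for a table, something like the following could be used. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the function name `summarize_catalog_table` and the database and table names in the usage comment are hypothetical placeholders.

```python
def summarize_catalog_table(glue, database_name, table_name):
    """Return the location, column names, and update time stored in the catalog.

    `glue` is expected to be a boto3 Glue client (or any object exposing the
    same `get_table` method).
    """
    table = glue.get_table(DatabaseName=database_name, Name=table_name)["Table"]
    storage = table.get("StorageDescriptor", {})
    return {
        "location": storage.get("Location"),
        "columns": [col["Name"] for col in storage.get("Columns", [])],
        "updated": table.get("UpdateTime"),
    }

# Hypothetical usage (names are placeholders, not from the question):
#   import boto3
#   print(summarize_catalog_table(boto3.client("glue"), "my_database", "my_table"))
```

Comparing this output before and after a crawler run is one way to check whether the catalog entry actually changed.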

A crawler analyzes data and creates or updates tables only while it is running. Changes made to the underlying data sources are not reflected in the Glue Data Catalog until the crawler runs again. So in your case, re-run the crawler so that the existing Glue job can see the updated file and schema structure.
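If you want to trigger the re-run from a script rather than the console, a rough sketch with boto3 might look like this. It assumes boto3 is installed and credentials are configured; the helper name `rerun_crawler`, the polling interval, the timeout, and the crawler name in the usage comment are illustrative assumptions, not anything from your setup.

```python
import time

def rerun_crawler(glue, crawler_name, poll_seconds=15, timeout_seconds=1800):
    """Start a crawler and wait until it returns to the READY state.

    `glue` is expected to be a boto3 Glue client. Returns the status of the
    last crawl (for example "SUCCEEDED"), or None if it is not reported.
    """
    glue.start_crawler(Name=crawler_name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        if crawler["State"] == "READY":
            # The crawler has finished; report how the last crawl ended.
            return crawler.get("LastCrawl", {}).get("Status")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Crawler {crawler_name!r} did not finish in time")

# Hypothetical usage (crawler name is a placeholder):
#   import boto3
#   print(rerun_crawler(boto3.client("glue"), "my-crawler"))
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real account.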

AWS
EXPERT
Ben Lee
answered 5 months ago
EXPERT
reviewed 5 months ago
