AWS Glue Catalog Data Tables

Hi all, I have a quick question about AWS Glue Data Catalog tables. I currently have a crawler that crawls one folder containing 3 files with the same schema. If I reduce this to just 1 file, do I need to run the crawler again for the change to reach the Glue job that uses the catalog table as its input? Or will the Glue job pick it up automatically without another crawler run?

sg03
asked 5 months ago · 350 views
1 Answer
Accepted Answer

Yes, you need to run the crawler again for the changes to be picked up and reflected in the Glue Data Catalog tables. When a Glue crawler runs, it analyzes the data sources you specify (folders, files, and so on) and generates tables in the Glue Data Catalog based on the schemas and data it detects. These catalog tables are metadata that ETL jobs, Athena queries, and other consumers can then use.
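
To see what the crawler recorded for a table, the catalog entry can be read back. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper name `summarize_catalog_table` and the database and table names in the usage comment are hypothetical, not taken from the question.

```python
def summarize_catalog_table(glue_client, database_name, table_name):
    """Return the storage location and column list stored in the Data Catalog.

    glue_client is expected to be a boto3 Glue client (boto3.client("glue")).
    """
    response = glue_client.get_table(DatabaseName=database_name, Name=table_name)
    storage = response["Table"].get("StorageDescriptor", {})
    # Each column entry carries a name and a type as detected by the crawler.
    columns = [(col["Name"], col.get("Type")) for col in storage.get("Columns", [])]
    return {"location": storage.get("Location"), "columns": columns}


# Usage (hypothetical names, needs AWS credentials and a region):
#   import boto3
#   glue = boto3.client("glue")
#   print(summarize_catalog_table(glue, "my_database", "my_table"))
```

Comparing this output before and after a crawler run is one way to check whether the catalog entry changed.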

A crawler analyzes data and creates or updates tables only while it is running. Changes made to the underlying data sources are not reflected in the Glue Data Catalog until the crawler runs again. So, in your case, re-run the crawler so that the existing Glue job sees the updated file and schema structure.
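
Re-running the crawler can also be scripted rather than done from the console. This is a rough sketch, assuming boto3 and configured AWS credentials; the helper `rerun_crawler`, its polling defaults, and the crawler name in the usage comment are illustrative assumptions.

```python
import time


def rerun_crawler(glue_client, crawler_name, poll_seconds=15, timeout_seconds=1800):
    """Start a Glue crawler and wait until it returns to the READY state.

    Returns the crawler's LastCrawl details (an empty dict if none are reported).
    """
    glue_client.start_crawler(Name=crawler_name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
        # A crawler moves through RUNNING and STOPPING before going back to READY.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {})
        time.sleep(poll_seconds)
    raise TimeoutError(f"Crawler {crawler_name!r} did not finish in {timeout_seconds}s")


# Usage (hypothetical crawler name, needs AWS credentials and a region):
#   import boto3
#   glue = boto3.client("glue")
#   last_crawl = rerun_crawler(glue, "my-folder-crawler")
#   print(last_crawl.get("Status"))
```

Starting the Glue job only after this call returns means the job runs against the refreshed catalog table.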

AWS
EXPERT
Ben Lee
answered 5 months ago
EXPERT
reviewed 5 months ago
