AWS Glue Catalog Data Tables


Hi all, I have a quick question about AWS Glue Data Catalog tables. I currently have a crawler that crawls one folder containing 3 files with the same schema. If I want to reduce this to just 1 file, do I need to run the crawler again for the change to be picked up by the Glue job that uses the catalog table as its input? Or would the Glue job see the change automatically, without the crawler being run?

sg03
asked 5 months ago, 351 views
1 Answer
Accepted Answer

Yes, you would need to run the crawler again for the changes to be picked up and reflected in the Glue Data Catalog tables. When a Glue crawler runs, it analyzes the specified data sources (folders, files, and so on) and creates tables in the Glue Data Catalog based on the schemas and data it detects. These catalog tables are metadata that ETL jobs, Athena queries, and other consumers can then use.
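To check what the crawler has recorded for a table, you can read the table's metadata back from the Data Catalog. The following is only a minimal sketch: it assumes `glue` is a boto3 Glue client (for example `boto3.client("glue")`), and the function name plus the database and table names in the usage comment are hypothetical placeholders, not names from the question.

```python
def describe_catalog_table(glue, database, table):
    """Return the storage location and column schema stored for a catalog table.

    `glue` is assumed to be a boto3 Glue client; `database` and `table`
    are whatever names your crawler wrote into the Data Catalog.
    """
    response = glue.get_table(DatabaseName=database, Name=table)
    descriptor = response["Table"]["StorageDescriptor"]
    return {
        "location": descriptor.get("Location"),
        "columns": [
            (column["Name"], column.get("Type"))
            for column in descriptor.get("Columns", [])
        ],
    }


# Example usage (hypothetical names, requires boto3 and AWS credentials):
#   import boto3
#   info = describe_catalog_table(boto3.client("glue"), "my_database", "my_table")
#   print(info["location"], info["columns"])
```

Comparing this output before and after a crawler run is one way to see whether the run changed the table definition.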

The crawler analyzes data and creates or updates tables only during an actual crawler run. Changes made to the underlying data source are not reflected in the Glue Data Catalog until the crawler runs again. So in your case, re-run the crawler so that the existing Glue job can see the updated file and schema structure.
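Re-running the crawler can be done from the console or from code. As a rough sketch, assuming `glue` is a boto3 Glue client and that the crawler name is known, the helper below starts the crawler and polls until it returns to the `READY` state. The function name, the polling interval, the timeout, and the crawler name in the usage comment are all illustrative assumptions.

```python
import time


def run_crawler_and_wait(glue, crawler_name, poll_seconds=15, timeout_seconds=1800):
    """Start a Glue crawler and wait until it is idle again.

    Returns the status of the last crawl as reported by get_crawler
    (for example "SUCCEEDED"), or None if no status is reported.
    Raises TimeoutError if the crawler is still busy after timeout_seconds.
    """
    glue.start_crawler(Name=crawler_name)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        # A crawler reports READY once it has finished running and stopping.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Crawler {crawler_name!r} did not finish in time")


# Example usage (hypothetical crawler name, requires boto3 and AWS credentials):
#   import boto3
#   status = run_crawler_and_wait(boto3.client("glue"), "my-folder-crawler")
#   print("Last crawl status:", status)
```

Running the Glue job only after this helper returns is one way to make sure the job reads the refreshed catalog table.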

Ben Lee (AWS, Expert)
answered 5 months ago; reviewed by an expert 5 months ago
