AWS Glue Catalog Data Tables

Hi all, I have a quick question about AWS Glue Data Catalog tables. I currently have a crawler that crawls one folder containing 3 files with the same schema. If I want to reduce that to just 1 file, do I need to run the crawler again for the change to reach the Glue job that uses the catalog table as its input? Or would the Glue job pick up the change automatically, without my having to run the crawler?

sg03
asked 5 months ago · 351 views
1 Answer
Accepted Answer

Yes, you would need to run the crawler again for the changes to be picked up and reflected in the Glue Data Catalog tables. When a Glue crawler runs, it analyzes the data sources you specified (folders, files, and so on) and generates tables in the Glue Data Catalog based on the schemas and data it detects. These catalog tables are metadata that ETL jobs, Athena queries, and other consumers can then use.

The crawler only analyzes data and creates or updates tables during an actual crawler run. Changes made to the underlying data sources are not reflected in the Glue Data Catalog until the crawler runs again. So in your case, re-run the crawler so that the existing Glue job sees the updated file and schema structure.
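
If you would rather trigger the re-run from a script than from the console, the following is a minimal sketch using boto3. It relies on the Glue client's `start_crawler` and `get_crawler` calls; the helper name `run_crawler_and_wait`, the crawler name `my-crawler`, and the polling interval and timeout are assumptions made for illustration, not values from this thread.

```python
import time


def run_crawler_and_wait(glue, crawler_name, poll_seconds=15, timeout_seconds=1800):
    """Start a Glue crawler and block until it is back in the READY state.

    `glue` is expected to behave like boto3.client("glue").
    Returns the status of the last crawl (for example "SUCCEEDED"), or None
    if the response carries no LastCrawl information.
    """
    glue.start_crawler(Name=crawler_name)
    deadline = time.monotonic() + timeout_seconds

    while True:
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        # A crawler reports RUNNING, then STOPPING, then READY once it is done.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Crawler {crawler_name!r} did not finish in time")
        time.sleep(poll_seconds)


# Example usage (hypothetical crawler name; requires boto3 and AWS credentials):
#   import boto3
#   status = run_crawler_and_wait(boto3.client("glue"), "my-crawler")
#   print(status)
```

Passing the client in as an argument keeps the helper easy to exercise without AWS access. Once the crawl reports success, start the Glue job as usual so it reads the refreshed catalog table.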

AWS
EXPERT
Ben Lee
answered 5 months ago
EXPERT
reviewed 5 months ago