Refresh DataSet in Glue DataBrew

I have an S3-backed dataset in Glue DataBrew containing JSON and gzipped CSV files. I removed the JSON files from the S3 bucket. Do I need to refresh the dataset or re-add it for the changes to be picked up? If so, how?

I couldn't find the answer in the documentation, but I may have missed it.

asked a year ago, 436 views
1 Answer
Accepted Answer

Hi,

If you removed files from the S3 bucket that your Glue DataBrew job reads from, manually re-run the job and it will pick up the changes. You can also have DataBrew pick up new or changed data automatically by using dynamic datasets for files in S3, which let you define time-based, pattern-based, and custom parameters.
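If you would rather re-run the job from a script than from the console, something like the following should work. It is a minimal sketch, assuming you pass in a boto3 DataBrew client (`boto3.client("databrew")`) and the name of an existing job. The function name `rerun_job` and the polling interval are my own choices for illustration.

```python
import time

# States after which a DataBrew job run will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def rerun_job(databrew, job_name, poll_seconds=30):
    """Start a new run of an existing DataBrew job and wait for it to finish.

    `databrew` is assumed to be a boto3 DataBrew client,
    created with boto3.client("databrew").
    Returns a (run_id, final_state) tuple.
    """
    # Each call to start_job_run creates a fresh run of the job.
    run_id = databrew.start_job_run(Name=job_name)["RunId"]

    # Poll until the run reaches a terminal state.
    while True:
        state = databrew.describe_job_run(Name=job_name, RunId=run_id)["State"]
        if state in TERMINAL_STATES:
            return run_id, state
        time.sleep(poll_seconds)
```

For example, `rerun_job(boto3.client("databrew"), "my-databrew-job")`, where `my-databrew-job` is a placeholder for your own job name.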

This blog post goes into more detail: https://aws.amazon.com/blogs/big-data/simplify-incoming-data-ingestion-with-dynamic-parameterized-datasets-in-aws-glue-databrew/
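As a rough illustration of a pattern-based dynamic dataset, the sketch below builds the arguments for the boto3 `create_dataset` call so that only objects ending in `.csv.gz` are included. The helper name `dynamic_dataset_request`, the parameter name `filename`, and the bucket and prefix values are hypothetical, and the filter expression follows my reading of the DataBrew API reference, so please check it against the current documentation before relying on it.

```python
def dynamic_dataset_request(name, bucket, prefix, suffix=".csv.gz"):
    """Build keyword arguments for a DataBrew create_dataset call.

    The {filename} placeholder in the S3 key is a dataset parameter
    (the name is an arbitrary choice). Its filter keeps only objects
    whose matched value ends with `suffix`.
    """
    return {
        "Name": name,
        "Format": "CSV",
        "Input": {
            "S3InputDefinition": {
                "Bucket": bucket,
                # Placeholder that DataBrew resolves against the bucket contents.
                "Key": prefix + "{filename}",
            }
        },
        "PathOptions": {
            "Parameters": {
                "filename": {
                    "Name": "filename",
                    "Type": "String",
                    "Filter": {
                        # Substitution variables start with ":".
                        "Expression": "ends_with :suffix",
                        "ValuesMap": {":suffix": suffix},
                    },
                }
            }
        },
    }
```

You would then pass the result to the client, for example `databrew.create_dataset(**dynamic_dataset_request("my-dataset", "my-bucket", "input/"))`, where all three values are placeholders.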

Hope this helps!

AWS
answered a year ago
