Refresh DataSet in Glue DataBrew


I have an S3-backed dataset in Glue DataBrew with JSON and gzipped CSV files in it. I removed the JSON files from the S3 bucket. Do I need to refresh the dataset or re-add it for the changes to be picked up? If so, how would I do that?

I couldn't find the answer in the documentation but I may have missed it.

Asked 1 year ago, viewed 436 times
1 Answer
Accepted Answer

Hi,

If you removed files from the S3 bucket that the Glue DataBrew job reads from, manually re-run the job and it will pick up the changes. You can also have DataBrew process or refresh data automatically by using dynamic datasets for files in S3, where you specify time-based, pattern-based, and custom parameters that determine which files the dataset includes.

Here's a link to a blog post that goes into more detail on this: https://aws.amazon.com/blogs/big-data/simplify-incoming-data-ingestion-with-dynamic-parameterized-datasets-in-aws-glue-databrew/
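
If you would rather trigger the re-run from a script than from the console, here is a minimal sketch using the boto3 `databrew` client. The helper name `rerun_job`, the job name `my-databrew-job`, and the polling interval are assumptions for illustration and not something from the question; the client is passed in so the function does not depend on how your credentials or region are configured.

```python
import time

# States after which a DataBrew job run will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def rerun_job(client, job_name, poll_seconds=30):
    """Start a new run of an existing DataBrew job and wait until it ends.

    `client` is expected to be a boto3 DataBrew client
    (boto3.client("databrew")). Returns (run_id, final_state).
    """
    # Kick off a fresh run of the existing job.
    run_id = client.start_job_run(Name=job_name)["RunId"]

    # Poll the run until it reaches a terminal state.
    while True:
        state = client.describe_job_run(Name=job_name, RunId=run_id)["State"]
        if state in TERMINAL_STATES:
            return run_id, state
        time.sleep(poll_seconds)


# Example usage (hypothetical job name, requires AWS credentials):
#   import boto3
#   client = boto3.client("databrew")
#   print(rerun_job(client, "my-databrew-job"))
```

This only re-runs an existing job. Setting up a dynamic dataset is a separate step, and the blog post above walks through it.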

Hope this helps!

AWS
Answered 1 year ago
