Refresh Dataset in Glue DataBrew


I have an S3-backed dataset in Glue DataBrew containing JSON and gzipped CSV files. I removed the JSON files from the S3 bucket. Do I need to refresh the dataset or re-add it for the changes to be picked up, and if so, how?

I couldn't find the answer in the documentation but I may have missed it.

Asked 1 year ago · 437 views
1 answer
Accepted answer

Hi,

If you removed files from the S3 bucket that backs your Glue DataBrew dataset, re-run the job manually and it will pick up the changes. You can also have DataBrew process new data automatically by using dynamic datasets for files in S3, where you specify time-based, pattern-based, or custom parameters that decide which files the dataset includes.
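If you prefer to re-run the job from a script rather than the console, a minimal sketch could look like the one below. It assumes `client` is a boto3 DataBrew client (`boto3.client("databrew")`) and that the job already exists; the job name and polling interval are placeholders. It uses `start_job_run` to launch a new run and `describe_job_run` to poll the run's `State` until it reaches a terminal value.

```python
import time

# States in which a DataBrew job run has finished and will not change again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def rerun_job(client, job_name, poll_seconds=30.0):
    """Start a new run of an existing DataBrew job and wait for it to end.

    `client` is assumed to be a boto3 DataBrew client, e.g.
    boto3.client("databrew"). Returns a (run_id, final_state) tuple.
    """
    # Each run reads the dataset's S3 location again, so files deleted
    # since the previous run are no longer included.
    run_id = client.start_job_run(Name=job_name)["RunId"]
    while True:
        state = client.describe_job_run(Name=job_name, RunId=run_id)["State"]
        if state in TERMINAL_STATES:
            return run_id, state
        time.sleep(poll_seconds)
```

Usage, with a hypothetical job name: `rerun_job(boto3.client("databrew"), "my-databrew-job")`. Passing the client in, instead of creating it inside the function, keeps the helper easy to test without AWS credentials.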

Here's a link to a blog post that goes into more detail: https://aws.amazon.com/blogs/big-data/simplify-incoming-data-ingestion-with-dynamic-parameterized-datasets-in-aws-glue-databrew/
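As an illustration of the pattern-based approach, the sketch below builds arguments for the DataBrew `update_dataset` call so that the dataset only matches gzipped CSV files under a prefix, which would keep it CSV-only even if JSON files reappear. This is a sketch under assumptions, not a confirmed recipe: it assumes DataBrew's dynamic-path syntax where a regular expression is wrapped in angle brackets inside the S3 key, and that gzip compression is detected from the `.gz` extension. The dataset name, bucket, and prefix are hypothetical; check the dynamic datasets documentation before relying on the exact key syntax.

```python
def build_csv_only_dataset_update(name, bucket, prefix):
    """Build keyword arguments for a DataBrew update_dataset call.

    Assumption: in a DataBrew dynamic S3 path, <...> encloses a regular
    expression, so "<prefix><.*>.csv.gz" matches every .csv.gz object
    under the prefix and nothing else (for example, no .json files).
    """
    return {
        "Name": name,
        "Format": "CSV",
        "Input": {
            "S3InputDefinition": {
                "Bucket": bucket,
                "Key": f"{prefix}<.*>.csv.gz",
            }
        },
    }


def restrict_dataset_to_csv(client, name, bucket, prefix):
    """Apply the update with a boto3 DataBrew client (assumed)."""
    kwargs = build_csv_only_dataset_update(name, bucket, prefix)
    client.update_dataset(**kwargs)
    return kwargs
```

Keeping the argument building separate from the API call makes it easy to inspect the exact request before sending it to DataBrew.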

Hope this helps!

AWS · Answered 1 year ago
