Refresh DataSet in Glue DataBrew


I have an S3-backed dataset in Glue DataBrew that contains JSON files and gzipped CSV files. I removed the JSON files from the S3 bucket. Do I need to refresh the dataset or re-add it for the changes to be picked up? If so, how?

I couldn't find the answer in the documentation but I may have missed it.

Asked 1 year ago · 437 views
1 Answer
Accepted Answer

Hi,

If you removed files from the S3 bucket connected to the Glue DataBrew job, manually re-run the job and it will pick up the changes. You can also have DataBrew process or refresh data automatically by using dynamic datasets for files in S3, which let you define time-based, pattern-based, and custom parameters.

Here's a link to a blog post that goes into more detail on this: https://aws.amazon.com/blogs/big-data/simplify-incoming-data-ingestion-with-dynamic-parameterized-datasets-in-aws-glue-databrew/
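If you would rather trigger the re-run from a script than from the console, something like the following might work. It is only a minimal sketch: it assumes boto3 is installed and credentials are configured, and the helper name `rerun_databrew_job` and the job name `my-databrew-job` are made up for illustration. It uses the DataBrew client's `start_job_run` and `describe_job_run` calls and polls until the run reaches a terminal state.

```python
import time

# Job run states after which DataBrew will not change the run any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def rerun_databrew_job(client, job_name, poll_seconds=30, max_polls=120):
    """Start a new run of an existing DataBrew job and wait for it to finish.

    `client` is expected to be a boto3 DataBrew client, i.e.
    boto3.client("databrew"). Returns a (run_id, final_state) tuple.
    """
    # Kick off a fresh run of the job.
    run_id = client.start_job_run(Name=job_name)["RunId"]

    # Poll the run until it reaches a terminal state or we give up.
    for _ in range(max_polls):
        state = client.describe_job_run(Name=job_name, RunId=run_id)["State"]
        if state in TERMINAL_STATES:
            return run_id, state
        time.sleep(poll_seconds)

    raise TimeoutError(f"Run {run_id} of job {job_name} did not finish in time")


# Example usage (hypothetical job name; requires AWS credentials):
#   import boto3
#   client = boto3.client("databrew")
#   print(rerun_databrew_job(client, "my-databrew-job"))
```

The client is passed in rather than created inside the function so the same helper can be pointed at a specific region or profile, or exercised with a stub in tests.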

Hope this helps!

AWS
Answered 1 year ago
