Glue ETL job from DynamoDB to S3 that loads only selected data

I have a working lab setup in which a Glue job extracts all data from a single DynamoDB table to S3 in JSON format. It was built with the very simple setup of the AWS Glue DynamoDB connector, entirely through the Glue visual editor. I plan to run the job daily to refresh the data. The job is set up with Glue 3.0 and Python 3. I have two questions:

1. I assume I need to purge (delete) the S3 objects from the previous ETL run each night. How is this done within Glue, or do I need to handle it outside of Glue?
2. I would like to update the job so that it sends to S3 only the DynamoDB records that match a specific key/value condition (status <> 'completed'), so that I am not loading all of the DynamoDB data into my target. I don't mind if the job has to read ALL of the table during the extract phase and then filter it out during the transform phase, but if there is a way to read only the matching data during the extract phase, even better.
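A minimal sketch of how both points could be handled in a script-based version of this job follows. It is not the script Glue Studio generates (that one also initializes a `Job` object and uses its own node names), and the table name `my_table`, the bucket path, and the helper names `keep_record`, `dynamodb_source_options` and `run_job` are hypothetical placeholders. The filter runs in the transform phase after a full table scan because, as far as I know, the Glue DynamoDB connector has no option for a server-side filter. `GlueContext.purge_s3_path` with `retentionPeriod` set to 0 hours deletes everything under the output prefix before the new files are written, so the job role would need delete permission on that prefix. Treating items with no `status` attribute as excluded mirrors SQL `<>` semantics and is an assumption.

```python
import importlib.util

# Hypothetical placeholders: substitute your own table name and S3 prefix.
TABLE_NAME = "my_table"
OUTPUT_PATH = "s3://my-bucket/dynamodb-export/"


def keep_record(record):
    """Keep items whose status is anything other than 'completed'.

    Items without a status attribute are dropped, like SQL `status <> 'completed'`.
    Only record["status"] is accessed, so this works on plain dicts and on the
    records Glue's Filter transform passes in.
    """
    try:
        return record["status"] != "completed"
    except KeyError:
        return False


def dynamodb_source_options(table_name, read_percent="0.5"):
    """Connection options for the Glue DynamoDB source connector."""
    return {
        "dynamodb.input.tableName": table_name,
        # Share of the table's read capacity the scan is allowed to use.
        "dynamodb.throughput.read.percent": read_percent,
    }


def run_job(table_name=TABLE_NAME, output_path=OUTPUT_PATH):
    # These modules exist only inside the AWS Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.transforms import Filter
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Extract: the connector scans the whole table.
    items = glue_context.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options=dynamodb_source_options(table_name),
    )

    # Transform (question 2): drop items whose status is 'completed'.
    open_items = Filter.apply(frame=items, f=keep_record)

    # Question 1: delete the previous run's files under the output prefix.
    # retentionPeriod is measured in hours; 0 keeps nothing.
    glue_context.purge_s3_path(output_path, options={"retentionPeriod": 0})

    # Load: write the remaining items to S3 as JSON.
    glue_context.write_dynamic_frame.from_options(
        frame=open_items,
        connection_type="s3",
        connection_options={"path": output_path},
        format="json",
    )


# Run the job only where the Glue libraries are installed, so the helpers
# above can also be imported and unit-tested outside Glue.
if __name__ == "__main__" and importlib.util.find_spec("awsglue") is not None:
    run_job()
```

Purging before the write leaves the prefix empty if the write then fails. An alternative that avoids deleting anything inside the job is to write each run under a dated prefix and let downstream readers pick the latest one. The visual editor also offers a Filter transform node that can express the same `status` condition without switching to a script.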

If anyone could advise with a simple example, I would appreciate it. I have looked around for a while but haven't found much good educational material, so I'm happy to take suggestions there as well (other than the AWS documentation, which I have; I need some initial direction, references, or 101-level hands-on material).

Topics: Analytics, Database

Tags: AWS Glue, Amazon DynamoDB

Language: English

Damian

Asked 1 year ago, 149 views

No answers
