Glue ETL job from DynamoDB to S3 with only selected data loaded

I have a working lab setup in which a Glue job extracts all data from a single DynamoDB table to S3 in JSON format. This was done with the super simple setup using the AWS Glue DynamoDB connector, all through the Glue visual editor. I plan to run the job daily to refresh the data. The job is set up with Glue 3.0 and Python 3. Two questions:

  1. I assume I need to purge/delete the S3 objects from the previous ETL run each night - how is this done within Glue, or do I need to handle it outside of Glue? (A sketch of what I have in mind follows this list.)
  2. I would like to update that job to limit the data sent to S3 to only the DynamoDB records that have a specific key/value (status <> 'completed'), so that I am not loading all of the DynamoDB data into my target. I don't care if the job has to get ALL of the DynamoDB table during extract and then filter it out during the transform phase, though a way to selectively get data during the extract phase would be even better. (A second sketch of the transform-phase filter also follows below.)
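
For the first question, from what I can see GlueContext has a purge_s3_path method that might cover this inside the job itself. This is only a sketch of what I have in mind; the bucket and prefix are placeholders for my actual target path, and I'm assuming it runs at the start of the job before the new extract is written:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Purge everything under the target prefix before the fresh extract
# is written; retentionPeriod=0 deletes objects regardless of age.
glue_context.purge_s3_path(
    "s3://my-target-bucket/dynamo-export/",  # placeholder bucket/prefix
    options={"retentionPeriod": 0},
)
```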

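For the second question, here is a minimal sketch of the transform-phase filtering I am picturing, using the Filter transform after reading the whole table through the DynamoDB connector. I'm assuming status is a top-level string attribute, and the table name is a placeholder:

```python
from awsglue.context import GlueContext
from awsglue.transforms import Filter
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the full table via the Glue DynamoDB connector
# ("my-table" is a placeholder name).
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={"dynamodb.input.tableName": "my-table"},
)

# Keep only records whose status is not 'completed'.
filtered = Filter.apply(
    frame=dyf,
    f=lambda row: row["status"] != "completed",
    transformation_ctx="filter_out_completed",
)
```

Does this match how it should be done in the visual editor's generated script, or is there a way to push the filter down into the extract itself?
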
If anyone could advise with a simple example, I would appreciate it. While I have looked for a little bit, I haven't found much quality educational material, so I am happy to take any suggestions there as well (other than the AWS documentation - I have that, but need some initial direction/reference/101 hands-on).

Damian