How to handle NULL with AWS Glue bookmark

I have a 30 GB table, and I am running an ETL with an AWS Glue job that copies the table to an S3 bucket. I am trying to use job bookmarks with a combination of a couple of columns as the bookmark key. Some of those columns contain null values, and the job fails with: An error occurred while calling o97.getDynamicFrame. Incorrect DATETIME value: 'null'. Is there any way to give those columns a default value?

The other alternative is to move the entire table without a bookmark, which I don't think is efficient.

  • Where is that origin table stored?

Asked 1 year ago · Viewed 404 times
1 Answer

Hello,

For this use case, the bookmark keys are applied on the data-consuming side, and as documented at [1], create_dynamic_frame.from_catalog only takes column names for "jobBookmarkKeys". There is no option to give a column a default value when its value is null.

However, there is a workaround.

If your original table is stored in an RDBMS, you can add a computed column that has the same value as the original column wherever it is not null, and a default value wherever the original is null.
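
As a rough illustration of the null-replacement idea, here is a minimal sketch. It uses an in-memory SQLite database and a view only so that it runs anywhere; in a real RDBMS this would more likely be a generated/computed column on the table itself. The table name `orders`, the column `updated_at_bk`, and the sentinel timestamp are all hypothetical and do not come from the original question.

```python
import sqlite3

# Hypothetical sentinel used wherever the original timestamp is NULL.
DEFAULT_TS = "1970-01-01 00:00:00"


def null_safe_rows():
    """Return (id, updated_at_bk) pairs where NULL timestamps are replaced."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical source table with a nullable timestamp column.
        conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)"
        )
        conn.executemany(
            "INSERT INTO orders (id, updated_at) VALUES (?, ?)",
            [
                (1, "2023-05-01 10:00:00"),
                (2, None),  # the problematic NULL
                (3, "2023-05-02 08:30:00"),
            ],
        )
        # COALESCE keeps the original value when present and falls back to
        # the sentinel otherwise. DEFAULT_TS is a constant, so inlining it
        # into the statement is safe here.
        conn.execute(
            "CREATE VIEW orders_bk AS "
            "SELECT id, updated_at, "
            f"COALESCE(updated_at, '{DEFAULT_TS}') AS updated_at_bk "
            "FROM orders"
        )
        return conn.execute(
            "SELECT id, updated_at_bk FROM orders_bk ORDER BY id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in null_safe_rows():
        print(row)
```

The choice of sentinel matters: a value lower than any real timestamp puts the formerly null rows at the start of an ascending sort, which may or may not be what you want for your bookmark.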

Then, in your Glue job, you can use the computed column as part of the bookmark keys.
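
A minimal sketch of what that could look like in the job script is below. The options `jobBookmarkKeys` and `jobBookmarkKeysSortOrder` are the ones described in the bookmark documentation [1]; the helper functions, the column name `updated_at_bk`, and the database and table names are hypothetical. The Glue context is passed in as a parameter, so the snippet itself does not import `awsglue` and has not been run against a real job.

```python
def bookmark_options(keys, sort_order="asc"):
    """Build the additional_options dict for user-defined bookmark keys."""
    if sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    return {
        "jobBookmarkKeys": list(keys),
        "jobBookmarkKeysSortOrder": sort_order,
    }


def read_with_bookmark(glue_context, database, table_name, keys):
    """Read a catalog table using the given columns as bookmark keys.

    glue_context is expected to be an awsglue GlueContext created in the
    job script; job bookmarks must also be enabled on the job itself.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        additional_options=bookmark_options(keys),
        # A transformation_ctx is needed for the bookmark state to be tracked.
        transformation_ctx="read_" + table_name,
    )


if __name__ == "__main__":
    # Hypothetical key columns: the primary key plus the null-safe column.
    print(bookmark_options(["id", "updated_at_bk"]))
```

Inside a job, the call would be something like `read_with_bookmark(glueContext, "my_db", "orders", ["id", "updated_at_bk"])`, with names adjusted to your own catalog.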

Hope it helps.

=========

Reference: [1] - https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html

Thi_N (AWS), answered 1 year ago
Reviewed by Tasio (AWS, Expert) 1 year ago
  • Hello, I wanted to know whether this solution is feasible. A condition for user-defined bookmark keys is that the field be strictly monotonically increasing, and assigning a default value to the null cases means the column would no longer meet this condition.

    In our particular case, we have a database table with millions of records that did not have an updated_at field. The idea is to add it, assigning the current timestamp to the existing records and current_timestamp to new or modified records. However, we are not sure that job bookmarks will accept this column, because all the previous records will have the same timestamp value. I would appreciate your help with this question.
