How to handle NULL with AWS Glue bookmark

I have a 30 GB table, and I am running an ETL with an AWS Glue job that copies the table to an S3 bucket. I am trying to use job bookmarks with a combination of a couple of columns as the bookmark key. Some of those columns contain null values, and the job fails with:

An error occurred while calling o97.getDynamicFrame. Incorrect DATETIME value: 'null'

Is there any way to give those columns a default value?

The alternative is to move the entire table without a bookmark, which I don't think is efficient.

  • Where is that origin table stored?

Asked 1 year ago · 408 views
1 Answer

Hello,

For this use case, the bookmark keys are used on the data-consuming side and, as documented at [1], create_dynamic_frame.from_catalog only takes the column names for "jobBookmarkKeys". There is no option to give a column a default value when its value is null.

However, there is a workaround.

If your original table is stored in an RDBMS, you can add a computed column that has the same value as the original column and falls back to a default value where the original is null.

Then, in your Glue job, you can use the computed column as part of the bookmark keys.
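The workaround above could be sketched roughly as follows. This is only an illustration under assumptions: the DDL uses MySQL-style generated-column syntax (the error text suggests MySQL, but the thread does not say), and the table and column names (orders, updated_at, updated_at_bk), the default value, and the helper function names are all hypothetical. The "jobBookmarkKeys" and "jobBookmarkKeysSortOrder" options are passed through additional_options as described in [1].

```python
from typing import Dict, List, Union


def coalesced_column_ddl(table: str, source_col: str, computed_col: str,
                         default: str = "1970-01-01 00:00:00") -> str:
    """Build DDL for a stored generated column that replaces NULL with a default.

    Assumption: MySQL-style syntax; other databases spell this differently.
    """
    return (
        f"ALTER TABLE {table} "
        f"ADD COLUMN {computed_col} DATETIME "
        f"GENERATED ALWAYS AS (COALESCE({source_col}, '{default}')) STORED"
    )


def bookmark_options(keys: List[str],
                     sort_order: str = "asc") -> Dict[str, Union[List[str], str]]:
    """Build the additional_options dict that names the bookmark key columns."""
    return {"jobBookmarkKeys": list(keys), "jobBookmarkKeysSortOrder": sort_order}


def read_with_bookmark(glue_context, database: str, table_name: str,
                       keys: List[str]):
    """Read a catalog table with user-defined bookmark keys.

    glue_context is expected to be an awsglue GlueContext created in the job.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        # Bookmark state is tracked per transformation_ctx.
        transformation_ctx=f"read_{table_name}",
        additional_options=bookmark_options(keys),
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(coalesced_column_ddl("orders", "updated_at", "updated_at_bk"))
    print(bookmark_options(["id", "updated_at_bk"]))
```

The DDL would be run once against the source database, and the Data Catalog table would need to pick up the new column before the job can reference it as a bookmark key.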

Hope it helps.

=========

Reference: [1] - https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html

Thi_N (AWS), answered 1 year ago
Reviewed by Tasio (AWS, Expert) 1 year ago
  • Hello, I wanted to know whether this solution is feasible. A condition for user-defined bookmark keys is that the field be strictly monotonically increasing, and if a default value is assigned to the null cases, the column would no longer meet this condition.

    In our particular case, we have a database table with millions of records that did not have an updated_at field. The idea is to add it, assign the current timestamp to the existing records, and assign current_timestamp to new or modified records. However, we are not sure that the job bookmark will accept this column, because the existing records will all have the same timestamp value. I would appreciate your help with this question.
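To make the concern in this comment concrete, here is a minimal sketch of what "strictly monotonically increasing" means for a backfilled column. The timestamps are hypothetical, and the check says nothing about how Glue itself evaluates bookmark keys; it only shows that repeated values fail a strict ordering test.

```python
from datetime import datetime
from typing import Sequence


def is_strictly_increasing(values: Sequence) -> bool:
    """Return True only if every value is greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical backfill: every existing row gets the same timestamp.
backfill = datetime(2024, 1, 1, 12, 0, 0)
backfilled_rows = [backfill, backfill, backfill]

# Hypothetical rows written afterwards, each with its own timestamp.
later_rows = [datetime(2024, 1, 1, 12, 0, 1), datetime(2024, 1, 1, 12, 0, 2)]

print(is_strictly_increasing(backfilled_rows))  # False
print(is_strictly_increasing(later_rows))       # True
```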
