How to handle NULL with AWS Glue bookmark

I have a 30 GB table, and I am running an ETL with an AWS Glue job that copies the table to an S3 bucket. I am trying to use a combination of several columns as the job bookmark key, but some of those columns contain null values, and the job fails with: An error occurred while calling o97.getDynamicFrame. Incorrect DATETIME value: 'null'. Is there any way to give those columns a default value?

The other alternative is to move the entire table without a bookmark, which I don't think is efficient.

  • Where is that origin table stored?

asked a year ago, 408 views
1 answer

Hello,

For this use case, the bookmark keys are applied on the data-consuming side. As documented in [1], create_dynamic_frame.from_catalog takes only column names for "jobBookmarkKeys"; there is no option to give a column a default value when its value is null.
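As a rough sketch of what "just takes the column names" means, the helper below builds the keyword arguments for a bookmarked read. The database, table, column, and context names are hypothetical placeholders, and the helper itself (bookmark_read_kwargs) is not part of Glue; only the "jobBookmarkKeys" and "jobBookmarkKeysSortOrder" option names come from the Glue documentation in [1].

```python
def bookmark_read_kwargs(database, table_name, keys, ctx):
    """Build keyword arguments for a bookmarked catalog read.

    Hypothetical helper for illustration: the bookmark options only
    carry column names, with no slot for a per-column default value.
    """
    return {
        "database": database,
        "table_name": table_name,
        # Bookmark state is tracked per transformation_ctx.
        "transformation_ctx": ctx,
        "additional_options": {
            "jobBookmarkKeys": list(keys),
            "jobBookmarkKeysSortOrder": "asc",
        },
    }


# Placeholder names; replace with your own catalog database, table, and columns.
kwargs = bookmark_read_kwargs("my_db", "my_table", ["id", "updated_at"], "read_my_table")

# Inside a Glue job, this would be passed on as:
#   frame = glueContext.create_dynamic_frame.from_catalog(**kwargs)
```

Because the options hold nothing but column names, any null handling has to happen before Glue reads the column.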

However, there is a workaround.

If your original table is stored in an RDBMS, you can add a computed column that has the same value as the original column, and a default value where the original is null.

Then, in your Glue job, you can use the computed column as part of the bookmark keys.
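A minimal sketch of the computed-column idea follows. It assumes a MySQL-style source (the error text mentions DATETIME, but the thread does not say which database is used), and the table name orders, the column updated_at_bk, and the 1970-01-01 default are all hypothetical. The DDL string is shown only for illustration; the runnable part uses an in-memory SQLite table to show what the COALESCE expression produces.

```python
import sqlite3

# Hypothetical default for rows whose timestamp is NULL.
DEFAULT_TS = "1970-01-01 00:00:00"

# Assumed MySQL-style DDL for a stored generated column (not executed here).
MYSQL_DDL = (
    "ALTER TABLE orders "
    "ADD COLUMN updated_at_bk DATETIME "
    "GENERATED ALWAYS AS (COALESCE(updated_at, '1970-01-01 00:00:00')) STORED"
)

# Demonstrate the same COALESCE expression on an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders (id, updated_at) VALUES (?, ?)",
    [(1, None), (2, "2023-05-01 10:00:00"), (3, None)],
)
rows = conn.execute(
    "SELECT id, COALESCE(updated_at, ?) AS updated_at_bk FROM orders ORDER BY id",
    (DEFAULT_TS,),
).fetchall()
conn.close()

for row in rows:
    print(row)
```

In the Glue job, the bookmark keys would then name updated_at_bk instead of updated_at. Rows that fall back to the default all share one value, which is the concern raised in the comment below, so pairing the computed column with a unique key such as id may be worth testing.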

Hope it helps.

=========

Reference: [1] - https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html

Thi_N (AWS)
answered a year ago
Tasio (AWS, EXPERT)
reviewed a year ago
  • Hello, I wanted to know whether this solution is feasible. One condition for user-defined bookmark keys is that the field be strictly monotonically increasing, and assigning a default value to the null cases would not meet this condition.

    Our particular case is that we have a database table with millions of records that did not have an updated_at field. The idea is to add it, assigning the current timestamp to the existing records and current_timestamp to new or modified records. However, we are not sure that the job bookmark will accept this column, because the existing records will all have the same timestamp value. I would appreciate your help with this question.
