How to handle NULL with AWS Glue bookmark


I have a 30 GB table and am running an ETL with an AWS Glue job that copies the table to an S3 bucket. I am trying to use a job bookmark with a combination of several columns as the bookmark keys. Some of those columns contain rows with null values, and the job fails with: An error occurred while calling o97.getDynamicFrame. Incorrect DATETIME value: 'null'. Is there any way to give those columns a default value?

The other alternative is to move the entire table without a bookmark, which I don't think is efficient.

  • Where is that origin table stored?

asked a year ago · 412 views
1 Answer

Hello,

For this use case, the bookmark keys are applied on the data-consuming side and, as documented in [1], create_dynamic_frame.from_catalog only accepts column names for "jobBookmarkKeys". There is no option to give a column a default value when its value is null.

However, there is a workaround.

If your source table is stored in an RDBMS, you can add a computed column that has the same value as the original column and falls back to a default value where the original is null.

Then, in your Glue job, you can use the computed column as one of the bookmark keys.
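
As a rough illustration of the workaround above, here is a minimal Python sketch. Everything named in it is an assumption for the example: the table "orders", the nullable column "updated_at", the derived column "updated_at_bk", and the sentinel timestamp are all hypothetical. An in-memory SQLite view stands in for the computed column only to show the COALESCE logic; in a real RDBMS you would define a computed or generated column, and the syntax depends on the engine. The helper at the end only builds the "jobBookmarkKeys" options; the Glue call itself is left as a comment because it needs the Glue runtime.

```python
import sqlite3

# Hypothetical sentinel used wherever the original column is NULL.
SENTINEL = "1970-01-01 00:00:01"


def demo_null_free_key():
    """Show a derived column that mirrors updated_at but never holds NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "2023-05-01 10:00:00"), (2, None), (3, "2023-05-02 09:30:00")],
    )
    # The view is a stand-in for a computed column in the source database.
    conn.execute(
        "CREATE VIEW orders_bk AS "
        "SELECT id, updated_at, "
        f"COALESCE(updated_at, '{SENTINEL}') AS updated_at_bk "
        "FROM orders"
    )
    rows = conn.execute(
        "SELECT id, updated_at_bk FROM orders_bk ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


def bookmark_options(keys, sort_order="asc"):
    """Build the additional_options dict that carries the bookmark keys."""
    return {"jobBookmarkKeys": list(keys), "jobBookmarkKeysSortOrder": sort_order}


# Inside the Glue job (database and table names are placeholders):
# frame = glue_context.create_dynamic_frame.from_catalog(
#     database="my_db",
#     table_name="orders",
#     transformation_ctx="orders_src",
#     additional_options=bookmark_options(["id", "updated_at_bk"]),
# )
```

The row whose updated_at is NULL comes back with the sentinel value, so the bookmark key column never contains a null. Whether a repeated sentinel value is compatible with the ordering requirements on bookmark keys is a separate question that this sketch does not settle.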

Hope it helps.

=========

Reference: [1] - https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html

Thi_N (AWS)
answered a year ago
Tasio (AWS, Expert)
reviewed a year ago
  • Hello, I wanted to know whether this solution is feasible, since a condition for user-defined bookmark keys is that the field be strictly monotonically increasing, and assigning a default value to the null cases would not meet this condition.

    Our particular case is that we have a database table with millions of records that did not have an updated_at field. The idea is to add it, assigning the current timestamp to the existing records and current_timestamp to new or modified records. However, we are not sure that the job bookmark will accept this column, because all previous records will have the same timestamp value. I would appreciate your help with this question.
