Catalog Dataframe - AWS Glue


Hello, I am creating a DataFrame by consuming a Glue Catalog table. This table has fields of type bigint that can be null. When such a field is entirely null, the DataFrame drops it, which breaks the rest of the code, since I use this table to merge into the destination. Is there a solution for this problem? Below is a snippet of the code:

IncrementalInputDyF = glueContext.create_dynamic_frame.from_catalog(
    database="litio_sqlserver",
    table_name="crawler_operation",
    transformation_ctx="IncrementalInputDyF",
)
IncrementalInputDF = IncrementalInputDyF.toDF()

Asked 2 months ago · Viewed 127 times
1 Answer
Accepted Answer

The issue is that you are not reading a DataFrame; you are reading a DynamicFrame, which infers the schema dynamically (and therefore omits the all-null column) and is then converted to a DataFrame. Read a DataFrame directly instead, either with the corresponding GlueContext API call or through the Spark API (as long as you are not using LakeFormation permissions).
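A minimal sketch of reading the table directly as a DataFrame, reusing the database and table names from the question. This assumes the standard Glue job setup where `spark` and `glueContext` are already defined and the Spark session uses the Glue Data Catalog as its metastore; it only runs inside a Glue job, and neither variant supports job bookmarks.

```python
# Option 1: Spark API. Querying the catalog table directly keeps the full
# catalog schema, including bigint columns that are entirely null.
IncrementalInputDF = spark.sql(
    "SELECT * FROM litio_sqlserver.crawler_operation"
)

# Option 2: GlueContext's DataFrame reader, which skips the DynamicFrame
# (and its schema inference) entirely.
IncrementalInputDF = glueContext.create_data_frame.from_catalog(
    database="litio_sqlserver",
    table_name="crawler_operation",
)
```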

AWS
EXPERT
Answered 2 months ago
  • Ah ok, I understand. With this solution, would I still be able to use the job bookmark option? For context: after creating this DynamicFrame, the code merges it into an Iceberg table, updating the records, so I use the Job Bookmark to avoid processing when there are no new rows in the source.

  • No, in that case you lose bookmarks. What I would do in your case is add the missing columns myself: compare the DynamicFrame schema (with bookmarks) against an empty DataFrame read from the table (e.g. with a predicate that matches no data), then add any columns that are missing. However, if you are writing to Iceberg, it shouldn't matter if empty columns are not present, especially if the table already exists with those columns.
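The schema-comparison step described above can be sketched as a small helper. The schemas are represented here as plain name-to-type dicts for illustration; the column names and types are hypothetical, not taken from the actual table.

```python
def missing_columns(frame_schema, table_schema):
    """Given the inferred frame schema and the full catalog table schema
    (both as {column_name: type_name} dicts), return the columns that
    schema inference dropped, with their expected types."""
    return {name: dtype for name, dtype in table_schema.items()
            if name not in frame_schema}

# Example: schema inference lost the all-null bigint column "amount".
table_schema = {"id": "bigint", "name": "string", "amount": "bigint"}
frame_schema = {"id": "bigint", "name": "string"}
print(missing_columns(frame_schema, table_schema))  # {'amount': 'bigint'}
```

In the Glue job itself, each returned column could then be added to the DataFrame as a typed null, e.g. with `df.withColumn(name, lit(None).cast(dtype))` from `pyspark.sql.functions`, before the merge into the Iceberg table.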
