Catalog Dataframe - AWS Glue


Hello, I am creating a DataFrame by consuming a Glue Catalog table. The table has fields of type bigint which can be null. When these fields contain only nulls, the resulting DataFrame drops them, which breaks the rest of the code, since I use this table in a merge into the destination. Is there a solution to this problem? Below is a snippet of the code:

IncrementalInputDyF = glueContext.create_dynamic_frame.from_catalog(
    database="litio_sqlserver",
    table_name="crawler_operation",
    transformation_ctx="IncrementalInputDyF",
)
IncrementalInputDF = IncrementalInputDyF.toDF()

Asked 2 months ago · 124 views
1 answer
Accepted Answer

The issue is that you are not reading a DataFrame; you are reading a DynamicFrame, which infers the schema dynamically from the data (and therefore omits a column that is entirely null) and is only then converted to a DataFrame. Instead, read a DataFrame directly, either with the corresponding Glue API call (glueContext.create_data_frame.from_catalog) or with the Spark API (as long as you are not using Lake Formation permissions); both use the catalog schema, so the all-null columns are preserved.
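The answer above can be sketched as follows. This is a non-runnable fragment that assumes a Glue 3.0+ job environment where `glueContext` and `spark` are already defined, as in the question's snippet:

```python
# Read directly as a Spark DataFrame using the catalog schema,
# so all-null bigint columns are kept instead of being dropped.
# Note: this path does not support job bookmarks.
IncrementalInputDF = glueContext.create_data_frame.from_catalog(
    database="litio_sqlserver",
    table_name="crawler_operation",
)

# Alternatively, via the Spark API, when the Glue Catalog is the
# metastore and Lake Formation permissions are not in use:
IncrementalInputDF = spark.table("litio_sqlserver.crawler_operation")
```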

AWS
EXPERT
Answered 2 months ago
  • Ah, OK, I understand. With this solution, would I still be able to use the job bookmark option? For context: after creating this DynamicFrame, the code merges it into an Iceberg table, updating the records, and I use the job bookmark so that nothing is processed when there are no new rows in the source.

  • No, in that case you lose bookmarks. What I would do in your case is add the missing columns myself: compare the DynamicFrame's schema (read with bookmarks) against an empty DataFrame read from the same table (e.g. with a predicate that matches no rows), and then add any columns that are missing. However, if you are writing to Iceberg, it shouldn't matter that all-null columns are absent, especially if the table already exists with those columns.
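The schema-patching idea from the comment above can be shown as a minimal, self-contained sketch in plain Python. All names here are hypothetical: in a real Glue job, `frame_cols` would come from `IncrementalInputDyF.toDF().columns`, `table_schema` from an empty DataFrame read of the catalog table, and the missing columns would be added with `F.lit(None).cast(dtype)` rather than by patching row dicts:

```python
def missing_columns(table_schema, frame_cols):
    """Return (name, dtype) pairs present in the table but absent from the frame."""
    present = set(frame_cols)
    return [(name, dtype) for name, dtype in table_schema if name not in present]

def patch_rows(rows, table_schema, frame_cols):
    """Add each missing column to every row with a null value."""
    missing = missing_columns(table_schema, frame_cols)
    return [{**row, **{name: None for name, _ in missing}} for row in rows]

# Example: the bigint column "op_id" is all-null in this batch, so the
# dynamically inferred frame schema omitted it.
table_schema = [("op_id", "bigint"), ("op_name", "string")]
frame_cols = ["op_name"]
rows = [{"op_name": "insert"}]

print(missing_columns(table_schema, frame_cols))   # [('op_id', 'bigint')]
print(patch_rows(rows, table_schema, frame_cols))  # [{'op_name': 'insert', 'op_id': None}]
```

The same diff-then-fill shape carries over to PySpark: loop over `missing_columns(...)` and call `df.withColumn(name, F.lit(None).cast(dtype))` for each.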
