Catalog Dataframe - AWS Glue


Hello, I am creating a DataFrame from a Glue Catalog table. The table has fields of type bigint that can be null, and when those fields are entirely null the resulting DataFrame omits them, which breaks the rest of the code because I use this table in a merge into the destination. Is there a solution to this problem? Below is a snippet of the code:

IncrementalInputDyF = glueContext.create_dynamic_frame.from_catalog(
    database="litio_sqlserver",
    table_name="crawler_operation",
    transformation_ctx="IncrementalInputDyF",
)
IncrementalInputDF = IncrementalInputDyF.toDF()

asked 2 months ago · 108 views
1 Answer
Accepted Answer

The issue is that you are not reading a DataFrame: you are reading a DynamicFrame, which infers its schema dynamically from the data (and therefore omits a column that is entirely null), and then converting it to a DataFrame. Instead, read the table directly as a DataFrame, either with the corresponding Glue API call or with the plain Spark API (as long as you are not using Lake Formation permissions).
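A minimal sketch of the two read paths the answer describes, assuming the Glue API call meant here is `glueContext.create_data_frame.from_catalog` and that the Spark session can see the Glue Catalog as its metastore (only valid inside a Glue job, and only without Lake Formation permissions):

```python
def read_catalog_df(glue_context, spark, database, table_name,
                    use_spark_api=False):
    """Read a Glue Catalog table directly as a Spark DataFrame.

    Unlike create_dynamic_frame.from_catalog, both reads below take the
    schema from the catalog rather than inferring it from the data, so a
    bigint column that is entirely NULL is still present with its type.
    """
    if use_spark_api:
        # Plain Spark API: the Glue Catalog acts as the metastore
        # (valid as long as Lake Formation permissions are not used).
        return spark.table(f"{database}.{table_name}")
    # Glue API: builds the DataFrame from the catalog schema.
    return glue_context.create_data_frame.from_catalog(
        database=database, table_name=table_name
    )
```

Note that neither path supports a `transformation_ctx`, which is why the follow-up question about job bookmarks below matters.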

AWS EXPERT
answered 2 months ago
  • Ah, OK, I understand. With this solution, would I still be able to use the job bookmark option? For context: after creating this DynamicFrame, the code merges into an Iceberg table, updating the records, and I use the job bookmark so that nothing is processed when there are no new rows at the source.

  • No, in that case you lose bookmarks. In your case, what I would do is add the missing columns myself: compare the DynamicFrame schema (read with bookmarks) against an empty DataFrame read from the table (e.g. with a predicate that matches no rows), then add any columns that are missing. However, if you are writing to Iceberg, it shouldn't matter that the empty columns are absent, especially if the table already exists with those columns.
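The column-backfill workaround in the comment above boils down to a schema diff. A sketch of that comparison step in plain Python (the function name and the tuple representation are illustrative, not from the original job; in practice the two schemas would come from the bookmarked DynamicFrame and from an empty DataFrame read of the same table):

```python
def missing_columns(dynamic_frame_fields, catalog_fields):
    """Return (name, type) pairs present in the full catalog schema but
    absent from the DynamicFrame's inferred schema (e.g. all-null bigints).

    Both arguments are lists of (column_name, type_name) tuples.
    """
    present = {name for name, _ in dynamic_frame_fields}
    # Preserve the catalog's column order for the columns being added back.
    return [(name, typ) for name, typ in catalog_fields if name not in present]
```

In the job itself, each returned pair would then be added to the DataFrame before the merge, e.g. with something like `df.withColumn(name, lit(None).cast(typ))`.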
