Catalog Dataframe - AWS Glue


Hello, I am creating a DataFrame by consuming from a Glue Catalog table. This table has fields of type bigint, which can be null. When these fields are null, the DataFrame drops them from the schema, which breaks the rest of the code, since I use this table to merge into the destination. Is there a solution to this problem? Below is a snippet of the code:

IncrementalInputDyF = glueContext.create_dynamic_frame.from_catalog(
    database="litio_sqlserver",
    table_name="crawler_operation",
    transformation_ctx="IncrementalInputDyF",
)
IncrementalInputDF = IncrementalInputDyF.toDF()

Asked 2 months ago · 126 views

1 answer

Accepted Answer

The issue is that you are not reading a DataFrame; you are reading a DynamicFrame, which infers the schema dynamically (and therefore omits a column that contains only nulls), and only then converting it to a DataFrame. Instead, read a DataFrame directly, either with the corresponding Glue API call or with the Spark API (as long as you are not using Lake Formation permissions).
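To make the two alternatives concrete, here is a minimal sketch. It assumes the job's SparkSession is configured to use the Glue Data Catalog as its metastore, and reuses the database and table names from the question; both reads return the full catalog schema, including bigint columns that are entirely null:

```python
# Option 1: plain Spark API. Reads with the catalog schema as-is,
# but does not participate in Glue job bookmarks.
IncrementalInputDF = spark.table("litio_sqlserver.crawler_operation")

# Option 2: Glue's DataFrame reader (Glue 3.0+), the "corresponding
# API call" mentioned above.
IncrementalInputDF = glueContext.create_data_frame.from_catalog(
    database="litio_sqlserver",
    table_name="crawler_operation",
)
```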

AWS Expert
Answered 2 months ago
  • Ah, OK, I understand. With this solution, would I still be able to use the job bookmark option? For context: after creating this DynamicFrame, the code merges it into an Iceberg table, updating the records, and I use the Job Bookmark so that nothing is processed when there are no new rows at the source.

  • No, in that case you lose bookmarks. In your case, what I would do is add the missing columns myself: compare the DynamicFrame schema (read with bookmarks) against an empty DataFrame read from the same table (e.g. with a predicate that matches no data), and then add any columns that are missing. However, if you are writing to Iceberg, it shouldn't matter that empty columns are absent, especially if the table already exists with those columns.
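The comparison step above can be sketched as a small helper. The column names in the usage comment are illustrative assumptions, not from the question; in a real job, `table_columns` would come from the catalog or Iceberg table schema:

```python
def columns_to_add(table_columns, frame_columns):
    """Return the columns present in the catalog table but missing from
    the frame (e.g. all-null bigints dropped by schema inference),
    preserving the table's column order."""
    present = set(frame_columns)
    return [c for c in table_columns if c not in present]

# Illustrative usage in the Glue job (names are assumptions):
#
#   from pyspark.sql import functions as F
#   missing = columns_to_add(
#       [f.name for f in table_schema.fields],  # schema of the empty read
#       IncrementalInputDF.columns,
#   )
#   for name in missing:
#       IncrementalInputDF = IncrementalInputDF.withColumn(
#           name, F.lit(None).cast(table_schema[name].dataType))
```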
