Catalog Dataframe - AWS Glue


Hello, I am creating a DataFrame by consuming from a Glue Catalog table. This table has fields of type bigint, which can be null. When these fields are null, the DataFrame drops them from the schema, which breaks the rest of the code, since I use this table to merge into the destination. Is there a solution to this problem? Below is a snippet of the code:

IncrementalInputDyF = glueContext.create_dynamic_frame.from_catalog(
    database="litio_sqlserver",
    table_name="crawler_operation",
    transformation_ctx="IncrementalInputDyF",
)
IncrementalInputDF = IncrementalInputDyF.toDF()

Asked 2 months ago · 126 views

1 answer

Accepted Answer

The issue is that you are not reading a DataFrame; you are reading a DynamicFrame, which infers the schema dynamically (and therefore omits a column that contains only nulls), and only then converting it to a DataFrame. Instead, read a DataFrame directly, either with the corresponding Glue API call or with the Spark API (as long as you are not using Lake Formation permissions).
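To make the two alternatives concrete, here is a minimal sketch. It assumes the job's SparkSession is configured to use the Glue Data Catalog as its metastore, and reuses the database and table names from the question; both reads return the full catalog schema, including bigint columns that are entirely null:

```python
# Option 1: plain Spark API. Reads with the catalog schema as-is,
# but does not participate in Glue job bookmarks.
IncrementalInputDF = spark.table("litio_sqlserver.crawler_operation")

# Option 2: Glue's DataFrame reader (Glue 3.0+), the "corresponding
# API call" mentioned above.
IncrementalInputDF = glueContext.create_data_frame.from_catalog(
    database="litio_sqlserver",
    table_name="crawler_operation",
)
```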

AWS Expert
Answered 2 months ago
  • Ah, OK, I understand. With this solution, would I still be able to use the job bookmark option? For context: after creating this DynamicFrame, the code merges it into an Iceberg table, updating the records, and I use the Job Bookmark so that nothing is processed when there are no new rows at the source.

  • No, in that case you lose bookmarks. In your case, what I would do is add the missing columns myself: compare the DynamicFrame schema (read with bookmarks) against an empty DataFrame read from the same table (e.g. with a predicate that matches no data), and then add any columns that are missing. However, if you are writing to Iceberg, it shouldn't matter that empty columns are absent, especially if the table already exists with those columns.
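The comparison step above can be sketched as a small helper. The column names in the usage comment are illustrative assumptions, not from the question; in a real job, `table_columns` would come from the catalog or Iceberg table schema:

```python
def columns_to_add(table_columns, frame_columns):
    """Return the columns present in the catalog table but missing from
    the frame (e.g. all-null bigints dropped by schema inference),
    preserving the table's column order."""
    present = set(frame_columns)
    return [c for c in table_columns if c not in present]

# Illustrative usage in the Glue job (names are assumptions):
#
#   from pyspark.sql import functions as F
#   missing = columns_to_add(
#       [f.name for f in table_schema.fields],  # schema of the empty read
#       IncrementalInputDF.columns,
#   )
#   for name in missing:
#       IncrementalInputDF = IncrementalInputDF.withColumn(
#           name, F.lit(None).cast(table_schema[name].dataType))
```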
