AWS Glue Loading into Redshift with empty values

I have an AWS Glue PySpark ETL job and need to load a Redshift table from a DataFrame. In a subset of rows, several date and string columns have NULL values. I get this error when running the glueContext.write_dynamic_frame.from_catalog command in the Glue PySpark script:

IllegalArgumentException: Don't know how to save NullType to REDSHIFT

I am not sure how to resolve this, because I cannot "fill" these missing values: they are simply optional dates and values that do not exist. I could not find any AWS documentation or other information about how to resolve this issue.

Asked 2 years ago · 1,950 views

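This error is expected when a column in the frame has the Spark type NullType. That typically happens when a column is null in every record being written, so no concrete type could be inferred for it; the Redshift writer has no Redshift column type to map NullType to. Columns that are null in only some rows normally keep their date or string type and are not the cause. Some ways to avoid this error: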

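  1. Convert the Glue DynamicFrame to a Spark DataFrame and use Spark SQL to remove the rows that contain null values. Note that this discards those rows, and filtering rows does not change a column's type, so it does not help when a column is entirely null.
  2. Use the DropNullFields transform in Glue, which drops fields whose type is NullType (fields that are null in every record): https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-DropNullFields.html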
AWS
Answered 2 years ago