AWS Glue creating Null for Empty data

I am using Amazon Kinesis Firehose to convert files from JSON to Parquet, with Glue used to create the table.

When the data is blank, the Glue schema creates a NULL, and the conversion in Kinesis Firehose fails with the error: "lastErrorCode": "DataFormatConversion.MalformedData", "lastErrorMessage": "Data does not match the schema. For input string: "null""

Is there a way to fix this?

1 Answer

Yes, there are a few ways to handle this issue:

1. Modify the Glue schema: You can modify the Glue schema to handle NULL values explicitly by setting the Null type to true for the columns that can have NULL values. The conversion to Parquet should then not fail when those columns contain NULL values.

2. Use a Lambda function: You can attach a Lambda function to the Firehose delivery stream as a data transformation, so that it modifies each record before the format conversion runs against the Glue schema. The function can check for NULL values and replace them with a default value, or drop the entire record if necessary. Only valid data then reaches the conversion to Parquet.

3. Use a data processing framework like Apache Spark: Spark provides a rich set of functions for handling NULL values and can convert the data from JSON to Parquet itself, instead of relying on the Firehose conversion.
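For the Lambda option, here is a minimal sketch of what the transformation function could look like. It follows the standard Firehose transformation contract (base64-encoded `data`, a `recordId`, and a `result` of `Ok` or `ProcessingFailed`), but the rest is an assumption on my part: the helper name `strip_empty_fields` is made up, the definition of "empty" (JSON null, a blank string, or the literal string "null") is a guess based on the error message, and it assumes each record is one flat JSON object. It drops the empty fields rather than substituting defaults, on the assumption that a missing field is read as NULL during conversion; please verify that against your own table.

```python
import base64
import json


def is_empty(value):
    """Treat JSON null, blank strings, and the literal string "null" as empty.

    This definition is an assumption; adjust it to match your data.
    """
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in ("", "null")


def strip_empty_fields(record):
    """Return a copy of a flat JSON object without its empty top-level fields."""
    return {key: value for key, value in record.items() if not is_empty(value)}


def lambda_handler(event, context):
    """Firehose data-transformation handler: clean each record, keep its recordId."""
    output = []
    for rec in event["records"]:
        try:
            payload = json.loads(base64.b64decode(rec["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
            cleaned = json.dumps(strip_empty_fields(payload)).encode("utf-8")
            output.append({
                "recordId": rec["recordId"],
                "result": "Ok",
                "data": base64.b64encode(cleaned).decode("utf-8"),
            })
        except ValueError:
            # Not valid base64/JSON: hand the record back unchanged and
            # let Firehose route it to the error output.
            output.append({
                "recordId": rec["recordId"],
                "result": "ProcessingFailed",
                "data": rec["data"],
            })
    return {"records": output}
```

If you would rather keep the columns populated, replace the dictionary comprehension with a per-column default instead of dropping the key.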

Overall, the best approach depends on your specific use case and requirements.

AWS
answered 1 year ago
  • Thank you. Could you point me to some AWS documentation for option 1 (modifying the Glue schema by setting the Null type to true for columns that can have NULL values), please?
