AWS Glue creating Null for Empty data

I am using Amazon Kinesis Firehose to convert files from JSON to Parquet, with AWS Glue used for table creation.

When a field in the data is blank, the Glue schema produces a NULL, and the conversion in Kinesis Firehose fails with the error: "lastErrorCode": "DataFormatConversion.MalformedData", "lastErrorMessage": "Data does not match the schema. For input string: "null""

Is there a way to fix this?

Asked 1 year ago, 2113 views
1 answer
Yes, there are a couple of ways to handle this issue:

1. Modify the Glue schema: You can modify the Glue schema to handle NULL values explicitly, so that the columns that can contain NULL values are declared to accept them. The conversion to Parquet should then not fail when there are NULL values in the data.

2. Use a Lambda function: You can attach a Lambda data-transformation function to the Kinesis Firehose delivery stream so that records are modified before the format conversion runs. The Lambda function can check for NULL values and replace them with a default value, or drop the entire record if necessary. This way only data that matches the Glue schema reaches the Parquet conversion.

3. Use a data processing framework like Apache Spark: You can use a data processing framework like Apache Spark to handle NULL values in the data. Apache Spark provides a rich set of functions for handling NULL values and can read the JSON data and write it out as Parquet.
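As a rough illustration of the Lambda option above, here is a minimal sketch of a Firehose transformation handler in Python. It assumes each record is a single flat JSON object and that leaving an empty field out of the record is acceptable to the converter; the `EMPTY_VALUES` tuple and the `strip_empty_fields` helper are hypothetical names made up for this example, not part of any AWS API. Only the event and response shape (`records`, `recordId`, `result`, `data`) follows the Firehose data-transformation contract.

```python
import base64
import json

# Assumption: these are the values that count as "empty" in your data.
EMPTY_VALUES = (None, "", "null")


def strip_empty_fields(payload):
    """Return a copy of the record without top-level keys whose value is empty."""
    return {key: value for key, value in payload.items() if value not in EMPTY_VALUES}


def lambda_handler(event, context):
    """Clean each Firehose record before it reaches the format conversion step."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers the record payload base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            cleaned = strip_empty_fields(payload)
            data = base64.b64encode(json.dumps(cleaned).encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError):
            # Not valid base64/JSON, or not a JSON object: hand it back as failed.
            output.append(
                {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}
            )
    return {"records": output}
```

If a default value suits your schema better than a missing field, the helper could substitute one instead of removing the key. Records marked `ProcessingFailed` are treated by Firehose as unsuccessfully processed, so it is worth checking where those end up in your delivery stream configuration.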

Overall, the best approach depends on your specific use case and requirements.

AWS
Answered 1 year ago
  • Thank you. Could you point me to some AWS documentation for option 1 (modifying the Glue schema so that columns accept NULL values), please?
