AWS Glue creating Null for Empty data


I am using Amazon Kinesis Firehose to convert files from JSON to Parquet, leveraging Glue for table creation.

When the data is blank, the Glue schema creates a NULL, and the conversion in Kinesis Firehose fails with this error:

"lastErrorCode": "DataFormatConversion.MalformedData"
"lastErrorMessage": "Data does not match the schema. For input string: "null""

Is there a way to fix this?

1 Answer

Yes, there are a few ways to handle this issue:

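Modify the Glue schema: Glue table columns are already nullable, so there is no separate flag to set, and a genuine JSON null is accepted as NULL. The message "For input string: "null"" is a number-parsing error, which suggests that the blank fields arrive as the text "null" (a quoted string) in columns declared with a numeric type such as int, bigint, or double. Changing those columns to type string in the Glue table lets the conversion to Parquet succeed; you can then cast the values when you query them.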
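To find which columns would need a different type, it can help to check sample records locally. The sketch below is a hypothetical helper, not an AWS API, and only approximates how the JSON deserializer parses numbers. It assumes you have copied the Glue column names and types into a plain dict, and it flags values that a numeric column cannot parse, such as the string "null" or an empty string, while treating a real JSON null as acceptable.

```python
import json

# Glue/Hive numeric type names checked by this sketch (an assumption; extend as needed)
INTEGER_TYPES = {"tinyint", "smallint", "int", "bigint"}
FLOAT_TYPES = {"float", "double"}


def find_unparseable_fields(record_json, column_types):
    """Return names of fields whose value would not parse as the declared numeric type.

    record_json is one JSON object as text; column_types maps
    column name -> Glue type string, e.g. {"price": "double"}.
    """
    record = json.loads(record_json)
    bad = []
    for name, glue_type in column_types.items():
        value = record.get(name)
        if value is None:
            continue  # missing field or real JSON null: stored as NULL
        glue_type = glue_type.lower()
        try:
            if glue_type in INTEGER_TYPES:
                int(value)  # raises ValueError for "null" or ""
            elif glue_type in FLOAT_TYPES:
                float(value)
        except (TypeError, ValueError):
            bad.append(name)
    return bad


if __name__ == "__main__":
    columns = {"id": "bigint", "price": "double", "name": "string"}
    print(find_unparseable_fields('{"id": 1, "price": "null", "name": ""}', columns))  # ['price']
```

String columns are never flagged, which is why switching an affected column to string is one way around the error.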

Use a Lambda function: Enable data transformation on the Firehose delivery stream with a Lambda function. Firehose invokes the function on each batch of records before the format conversion step (Glue only supplies the schema; the data itself is delivered to S3), so the function can replace blank or "null" values with a real JSON null or a default value, or drop the record entirely. This keeps values that do not match the Glue schema from reaching the Parquet conversion.
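A minimal sketch of such a transformation function in Python, assuming the records are flat JSON objects and that "blank" arrives as an empty string or the text "null" (BLANK_MARKERS and clean_value are illustrative names; adjust them to your data). It follows the Firehose data-transformation contract: each input record has a recordId and base64-encoded data, and each output record returns the same recordId with a result of Ok, Dropped, or ProcessingFailed.

```python
import base64
import json

# Values treated as "blank" (an assumption about what the producer sends)
BLANK_MARKERS = {"", "null", "NULL"}


def clean_value(value):
    """Turn blank markers into a real JSON null; leave everything else unchanged."""
    if isinstance(value, str) and value.strip() in BLANK_MARKERS:
        return None
    return value


def lambda_handler(event, context):
    """Firehose data-transformation handler: rewrites each record before format conversion."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Only top-level fields are cleaned; nested objects are left as-is
            cleaned = {key: clean_value(value) for key, value in payload.items()}
            data = base64.b64encode(json.dumps(cleaned).encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError):
            # Not a decodable JSON object: report it as a processing failure
            output.append({"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]})
    return {"records": output}
```

Because a real JSON null is valid for any Glue column, replacing the markers with None is usually enough. To remove a record instead, return "result": "Dropped" for it.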

Use a data processing framework such as Apache Spark: Instead of converting in Firehose, you can deliver the raw JSON to S3 and convert it with Spark, for example in an AWS Glue ETL job. Spark provides functions for handling NULL and placeholder values, so you can clean the data first and then write it out as Parquet.

Overall, the best approach depends on your specific use case and requirements.

AWS
answered a year ago
  • Thank you. Could you point me to some AWS documentation for option 1 (modifying the Glue schema to handle NULL values), please?
