AWS Glue creating Null for Empty data


I am using Amazon Kinesis Firehose to convert files from JSON to Parquet, leveraging AWS Glue for table creation.

When the data is blank, the Glue schema creates a NULL, and the conversion in Kinesis Firehose fails with this error:

"lastErrorCode": "DataFormatConversion.MalformedData",
"lastErrorMessage": "Data does not match the schema. For input string: \"null\""

Is there a way to fix this?

1 Answer

Yes, there are several ways to handle this issue:

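Modify the Glue schema: Make sure each column's type in the Glue table matches what the records actually contain. The message For input string: "null" looks like a Java number-parsing error, which suggests that a column declared as a numeric type (for example int or double) is receiving the quoted string "null" rather than a JSON null. Changing that column's type to string in the Glue table avoids the parse failure. Alternatively, have the producer send an unquoted null, or omit the field, when a value is blank.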
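To find which fields would trip the conversion, a quick local check can compare sample records against the table's column types. This is a hypothetical helper, not part of Glue or Firehose: SCHEMA stands in for your table's columns, and Python's int/float parsers stand in for the Java parsing that the error message hints at, so it only approximates the real behavior. JSON nulls and missing fields are assumed to be acceptable.

```python
import json

# Hypothetical column -> Glue type map; replace with your table's columns.
SCHEMA = {"id": "int", "price": "double", "name": "string"}

# Numeric Glue types mapped to Python parsers that reject strings like "null".
PARSERS = {"int": int, "bigint": int, "double": float, "float": float}


def find_bad_fields(line, schema):
    """Return (column, value) pairs whose value cannot be parsed as the declared type."""
    record = json.loads(line)
    bad = []
    for column, glue_type in schema.items():
        value = record.get(column)
        parser = PARSERS.get(glue_type)
        # Non-numeric columns are not checked; JSON null or a missing key is assumed fine.
        if parser is None or value is None:
            continue
        try:
            parser(value)
        except (TypeError, ValueError):
            bad.append((column, value))
    return bad


if __name__ == "__main__":
    sample = '{"id": "null", "price": 9.5, "name": "widget"}'
    print(find_bad_fields(sample, SCHEMA))
```

Running it on a record where a blank id was serialized as the string "null" flags that column, which is the same situation the Firehose error describes.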

Use a Lambda function: Enable data transformation on the Firehose delivery stream with a Lambda function. Firehose invokes the function before record format conversion, so it can replace blank values or the string "null" with real JSON nulls (or a default value), or mark a record as Dropped so that it is discarded. This ensures that only records matching the Glue schema reach the Parquet conversion step.
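A minimal sketch of such a transformation function in Python, assuming the placeholders to clean are empty strings and the strings "null"/"NULL" (NULL_PLACEHOLDERS and clean_value are hypothetical names; adjust them for your data). It follows the Firehose data-transformation contract: each input record carries a recordId and base64-encoded data, and each output record returns the same recordId with a result of Ok, Dropped, or ProcessingFailed.

```python
import base64
import json

# Hypothetical set of values to treat as "no value"; adjust for your producer.
NULL_PLACEHOLDERS = {"", "null", "NULL"}


def clean_value(value):
    """Recursively replace placeholder strings with a real JSON null (None)."""
    if isinstance(value, str) and value.strip() in NULL_PLACEHOLDERS:
        return None
    if isinstance(value, dict):
        return {k: clean_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [clean_value(v) for v in value]
    return value


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Not valid base64/JSON: let Firehose route it to the error output.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        cleaned = json.dumps(clean_value(payload)).encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(cleaned).decode("ascii"),
        })
    return {"records": output}
```

Note that converting every empty string to null also affects string columns where an empty value is meaningful; if that matters, restrict clean_value to the specific numeric keys that cause the error.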

Use a data processing framework such as Apache Spark: Instead of converting in Firehose, deliver the raw JSON to Amazon S3 and run a Spark job that cleans the null placeholders and then writes Parquet. Apache Spark provides a rich set of functions for handling NULL values (for example DataFrame.na.fill) and can write the cleaned data as Parquet.

Overall, the best approach depends on your specific use case and requirements.

AWS
Answered a year ago
  • Thank you. Could you point me to some AWS documentation for option 1 (modifying the Glue schema), please?
