[Pandas] [Glue] ResolveChoice cannot cast to JSON

Hi, I'm trying to cast a DataFrame column that contains JSON strings to a JSON type so that I can write it to a database. I'm using an example from here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-ResolveChoice.html

But the following cast to JSON does not work:

df_events.resolveChoice(specs = [('detail','cast:json')])

and I get the following error:

IllegalArgumentException: 'Invalid type name json'

Any suggestions on how to cast to JSON? It's required in order to write a DataFrame to PostgreSQL.

Michael
asked 2 years ago
559 views
1 Answer

json is not a data type supported by Glue. You have to choose one of these: BOOLEAN | DATE | DECIMAL | DOUBLE | LONG | STRING | BINARY. Since your column holds JSON, define it as a String column and cast it to String.
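
A minimal sketch of that string cast, assuming `df_events` from the question is a Glue DynamicFrame and `detail` is the column holding the JSON text. The helper name `cast_detail_to_string` is hypothetical; only `resolveChoice` with a `cast:string` spec is taken from the Glue API.

```python
def cast_detail_to_string(dynamic_frame, column="detail"):
    """Cast one column of a Glue DynamicFrame to string.

    `dynamic_frame` is assumed to be an awsglue DynamicFrame such as
    `df_events` from the question; `column` defaults to the `detail`
    field used there.
    """
    # 'cast:json' is rejected with "Invalid type name json";
    # 'cast:string' keeps the JSON text as a plain string column.
    return dynamic_frame.resolveChoice(specs=[(column, "cast:string")])


# Usage inside a Glue job (not runnable outside Glue):
# df_events = cast_detail_to_string(df_events)
```

The JSON text itself is left unchanged; only the column type seen by Glue becomes a string.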

You should be able to query this data with Athena. When you query this table you can use the JSON functions to query the JSON columns, for example:

SELECT json_extract_scalar(attributes, '$.tag1') ...

Glue: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-common.html

Athena: https://docs.aws.amazon.com/athena/latest/ug/data-types.html

AWS
answered 2 years ago
  • What should you do when you cannot change a DB schema that has a JSON column? None of the suggested types would work, as the write action would throw an exception. This seems like a massive oversight in Glue.
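
One workaround sometimes used for this, offered here as an assumption rather than something confirmed in this thread: keep the column as a string in Glue and add the PostgreSQL JDBC driver parameter `stringtype=unspecified` to the connection URL. With that setting the driver sends string parameters untyped, so the server can infer the type of the target `json` column. The helper name and the example URL below are hypothetical, and whether your Glue connection passes the parameter through to the driver needs to be verified.

```python
def with_unspecified_stringtype(jdbc_url):
    """Append stringtype=unspecified to a PostgreSQL JDBC URL (hypothetical helper)."""
    param = "stringtype=unspecified"
    if "stringtype=" in jdbc_url:
        # Leave an explicit setting untouched.
        return jdbc_url
    # Use '&' if the URL already has query parameters, otherwise start them with '?'.
    separator = "&" if "?" in jdbc_url else "?"
    return jdbc_url + separator + param


# The URL is a placeholder, not a real endpoint.
url = with_unspecified_stringtype("jdbc:postgresql://example-host:5432/mydb")
print(url)
```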
