Athena - Querying S3 file (CSV with JSON objects)

0

Hi, I am trying to use Athena to query the s3 file which is in csv format and contains the json objects.

S3 CSV file:

Id,  data,  last_update
001, {"key1":"value1"}, 10-01-2024
002, {"key1":"value1" , "key2":"value2"}, 10-01-2024

I am using comma as the delimiter and " as the quotes parameter in the classifier of glue crawler.

However, its splitting the second record's json object because of the delimiter.

Athena query output

ID       Data             LastUpdate
001   {"key1":"value1"}    10-01-2024
002   {"key1":"value1"     "key2":"value2"}

Can you please advise how do we handle this? Appreciate your help.

Thanks

WQ
preguntada hace 4 meses563 visualizaciones
1 Respuesta
0

That CSV looks broken, if you use the separator inside the field you need to escape the field.
For instance, if you make the ' the escape character:
002,'{"key1":"value1" , "key2":"value2"}',10-01-2024

You could work around with a custom grok classifier that knows when there are { at the beginning of a field it should consider the field only ends when it finds }

profile pictureAWS
EXPERTO
respondido hace 4 meses

No has iniciado sesión. Iniciar sesión para publicar una respuesta.

Una buena respuesta responde claramente a la pregunta, proporciona comentarios constructivos y fomenta el crecimiento profesional en la persona que hace la pregunta.

Pautas para responder preguntas