Athena - Querying S3 file (CSV with JSON objects)

0

Hi, I am trying to use Athena to query the s3 file which is in csv format and contains the json objects.

S3 CSV file:

Id,  data,  last_update
001, {"key1":"value1"}, 10-01-2024
002, {"key1":"value1" , "key2":"value2"}, 10-01-2024

I am using comma as the delimiter and " as the quotes parameter in the classifier of glue crawler.

However, its splitting the second record's json object because of the delimiter.

Athena query output

ID       Data             LastUpdate
001   {"key1":"value1"}    10-01-2024
002   {"key1":"value1"     "key2":"value2"}

Can you please advise how do we handle this? Appreciate your help.

Thanks

WQ
gefragt vor 4 Monaten563 Aufrufe
1 Antwort
0

That CSV looks broken, if you use the separator inside the field you need to escape the field.
For instance, if you make the ' the escape character:
002,'{"key1":"value1" , "key2":"value2"}',10-01-2024

You could work around with a custom grok classifier that knows when there are { at the beginning of a field it should consider the field only ends when it finds }

profile pictureAWS
EXPERTE
beantwortet vor 4 Monaten

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen