Athena - Querying S3 file (CSV with JSON objects)

0

Hi, I am trying to use Athena to query the s3 file which is in csv format and contains the json objects.

S3 CSV file:

Id,  data,  last_update
001, {"key1":"value1"}, 10-01-2024
002, {"key1":"value1" , "key2":"value2"}, 10-01-2024

I am using comma as the delimiter and " as the quotes parameter in the classifier of glue crawler.

However, its splitting the second record's json object because of the delimiter.

Athena query output

ID       Data             LastUpdate
001   {"key1":"value1"}    10-01-2024
002   {"key1":"value1"     "key2":"value2"}

Can you please advise how do we handle this? Appreciate your help.

Thanks

WQ
asked 4 months ago554 views
1 Answer
0

That CSV looks broken, if you use the separator inside the field you need to escape the field.
For instance, if you make the ' the escape character:
002,'{"key1":"value1" , "key2":"value2"}',10-01-2024

You could work around with a custom grok classifier that knows when there are { at the beginning of a field it should consider the field only ends when it finds }

profile pictureAWS
EXPERT
answered 4 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions