Athena - Querying S3 file (CSV with JSON objects)


Hi, I am trying to use Athena to query an S3 file that is in CSV format and contains JSON objects.

S3 CSV file:

Id,  data,  last_update
001, {"key1":"value1"}, 10-01-2024
002, {"key1":"value1" , "key2":"value2"}, 10-01-2024

I am using a comma as the delimiter and " as the quote character in the Glue crawler's classifier.

However, it is splitting the second record's JSON object, because the object itself contains the delimiter.

Athena query output

ID       Data             LastUpdate
001   {"key1":"value1"}    10-01-2024
002   {"key1":"value1"     "key2":"value2"}

Can you please advise how to handle this? I appreciate your help.

Thanks

WQ
asked 4 months ago · 563 views
1 Answer

That CSV looks malformed: if a field contains the separator, the field has to be quoted.
For instance, if you use ' as the quote character:
002,'{"key1":"value1" , "key2":"value2"}',10-01-2024
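To check locally that the quoted form parses as intended, here is a minimal sketch using Python's standard csv module rather than Glue or Athena themselves. It assumes every JSON field in the file is wrapped in single quotes as in the line above and contains no single quotes of its own; the sample rows are the ones from the question.

```python
import csv
import io
import json

# Rows from the question, with the JSON field wrapped in single quotes.
raw = (
    "Id,data,last_update\n"
    "001,'{\"key1\":\"value1\"}',10-01-2024\n"
    "002,'{\"key1\":\"value1\" , \"key2\":\"value2\"}',10-01-2024\n"
)

# quotechar="'" makes commas inside '...' part of the field, not separators.
reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar="'")
header = next(reader)
rows = [dict(zip(header, fields)) for fields in reader]

for row in rows:
    # The data column now arrives as one intact JSON string.
    row["data"] = json.loads(row["data"])
    print(row)
```

Each row comes back with exactly three fields, so the second record's JSON object is no longer split at its inner comma.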

As a workaround, you could use a custom grok classifier that recognizes a field starting with { and treats it as ending only at the matching }.
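The brace-aware idea can be illustrated outside Glue with an ordinary regular expression. This is a sketch in Python, not grok syntax, and the names LINE and parse_line are hypothetical. It assumes each line has exactly three columns and that only the middle one holds a JSON object, so a greedy match from the first { to the last } is safe.

```python
import json
import re

# id, then a {...} object, then the date; surrounding spaces are dropped.
# The greedy \{.*\} runs to the last } on the line, so commas (and nested
# braces) inside the object stay within the data field.
LINE = re.compile(
    r"^\s*(?P<id>[^,]*?)\s*,"
    r"\s*(?P<data>\{.*\})\s*,"
    r"\s*(?P<last_update>[^,]*?)\s*$"
)

def parse_line(line):
    """Split one unquoted record from the question into its three fields."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    record = match.groupdict()
    record["data"] = json.loads(record["data"])
    return record

sample = [
    '001, {"key1":"value1"}, 10-01-2024',
    '002, {"key1":"value1" , "key2":"value2"}, 10-01-2024',
]
records = [parse_line(line) for line in sample]
for record in records:
    print(record)
```

A grok pattern for a Glue custom classifier would express the same three captures in grok's own syntax; the regex here only demonstrates that anchoring the middle field on braces keeps the second record in one piece.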

AWS
EXPERT
answered 4 months ago
