Athena - Querying S3 file (CSV with JSON objects)

Hi, I am trying to use Athena to query a file in S3 that is in CSV format and contains JSON objects.

S3 CSV file:

Id,  data,  last_update
001, {"key1":"value1"}, 10-01-2024
002, {"key1":"value1" , "key2":"value2"}, 10-01-2024

I am using a comma as the delimiter and " as the quote symbol in the Glue crawler's CSV classifier.

However, it splits the second record's JSON object, because the comma inside the JSON is treated as a delimiter.

Athena query output:

ID    Data                 LastUpdate
001   {"key1":"value1"}    10-01-2024
002   {"key1":"value1"     "key2":"value2"}
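To see why the second row splits, here is a small sketch that parses the sample rows with Python's standard csv module, using the same settings as the crawler (comma delimiter, " as the quote character). Python's parser is only an analogue of the Glue/Athena CSV handling, not the same code, but it shows the mechanism: the JSON field is not wrapped in quote characters, so the " characters inside it are treated as literal text and every comma ends a field.

```python
import csv
import io

# Sample rows from the question, including the space after each comma.
raw = (
    "Id,  data,  last_update\n"
    '001, {"key1":"value1"}, 10-01-2024\n'
    '002, {"key1":"value1" , "key2":"value2"}, 10-01-2024\n'
)

# Same settings as the crawler classifier: comma delimiter, double-quote quote char.
# A field is only treated as quoted if it STARTS with the quote char; these
# fields start with a space or "{", so their inner quotes are plain text.
reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"')
rows = list(reader)

for row in rows:
    print(len(row), row)
```

Row 001 comes back with three fields, while row 002 comes back with four, split at the comma between "value1" and "key2", which matches the broken Athena output.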

Can you please advise how to handle this? I appreciate your help.

Thanks

WQ
Asked 4 months ago

1 Answer

That CSV looks broken: if a field contains the separator, the field needs to be quoted (or the separator escaped).
For instance, if you make ' the quote character and wrap the field in it:
002,'{"key1":"value1" , "key2":"value2"}',10-01-2024
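If you control how the file is produced, this fix can be sketched with Python's csv writer, assuming ' as the quote character; the records below are hypothetical stand-ins for the real data. With csv.QUOTE_MINIMAL, only fields containing the delimiter, the quote character, or a line-terminator character are wrapped, so the JSON object in row 002 gets quoted while row 001 is left as-is. Reading the text back with the same quote character gives three fields per row and valid JSON in the data column.

```python
import csv
import io
import json

# Hypothetical upstream records, before they are written to S3.
records = [
    ("001", {"key1": "value1"}, "10-01-2024"),
    ("002", {"key1": "value1", "key2": "value2"}, "10-01-2024"),
]

buf = io.StringIO()
# QUOTE_MINIMAL wraps a field in ' only when it contains ",", "'" or a
# line-terminator character; the JSON with two keys contains a comma.
writer = csv.writer(
    buf, delimiter=",", quotechar="'", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
)
writer.writerow(["Id", "data", "last_update"])
for rec_id, data, last_update in records:
    writer.writerow([rec_id, json.dumps(data), last_update])

text = buf.getvalue()
print(text)

# Parse it back with the same quote character to check the round trip.
reader = csv.reader(io.StringIO(text), delimiter=",", quotechar="'")
header, *rows = list(reader)
parsed = [json.loads(row[1]) for row in rows]
print(parsed)
```

The crawler's classifier would then need its quote symbol set to ' to match. One caveat: if a JSON value itself contains a ', the writer doubles it inside the quoted field, and whether the Glue classifier handles doubled quote characters is worth testing against real data.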

Alternatively, you could work around this with a custom grok classifier that treats a field starting with { as ending only when it reaches the closing }.
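Grok patterns are built on regular expressions with named captures, so the idea behind this workaround can be prototyped as a plain Python regex before expressing it as a custom pattern in a Glue classifier. This is a sketch under the assumptions that every row has exactly one JSON object in the middle column and that last_update never contains a comma; it is not Glue grok syntax.

```python
import json
import re

# id: everything up to the first comma.
# data: starts at "{" and runs greedily to the last "}" that is followed by
#       ",<last_update>", so commas (and nested objects) inside the JSON
#       do not end the field.
# last_update: whatever follows the final comma.
LINE_RE = re.compile(
    r"^(?P<id>[^,]*),\s*(?P<data>\{.*\}),\s*(?P<last_update>[^,]*)$"
)

# Data rows from the question (header line omitted).
lines = [
    '001, {"key1":"value1"}, 10-01-2024',
    '002, {"key1":"value1" , "key2":"value2"}, 10-01-2024',
]

parsed = []
for line in lines:
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"line does not match: {line!r}")
    parsed.append({
        "id": m.group("id"),
        # json.loads confirms the captured text is a complete JSON object.
        "data": json.loads(m.group("data")),
        "last_update": m.group("last_update"),
    })

for row in parsed:
    print(row)
```

The greedy .* is the design choice that matters here: a lazy match would stop at the first }, which breaks as soon as the JSON contains a nested object.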

AWS Expert
Answered 4 months ago
