Glue crawler JSON parsing

I am currently trying to parse a nested JSON file that looks like this:

{"A":1,"B":{"B1":"test1","B2":"test1"}}, {"A":2,"B":{"B1":"test2","B2":"test2"}}

I set the classifier's JSON path to: $.B

This results in the following table:

CREATE EXTERNAL TABLE testing( B1 string COMMENT 'from deserializer', B2 string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'paths'='B1,B2') STORED AS INPUTFORMAT ...

But when I query the data, I get 2 records with null values for all fields. Can someone please explain what is going on? Why isn't the data showing up when the table structure seems to be OK?

Asked 2 years ago, 3980 views
1 Answer

Hello,

It looks like the issue is with the jsonPath property, which the AWS Glue crawler adds to the table properties when you attach a custom JSON classifier. When you query this table from Amazon Athena with the JSON SerDe org.openx.data.jsonserde.JsonSerDe, the SerDe does not understand this property, so it might not be able to parse the JSON data, which results in rows of null values.
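To make the mismatch concrete, here is a minimal Python sketch. It is not the SerDe's actual code; it assumes a simplified model in which each table column is looked up by name among the top-level keys of every record, with the jsonPath property ignored. The helper names read_top_level and read_under_path are made up for this illustration.

```python
import json

# Two records shaped like the sample in the question, one JSON object per line.
RAW_LINES = [
    '{"A":1,"B":{"B1":"test1","B2":"test1"}}',
    '{"A":2,"B":{"B1":"test2","B2":"test2"}}',
]


def read_top_level(line, columns):
    """Assumed simplified model: match each column against top-level keys only."""
    record = {key.lower(): value for key, value in json.loads(line).items()}
    return {column: record.get(column.lower()) for column in columns}


def read_under_path(line, key, columns):
    """What the $.B classifier implied: read the columns from one nested object."""
    nested = json.loads(line).get(key, {})
    return {column: nested.get(column) for column in columns}


# Columns the crawler put in the table definition.
crawler_columns = ["B1", "B2"]

top_level_rows = [read_top_level(line, crawler_columns) for line in RAW_LINES]
nested_rows = [read_under_path(line, "B", crawler_columns) for line in RAW_LINES]

print(top_level_rows)  # every value is None: B1 and B2 are not top-level keys
print(nested_rows)     # the values appear once the lookup starts inside "B"
```

Under that assumption, the query still returns one row per record, but B1 and B2 come back null because the only top-level keys in the data are A and B.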

I would suggest checking this blog on querying nested JSON data with Amazon Athena.
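As a rough sketch of that approach, the nested object can be declared as a struct column and its fields selected with dot notation. The Python below only builds the two SQL statements so they can be pasted into the Athena console; the table name testing_nested and the S3 location are placeholders I made up, not values from the question.

```python
# Hypothetical names for illustration only; replace with your own.
TABLE_NAME = "testing_nested"
S3_LOCATION = "s3://example-bucket/path/to/json/"


def build_ddl(table, location):
    """Build a DDL that keeps the top-level keys and models B as a struct."""
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        "  a int,\n"
        "  b struct<b1:string, b2:string>\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


def build_query(table):
    """Build a query that reads the nested fields with dot notation."""
    return f"SELECT a, b.b1 AS b1, b.b2 AS b2 FROM {table}"


ddl = build_ddl(TABLE_NAME, S3_LOCATION)
query = build_query(TABLE_NAME)

print(ddl)
print(query)
```

This assumes the files hold one JSON object per line and that the table is defined by hand (or by a crawler without the custom $.B classifier), so the column names line up with the top-level keys in the data.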

However, a Glue ETL job is able to read the table and display the output successfully.

Ref:

AWS Athena JSON serde reference

AWS
Support Engineer
Answered 2 years ago
