Glue Crawler cannot classify SNAPPY compressed JSON files


I have a KFH application that writes Snappy-compressed JSON files to an S3 bucket. I also have a Glue crawler that creates the schema from that bucket. However, when I enable Snappy compression, the crawler classifies the table as UNKNOWN; it cannot detect that the files are in JSON format. According to the documentation below, the Glue crawler's built-in JSON classifier supports Snappy compression, but I wasn't able to get it to work. https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-built-in

I also thought it might be related to the file extension and tried the names below, but neither worked:

Original:

|-----s3://my-bucket/my-table/day=01/file1.snappy

(1)

|-----s3://my-bucket/my-table/day=01/file1.snappy.json 

(2)

|-----s3://my-bucket/my-table/day=01/file1.json.snappy 

Thanks.

1 Answer

If the Glue crawler's built-in classifiers are unable to read the files, you could create a custom JSON classifier. After creating it, attach the custom classifier to the crawler. This should enable the crawler to read the files correctly, and the classification should change from UNKNOWN to the name of your custom classifier.

Example JSON record:
 
    {
      "type": "constituency",
      "id": "ocd-division\/country:us\/state:ak",
      "name": "Alaska"
    }
 
Please refer to the following documentation on adding a custom classifier:
https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-built-in
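
As a rough sketch of the two steps described above (create a custom JSON classifier, then attach it to the crawler), something like the following could work with boto3. The crawler name, classifier name, and the JSON path `$[*]` are placeholders chosen for illustration, not values from the question, and the helper function names are hypothetical. Whether a custom classifier actually resolves the Snappy case depends on how the files are compressed, so treat this as a starting point rather than a confirmed fix.

```python
def json_classifier_request(name, json_path):
    """Build the keyword arguments for the Glue create_classifier call."""
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


def attach_json_classifier(glue, crawler_name, classifier_name, json_path):
    """Create a custom JSON classifier and attach it to an existing crawler.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    glue.create_classifier(**json_classifier_request(classifier_name, json_path))
    # Set the crawler's custom classifier list to the newly created classifier.
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier_name])


# Example usage (assumed names; requires boto3 and AWS credentials):
#   import boto3
#   attach_json_classifier(
#       boto3.client("glue"), "my-crawler", "my-json-classifier", "$[*]"
#   )
```

After the classifier is attached, rerun the crawler and check the table's classification. Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.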
AWS
answered 9 months ago
