Glue Crawler cannot classify SNAPPY compressed JSON files


I have a Kinesis Firehose (KFH) application that writes Snappy-compressed JSON files to an S3 bucket. I also have a Glue crawler that creates a schema from that bucket. However, when I enable Snappy compression, the crawler classifies the table as UNKNOWN; it cannot detect that the files are actually JSON. According to the doc below, the Glue crawler supports Snappy compression for JSON files, but I wasn't able to get it to work. https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-built-in

I also thought it might be related to the file extension and tried the names below, but neither worked:

Original:

|-----s3://my-bucket/my-table/day=01/file1.snappy

(1)

|-----s3://my-bucket/my-table/day=01/file1.snappy.json 

(2)

|-----s3://my-bucket/my-table/day=01/file1.json.snappy 

Thanks.

1 Answer

The Glue crawler's built-in classifiers are unable to read the files, so you could create a custom JSON classifier. After creating it, attach the custom classifier to the crawler. This should enable the crawler to read the files correctly, and the classification should change from UNKNOWN to the name of your custom classifier.

Example Below:
 
    {
      "type": "constituency",
      "id": "ocd-division\/country:us\/state:ak",
      "name": "Alaska"
    }
 
Please refer to the following documentation on adding a custom classifier:
https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-built-in
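
The two steps above (create a custom JSON classifier, then attach it to the crawler) could be sketched with boto3 roughly as follows. This is only a minimal sketch, not a verified fix: the crawler name, classifier name, and the JsonPath `$[*]` are placeholder assumptions, and the right JsonPath depends on how your records are laid out in the files.

```python
from typing import Any, Dict, List


def build_classifier_request(name: str, json_path: str) -> Dict[str, Any]:
    """Keyword arguments for the Glue create_classifier call (JSON classifier)."""
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


def build_crawler_update(crawler_name: str, classifier_names: List[str]) -> Dict[str, Any]:
    """Keyword arguments for the Glue update_crawler call.

    Note: Classifiers replaces the crawler's existing custom classifier list.
    """
    return {"Name": crawler_name, "Classifiers": list(classifier_names)}


def attach_custom_classifier(glue_client: Any, crawler_name: str,
                             classifier_name: str, json_path: str) -> None:
    """Create the custom classifier, then point the crawler at it."""
    glue_client.create_classifier(**build_classifier_request(classifier_name, json_path))
    glue_client.update_crawler(**build_crawler_update(crawler_name, [classifier_name]))


if __name__ == "__main__":
    import boto3  # requires AWS credentials and Glue permissions

    glue = boto3.client("glue")
    # "my-crawler", "my-json-classifier" and "$[*]" are hypothetical values.
    attach_custom_classifier(glue, "my-crawler", "my-json-classifier", "$[*]")
```

After the update, re-run the crawler and check whether the table's classification shows the custom classifier's name instead of UNKNOWN.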
AWS
Answered 8 months ago
