AWS Glue Custom Classifiers and Partition Creation


My files are CSV files with 3 tab-separated fields. The built-in CSV classifier creates a schema of 3 strings for the 3 attributes:

a:string b:string c:string

However, my last attribute, c, is a JSON string.

I would like to know whether a custom classifier can create an extra attribute, endpoint, derived from c by pattern matching (grok or regex).

Say the JSON string in c looks like this:

{"log":{"timestampReceived":"2023-03-10 01:43:24,563 +0000","component":"ABC","subp":"0","eventStream":"DEF","endpoint":"sales"}}

I would like to extract the endpoint value (sales in this example) into a new attribute of my table and use it as a partition.

The table would end up something like this:

a:string b:string c:string endpoint:string (partitioned)
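To make the goal concrete, the extraction can be sketched in plain Python. This only illustrates the desired output; it is not something a Glue classifier runs. The function name `add_endpoint` and the sample values for a and b are made up for the example.

```python
import json


def add_endpoint(row):
    """Return (a, b, c, endpoint) for a 3-field row whose last field is JSON."""
    a, b, c = row
    # Assumes every c has the shape {"log": {..., "endpoint": "<value>"}}.
    endpoint = json.loads(c)["log"]["endpoint"]
    return a, b, c, endpoint


sample_c = (
    '{"log":{"timestampReceived":"2023-03-10 01:43:24,563 +0000",'
    '"component":"ABC","subp":"0","eventStream":"DEF","endpoint":"sales"}}'
)
print(add_endpoint(("a-value", "b-value", sample_c))[3])  # prints: sales
```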

Asked 1 year ago, 223 views
1 Answer

Custom classifiers cannot be used to create a partitioned table if the data itself is not partitioned.

The crawler creates a partitioned table based on the directory structure and schema similarity at each level. Suppose a target path has 2 sub-directories and the data in them is at least 70% similar: the crawler then creates one partitioned table. Otherwise it creates two tables, one for each sub-directory.
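If rewriting the data is acceptable, one way to get such a directory structure is to split the files by the endpoint value before crawling. The following is a minimal local sketch, not a Glue job: it assumes exactly 3 unquoted fields per line with no embedded tabs, and the function name `repartition_by_endpoint`, the output file name `data.tsv`, and the paths are hypothetical. It uses Hive-style `endpoint=<value>` directory names, which a crawler can pick up as the partition column name.

```python
import json
from pathlib import Path


def repartition_by_endpoint(src_file, dest_root):
    """Copy each row of a tab-separated file into an endpoint=<value>/ directory.

    Returns the number of rows written per endpoint value.
    """
    dest_root = Path(dest_root)
    handles = {}  # endpoint value -> open output file
    counts = {}
    try:
        with open(src_file, encoding="utf-8") as src:
            for line in src:
                line = line.rstrip("\n")
                if not line:
                    continue  # skip blank lines
                # Assumption: 3 unquoted fields, no tabs inside a field.
                a, b, c = line.split("\t")
                endpoint = json.loads(c)["log"]["endpoint"]
                if endpoint not in handles:
                    part_dir = dest_root / f"endpoint={endpoint}"
                    part_dir.mkdir(parents=True, exist_ok=True)
                    handles[endpoint] = open(
                        part_dir / "data.tsv", "w", encoding="utf-8"
                    )
                # The row is written unchanged; the partition value
                # lives in the directory name only.
                handles[endpoint].write(line + "\n")
                counts[endpoint] = counts.get(endpoint, 0) + 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts
```

Calling `repartition_by_endpoint("input.tsv", "out")` on a file whose rows all carry `"endpoint":"sales"` would produce a single `out/endpoint=sales/data.tsv`. For data in S3 the same idea applies, but the reads and writes would go through an S3 client or an ETL job instead of the local filesystem.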

AWS
Answered 1 year ago
  • Thanks, Sachin. Do you have any workflow recommendation besides repartitioning the data directories?
