AWS Glue Custom Classifiers and Partition Creation

My files are CSV files with 3 tab-separated fields. The built-in CSV classifier creates a schema of 3 strings for the 3 attributes:

a:string b:string c:string

However, my last attribute, c, is a JSON string.

I would like to know whether a custom classifier can create an extra attribute, endpoint, from some pattern matching (grok or regex).

Say the JSON string c looks like this:

{"log":{"timestampReceived":"2023-03-10 01:43:24,563 +0000","component":"ABC","subp":"0","eventStream":"DEF","endpoint":"sales"}}

I would like to extract endpoint:sales into a new attribute of my table and use it as a partition.

The result would look something like this:

a:string b:string c:string endpoint:string (partitioned)

asked a year ago, 211 views
1 Answer
Custom classifiers cannot be used to create a partitioned table if the data itself is not partitioned.

The crawler creates a partitioned table based on the directory structure and the schema similarity at each level. Suppose there are 2 subdirectories in a target path: if the data in these directories is at least 70% similar, the crawler creates a partitioned table; otherwise it creates two tables, one for each subdirectory.
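Since the crawler derives partitions from the directory layout, one possible workflow is to rewrite the files into Hive-style endpoint=<value>/ folders before crawling. The sketch below is only an illustration of that idea, not a confirmed AWS recommendation: it uses the local filesystem as a stand-in for S3, and the names endpoint_of, repartition, and part-0000.csv are hypothetical. It assumes every row has exactly 3 tab-separated fields and that the endpoint value is safe to use as a folder name.

```python
import json
import os
import tempfile


def endpoint_of(json_text, default="unknown"):
    """Return log.endpoint from the JSON string in column c, or a default."""
    try:
        return json.loads(json_text)["log"]["endpoint"]
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, missing key, or unexpected JSON shape.
        return default


def repartition(src_path, dest_root, file_name="part-0000.csv"):
    """Copy each row, unchanged, into dest_root/endpoint=<value>/file_name.

    Returns the sorted list of endpoint values that were written.
    """
    outputs = {}  # endpoint value -> open file handle
    try:
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                fields = line.rstrip("\n").split("\t")
                if len(fields) != 3:
                    continue  # this sketch simply skips malformed rows
                endpoint = endpoint_of(fields[2])
                if endpoint not in outputs:
                    folder = os.path.join(dest_root, f"endpoint={endpoint}")
                    os.makedirs(folder, exist_ok=True)
                    outputs[endpoint] = open(
                        os.path.join(folder, file_name), "w", encoding="utf-8"
                    )
                outputs[endpoint].write(line if line.endswith("\n") else line + "\n")
    finally:
        for handle in outputs.values():
            handle.close()
    return sorted(outputs)


if __name__ == "__main__":
    # Small demo with one made-up row in a temporary directory.
    sample = 'x\ty\t{"log":{"component":"ABC","endpoint":"sales"}}\n'
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.csv")
        with open(src, "w", encoding="utf-8") as f:
            f.write(sample)
        print(repartition(src, os.path.join(tmp, "out")))
```

With the data laid out this way, a crawler pointed at the output root should pick up endpoint as a partition column from the folder names, while a, b, and c stay as regular columns. On real data the same logic would more likely run as an ETL job writing to S3 than as a local script.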

AWS
answered a year ago
  • Thanks Sachin. Do you have any workflow recommendation besides repartitioning the data directories?
