How to fix an AWS Glue crawler that splits comma-separated values inside a single column


How can I fix an issue where the AWS Glue crawler splits comma-separated values that are supposed to stay in one column? The data in the column is not enclosed in double quotes, which is why we cannot create a custom classifier. For example, we have a column "name" with the value (a,b,c), and the crawler splits it into separate columns.

    • Please check the details of the built-in CSV classifier: Adding classifiers to a crawler in AWS Glue - Built-in CSV classifier - https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-builtin-rules

    • The built-in CSV classifier parses CSV file contents to determine the schema for an AWS Glue table. This classifier checks for the following delimiters: Comma (,), Pipe (|), Tab (\t), Semicolon (;), and Ctrl-A (\u0001). Ctrl-A is the Unicode control character for Start Of Heading.

    a. In your case you can't use comma as the delimiter, or the value will be split into separate columns.
    b. If your columns are not delimited by commas, you can use a custom classifier with the delimiter the file actually uses.
    c. If your files are comma delimited and you don't want a specific value to be split, you are left with writing a Grok custom classifier. See Writing custom classifiers - https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
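Option (b) can be sketched with boto3, whose Glue client has a `create_classifier` call that accepts a `CsvClassifier` definition. This is only a minimal sketch: it assumes the file is really pipe-delimited, and the classifier name and header columns are hypothetical placeholders. The helper names `build_csv_classifier` and `register_classifier` are made up for illustration.

```python
def build_csv_classifier(name, delimiter, header):
    """Build keyword arguments for glue.create_classifier (CsvClassifier form)."""
    return {
        "CsvClassifier": {
            "Name": name,                # hypothetical classifier name
            "Delimiter": delimiter,      # the delimiter the file actually uses
            "ContainsHeader": "PRESENT", # assumes the first row is a header
            "Header": list(header),      # hypothetical column names
        }
    }


def register_classifier(params):
    """Send the classifier definition to AWS Glue (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without AWS access

    glue = boto3.client("glue")
    glue.create_classifier(**params)


params = build_csv_classifier("pipe-delimited-csv", "|", ["id", "name", "status"])
# register_classifier(params)  # uncomment to create the classifier in your account
```

The classifier still has to be attached to the crawler before the crawler will use it.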

CYN
asked 6 months ago, 355 views
1 Answer

The crawler is making a guess. If the file uses a different delimiter and also contains unquoted commas, the crawler cannot be sure which character is the real delimiter.
Assuming your CSV is valid, you can manually correct the table the crawler created, as long as you don't run the crawler again (or you run it only to update partitions).
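To illustrate why the crawler can only guess, here is a small sketch using Python's standard `csv` module on made-up sample data (the column names and values are hypothetical). An unquoted a,b,c value yields more fields than the header has, while the same value in double quotes stays in one field.

```python
import csv
import io

# Hypothetical sample: the "name" value a,b,c is NOT quoted.
unquoted = "id,name,status\n1,a,b,c,active\n"
header, record = list(csv.reader(io.StringIO(unquoted)))

# The header declares 3 columns but the data row parses into 5 fields,
# so a parser cannot tell which commas belong to the value.
print(len(header), len(record))

# The same row with the value enclosed in double quotes parses cleanly.
quoted = 'id,name,status\n1,"a,b,c",active\n'
quoted_record = list(csv.reader(io.StringIO(quoted)))[1]
print(quoted_record)
```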

AWS
EXPERT
answered 6 months ago
EXPERT
reviewed 23 days ago
