How to fix an AWS Glue crawler that splits comma-separated values inside a single column

How can I fix an AWS crawler that splits comma-separated values that are supposed to stay in one column? The data in that column is not enclosed in double quotes, which is why we cannot create a custom classifier. For example, we have a column whose values look like a,b,c, and the crawler splits them into different columns.

    • For details on the CSV classifier, see: Adding classifiers to a crawler in AWS Glue - Built-in CSV classifier - https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-builtin-rules

    • The built-in CSV classifier parses CSV file contents to determine the schema for an AWS Glue table. This classifier checks for the following delimiters: comma (,), pipe (|), tab (\t), semicolon (;), and Ctrl-A (\u0001). Ctrl-A is the Unicode control character for Start Of Heading.

    a. So in your case you can't use comma as the delimiter, otherwise the value will be split into separate columns.
    b. If your columns are not delimited by commas, you can use a custom classifier with the delimiter your file actually uses.
    c. But if you have comma-delimited files and don't want a specific value to be split, you are left with writing a Grok custom classifier. Writing custom classifiers - https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
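Since Grok patterns are built from regular expressions, the idea behind option c can be sketched with plain Python `re`. This is only an illustration, not a Glue classifier: the column layout (`id`, `name`, `tags`, `created`) is hypothetical, and the approach assumes that exactly one column contains embedded commas and that its position in the row is known.

```python
import re

# Hypothetical layout: id,name,tags,created
# Only "tags" may contain unquoted commas, so it is matched greedily
# while every other column is restricted to comma-free text.
ROW_PATTERN = re.compile(
    r"^(?P<id>[^,]+),(?P<name>[^,]+),(?P<tags>.*),(?P<created>[^,]+)$"
)

def parse_row(line):
    """Split one row into named columns, keeping 'tags' as a single value."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"row does not match the expected layout: {line!r}")
    return match.groupdict()

row = parse_row("1,alice,a,b,c,2024-01-01")
print(row["tags"])  # a,b,c
```

If two or more columns can contain unquoted commas, a pattern like this can no longer decide where one column ends and the next begins.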

CYN
asked 7 months ago, 377 views
1 Answer

The crawler is making a guess. If the file uses a different delimiter and also contains unquoted commas, the crawler cannot be sure which character is the real delimiter.
Assuming your CSV is valid, you can manually correct the table that the crawler created, as long as you don't run the crawler again (or you run it only to update partitions).
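To see why the crawler can only guess, here is a small sketch with made-up, pipe-delimited rows that also contain unquoted commas. It is not Glue's actual detection logic, just a simple consistency check: both candidate delimiters yield the same field count on every row, so neither can be ruled out from the data alone.

```python
def field_counts(lines, delimiter):
    """Return the set of distinct field counts seen when splitting on delimiter."""
    return {len(line.split(delimiter)) for line in lines}

# Hypothetical sample: pipe-delimited rows whose middle column holds a,b,c style values.
sample_rows = [
    "1|a,b,c|x",
    "2|d,e,f|y",
]

pipe_counts = field_counts(sample_rows, "|")
comma_counts = field_counts(sample_rows, ",")

# Both delimiters look equally plausible: each gives one consistent column count.
print(pipe_counts, comma_counts)
```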

AWS
EXPERT
answered 7 months ago
EXPERT
reviewed a month ago
