How to fix an AWS Glue crawler that splits comma-separated values inside a single column

How can I fix an AWS Glue crawler that splits comma-separated values that are supposed to stay in one column? The data in the column is not enclosed in double quotes, which is why we cannot create a custom classifier. For example, we have a name column with values such as a,b,c, and the crawler splits them into different columns.
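To see why this happens, here is a minimal sketch using Python's standard csv module. This is not Glue itself, and the id/name/city layout and sample rows are hypothetical, but any parser that treats the comma as the delimiter runs into the same problem: without quotes, a,b,c becomes three separate fields.

```python
import csv
import io

# Hypothetical sample: the "name" column holds a,b,c with no quotes around it.
unquoted = "id,name,city\n1,a,b,c,Paris\n"
quoted = 'id,name,city\n1,"a,b,c",Paris\n'


def parse(text):
    """Parse CSV text with a comma delimiter and the default double-quote quotechar."""
    return list(csv.reader(io.StringIO(text)))


rows_unquoted = parse(unquoted)
rows_quoted = parse(quoted)

print(rows_unquoted[1])  # ['1', 'a', 'b', 'c', 'Paris'] -> 5 fields for a 3-column header
print(rows_quoted[1])    # ['1', 'a,b,c', 'Paris'] -> 3 fields, as intended
```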

    • Please check the details of the CSV classifier in "Adding classifiers to a crawler in AWS Glue", section "Built-in CSV classifier": https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-builtin-rules

    • The built-in CSV classifier parses CSV file contents to determine the schema for an AWS Glue table. It checks for the following delimiters: comma (,), pipe (|), tab (\t), semicolon (;), and Ctrl-A (\u0001). Ctrl-A is the Unicode control character for Start Of Heading.

    a. In your case, you can't use comma as the delimiter, otherwise the value will be split into separate columns.
    b. If your columns are delimited by something other than a comma, you can use a custom CSV classifier with the file's actual delimiter.
    c. But if your files are comma-delimited and you don't want a specific value to be split, you are left with writing a Grok custom classifier. See "Writing custom classifiers": https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
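A minimal sketch of option c, assuming (hypothetically) a layout id,name,city in which only the name column can contain unquoted commas. The Python regex mirrors the Grok pattern: the outer fields match runs of non-commas, and the greedy middle field absorbs any extra commas. The classifier name, the Classification label, and the NOTCOMMA custom pattern are placeholders; GREEDYDATA is one of the built-in Glue Grok patterns. Test the pattern against your real data before registering it.

```python
import re

# Assumed layout (hypothetical): id,name,city where only "name" may contain
# unquoted commas. This regex mirrors the Grok pattern defined below.
ROW_RE = re.compile(r"(?P<id>[^,]*),(?P<name>.*),(?P<city>[^,]*)")


def split_row(line: str) -> dict:
    """Split one line; the greedy middle group keeps the extra commas."""
    m = ROW_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"line does not match expected layout: {line!r}")
    return m.groupdict()


# Request body for the Glue CreateClassifier API (all names are placeholders).
GROK_CLASSIFIER = {
    "Name": "comma-in-name-classifier",  # hypothetical classifier name
    "Classification": "csv-grok",        # hypothetical classification label
    "GrokPattern": "%{NOTCOMMA:id},%{GREEDYDATA:name},%{NOTCOMMA:city}",
    "CustomPatterns": "NOTCOMMA [^,]*",  # custom pattern: any run of non-commas
}


def register_classifier():
    """Create the classifier in AWS Glue (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("glue").create_classifier(GrokClassifier=GROK_CLASSIFIER)


print(split_row("1,a,b,c,Paris"))  # {'id': '1', 'name': 'a,b,c', 'city': 'Paris'}
```

This only works when at most one column can contain commas; with two such columns the split is ambiguous again. Header rows get no special treatment in this sketch.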

CYN
asked 7 months ago, 377 views
1 answer

The crawler is making a guess: if you use a different delimiter and also have commas without quotes, it cannot be sure which interpretation is right.
Assuming your CSV is valid, you can manually correct the table created by the crawler, as long as you don't run the crawler again (or you run it only to update partitions).
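A hedged sketch of this suggestion using boto3; the database, table, and crawler names and the column list are placeholders. It copies the table returned by get_table, keeps only the fields that UpdateTable accepts in TableInput, replaces the column list, and then sets the crawler's SchemaChangePolicy to LOG with partitions inheriting the table schema, so later runs can add partitions without rewriting the corrected schema.

```python
import json

# Keys accepted in the Glue UpdateTable TableInput structure; get_table returns
# extra read-only fields (DatabaseName, CreateTime, ...) that must be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}

# Crawler configuration: new partitions inherit the (manually fixed) table schema.
CRAWLER_CONFIGURATION = json.dumps({
    "Version": 1.0,
    "CrawlerOutput": {"Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}},
})


def corrected_table_input(table: dict, columns: list) -> dict:
    """Build a TableInput from a get_table() result with a replaced column list."""
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    storage = dict(table_input["StorageDescriptor"])  # copy; don't mutate input
    storage["Columns"] = columns
    table_input["StorageDescriptor"] = storage
    return table_input


def apply_fix(database, table_name, crawler_name, columns):
    """Write the corrected schema and stop the crawler from overwriting it."""
    import boto3  # requires AWS credentials

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(DatabaseName=database,
                      TableInput=corrected_table_input(table, columns))
    # LOG means schema changes are only logged, not applied to the table.
    glue.update_crawler(
        Name=crawler_name,
        SchemaChangePolicy={"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"},
        Configuration=CRAWLER_CONFIGURATION,
    )
```

For example, apply_fix("my_db", "my_table", "my_crawler", [{"Name": "id", "Type": "string"}, {"Name": "name", "Type": "string"}, {"Name": "city", "Type": "string"}]), where every name is hypothetical.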

AWS EXPERT
answered 7 months ago

EXPERT
verified a month ago