How to fix an AWS Glue crawler that splits comma-separated values inside a single column


How can we fix an issue where the AWS Glue crawler splits comma-separated values that are supposed to stay in one column? The data in the column is not enclosed in double quotes, which is why we cannot create a custom classifier. For example, we have a column whose values look like (a,b,c), and the crawler splits them into different columns.

    • Please check the details of the built-in CSV classifier: Adding classifiers to a crawler in AWS Glue - Built-in CSV classifier - https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-builtin-rules

    • The built-in CSV classifier parses CSV file contents to determine the schema for an AWS Glue table. This classifier checks for the following delimiters: Comma (,), Pipe (|), Tab (\t), Semicolon (;), and Ctrl-A (\u0001). Ctrl-A is the Unicode control character for Start Of Heading.

    a. So in your case you can't use comma as the delimiter, otherwise the values will be split into separate columns.
    b. If your columns are not delimited by commas, you can use a custom classifier with the delimiter the file actually uses.
    c. But if you have comma-delimited files and don't want to split a specific value, you are left with writing a Grok custom classifier. See Writing custom classifiers - https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
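To make point c more concrete, here is a minimal Python sketch of the idea behind such a pattern. It is not the Glue API and not a Grok pattern itself; it only shows, with an ordinary regular expression, how anchoring the first and last columns lets the middle column keep its commas (in Grok, the greedy part would correspond to something like GREEDYDATA). The sample rows and the column names id, tags, and created are hypothetical, and the approach assumes that only one column per row can contain unquoted commas.

```python
import csv
import re

# Hypothetical sample rows: three logical columns (id, tags, created),
# where "tags" holds an unquoted comma-separated list.
ROWS = [
    "1,a,b,c,2024-01-01",
    "2,x,2024-01-02",
]

# A plain comma split (what a comma-delimited reader does) produces a
# different number of fields per row, so the schema cannot be recovered.
naive = [next(csv.reader([row])) for row in ROWS]

# Pattern idea: the first and last columns may not contain commas; the
# middle column greedily captures everything in between, commas included.
LINE = re.compile(r"^(?P<id>[^,]*),(?P<tags>.*),(?P<created>[^,]*)$")


def parse(row: str) -> dict:
    """Split one row into the three logical columns."""
    match = LINE.match(row)
    if match is None:
        raise ValueError(f"row does not fit the expected layout: {row!r}")
    return match.groupdict()


parsed = [parse(row) for row in ROWS]
for record in parsed:
    print(record)
```

If more than one column can contain unquoted commas, no pattern of this kind can split the row unambiguously, and the file would need to be re-exported with quoting or a different delimiter.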

CYN
asked 7 months ago, 377 views
1 Answer

The crawler is making a guess: if the file uses a different delimiter and also contains unquoted commas, it cannot be sure which one is right.
Assuming your CSV is valid, you can manually correct the table the crawler created, as long as you don't run the crawler again (or you run it only to update partitions).
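One possible way to run the crawler "only to update partitions" is to tell it to log schema changes instead of applying them, and to have new partitions inherit their metadata from the manually corrected table. The sketch below only builds the arguments for boto3's Glue `update_crawler` call; the crawler name is a placeholder, and whether these settings fit your setup is an assumption to verify against the crawler configuration documentation.

```python
import json


def crawler_update_kwargs(crawler_name: str) -> dict:
    """Build update_crawler arguments that leave the table schema alone."""
    return {
        "Name": crawler_name,
        # LOG: record detected schema changes without modifying the table.
        "SchemaChangePolicy": {
            "UpdateBehavior": "LOG",
            "DeleteBehavior": "LOG",
        },
        # New partitions take their schema and properties from the table.
        "Configuration": json.dumps(
            {
                "Version": 1.0,
                "CrawlerOutput": {
                    "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
                },
            }
        ),
    }


# "my-crawler" is a placeholder name, not taken from the question.
kwargs = crawler_update_kwargs("my-crawler")
print(kwargs)

# To apply it (requires AWS credentials and the boto3 package):
# import boto3
# boto3.client("glue").update_crawler(**kwargs)
```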

AWS
EXPERT
answered 7 months ago
EXPERT
reviewed a month ago
