I know this is an extremely old topic, but for those of you finding this result in a search engine, the proper way to solve this is to attach a custom classifier to your crawler. In the classifier you can either specify the column headings explicitly or let it detect them automatically. More details here: https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html
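As a rough sketch of the two options (explicit headings versus auto detection), the snippet below builds the settings for a CSV classifier as a plain object. The field names mirror the CsvClassifier block of the CloudFormation AWS::Glue::Classifier resource as I understand it; `buildCsvClassifier` and the classifier name are hypothetical, and nothing here calls AWS, so check the values against the page linked above before using them.

```typescript
// Mirrors the CsvClassifier properties of AWS::Glue::Classifier (assumed shape).
type ContainsHeader = 'PRESENT' | 'ABSENT' | 'UNKNOWN';

interface CsvClassifierProps {
  Name: string;
  Delimiter: string;
  QuoteSymbol: string;
  ContainsHeader: ContainsHeader;
  Header?: string[];
}

// Hypothetical helper: name the columns explicitly when they are known,
// otherwise leave header detection to the classifier.
function buildCsvClassifier(name: string, columns?: string[]): CsvClassifierProps {
  const base = { Name: name, Delimiter: ',', QuoteSymbol: '"' };
  if (columns && columns.length > 0) {
    // Explicit headings: the file has a header row and these are the column names.
    return { ...base, ContainsHeader: 'PRESENT', Header: columns };
  }
  // Auto detection: let the classifier decide whether a header row exists.
  return { ...base, ContainsHeader: 'UNKNOWN' };
}

const explicitHeadings = buildCsvClassifier('my-csv-classifier', ['name', 'age']);
const autoDetect = buildCsvClassifier('my-csv-classifier');
```

The resulting object would then be supplied to whatever tool creates the classifier (console, CloudFormation, CDK), and the crawler would reference the classifier by name.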
Make sure that the CSV file contains mixed data types (string and numeric), then rerun the crawler.
Example:
name,age
"john",10
Hi Shivan
Thank you for your answer.
This CSV file has only string data, not mixed data types.
Do you know why the crawler can't identify the header correctly in this case?
I do not know. However, you can manually add the skip.header.line.count table property and edit the column names yourself, though that defeats the purpose of the crawler.
It seems classifiers don't help when the file has multiple preamble lines (e.g. 6 lines) before the headers and data begin (for CSV format, at least). This is a pity, as we have to do some manual data cleansing outside of Glue.
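For what it's worth, the cleansing step outside of Glue can be quite small. Here is a minimal sketch, assuming the number of preamble lines is fixed and known in advance; `stripPreamble` is a hypothetical helper, not part of any AWS library.

```typescript
// Hypothetical helper: drop the first `preambleLines` lines of a CSV document
// so that the header row becomes the first line.
function stripPreamble(csvText: string, preambleLines: number): string {
  if (!Number.isInteger(preambleLines) || preambleLines < 0) {
    throw new RangeError('preambleLines must be a non-negative integer');
  }
  // Split on LF or CRLF; the output is normalized to LF line endings.
  const lines = csvText.split(/\r?\n/);
  return lines.slice(preambleLines).join('\n');
}

// Two preamble lines, then the header and one data row.
const raw = ['Report generated by export tool', 'Source: sales', 'name,age', '"john",10'].join('\n');
const cleaned = stripPreamble(raw, 2);
// cleaned is 'name,age\n"john",10'
```

This counts physical lines, so it would miscount if a preamble line contained a quoted field with an embedded newline. You would run it on each file before writing the result to the location the crawler scans.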
If you are using AWS CDK as the IaC tool, you can use the following code to skip the header:
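import { aws_glue as cfnglue } from 'aws-cdk-lib'; // assumed import (CDK v2); the alias matches the snippet below

// `table` is your existing Glue table construct; get its underlying CfnTable.
const resource = table.node.defaultChild as cfnglue.CfnTable;
// Skip the first line of each file (the header row) when the table is read.
resource.addPropertyOverride('TableInput.StorageDescriptor.SerdeInfo', {
  Parameters: {
    'skip.header.line.count': '1',
  },
});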
Thanks for taking the time in 2020 to answer this question, it made my day here in 2023 a lot easier.