glue crawler Serde serialization lib


I created a Glue CSV crawler in my DEV account. The CSV files are crawled correctly and the tables have these properties:

Name	tbl_csv_s_mytable
Database	db_rdsmydb
Classification	csv
Location	s3://xxxxxx
Connection	 Deprecated

Input format	org.apache.hadoop.mapred.TextInputFormat
Output format	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
**Serde serialization lib	org.apache.hadoop.hive.serde2.OpenCSVSerde**

**Serde parameters**

**quoteChar "**
**separatorChar ,**

I did exactly the same thing in the stage account, but there the tables are not crawled correctly: I get col0, col1, ... instead of the column names:

Name	tbl_csv_s_mytable
Database	db_rdsmydb
Classification	csv
Location	  s3://xxxxx
Connection	 Deprecated	No

Input format	org.apache.hadoop.mapred.TextInputFormat
Output format	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
**Serde serialization lib	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe**

The Serde parameters show field.delim instead of quoteChar.

I used the same classifier configuration and followed exactly the same steps in both accounts, so I'm not sure why it works in DEV but not in stage.

Is the Serde serialization lib the issue?

I'm not sure why it is set to org.apache.hadoop.hive.serde2.OpenCSVSerde in DEV and to org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe in the stage account.

Any ideas?

thank you!!

Jess
asked 2 years ago, 1084 views
1 Answer

From the configuration you shared, the two crawlers are using different classifiers, which is why you get different behavior.

In Dev you are probably using a custom CSV classifier; see this documentation page to understand how it was created and attached to the crawler. In the definition of the custom classifier you specify how the column delimiter and the quote character are handled.
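For illustration, here is a minimal sketch of defining such a custom CSV classifier and attaching it to a crawler with boto3 instead of the console. The classifier and crawler names are hypothetical, the delimiter and quote symbol simply mirror the Serde parameters shown in the question, and `ContainsHeader="PRESENT"` is an assumption that the first row of the files holds the column names.

```python
def build_csv_classifier(name, delimiter=",", quote_symbol='"'):
    """Build the CsvClassifier payload passed to glue.create_classifier."""
    return {
        "Name": name,
        "Delimiter": delimiter,
        "QuoteSymbol": quote_symbol,
        # Assumption: the first row of each file contains the column names.
        "ContainsHeader": "PRESENT",
    }


def attach_custom_classifier(glue, crawler_name, classifier_name):
    """Create the classifier, then attach it to an existing crawler.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    glue.create_classifier(CsvClassifier=build_csv_classifier(classifier_name))
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier_name])
```

It could be called as `attach_custom_classifier(boto3.client("glue"), "my_stage_crawler", "csv_with_header")`, where both names are placeholders for your own resources.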

In Stage, by contrast, the crawler was created with the built-in (native) classifier.

That is the difference you see in the Serde serialization lib.
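To confirm this from code rather than the console, a small sketch like the following could read the Serde settings of the table in each account. It assumes a boto3 Glue client per account; the database and table names are the ones from the question.

```python
def serde_summary(get_table_response):
    """Extract (serialization library, serde parameters) from a get_table response."""
    serde = get_table_response["Table"]["StorageDescriptor"]["SerdeInfo"]
    return serde.get("SerializationLibrary"), serde.get("Parameters", {})


def describe_table_serde(glue, database, table):
    """Fetch a table from the Data Catalog and summarize its Serde settings.

    `glue` is expected to be a boto3 Glue client for the account being checked.
    """
    return serde_summary(glue.get_table(DatabaseName=database, Name=table))
```

Running `describe_table_serde(glue, "db_rdsmydb", "tbl_csv_s_mytable")` once with a DEV client and once with a stage client should show the two different libraries side by side.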

hope this helps,

AWS
EXPERT
answered 2 years ago
