Glue crawler Serde serialization lib


I created a Glue CSV crawler in my DEV account. The CSV files are crawled correctly, and the tables have these properties:

Name	tbl_csv_s_mytable
Database	db_rdsmydb
Classification	csv
Location	s3://xxxxxx
Connection	Deprecated

Input format	org.apache.hadoop.mapred.TextInputFormat
Output format	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
**Serde serialization lib	org.apache.hadoop.hive.serde2.OpenCSVSerde**
**Serde parameters**
**quoteChar	"**
**separatorChar	,**

I did exactly the same thing in the stage account, but there the tables are not crawled correctly: I get col0, col1, ... instead of the column names:

Name	tbl_csv_s_mytable
Database	db_rdsmydb
Classification	csv
Location	  s3://xxxxx
Connection	 Deprecated	No

Input format	org.apache.hadoop.mapred.TextInputFormat
Output format	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
**Serde serialization lib	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe**

Serde parameters: field.delim instead of quoteChar and separatorChar

I used the same classifier configuration and followed exactly the same steps in both accounts, so I'm not sure why it works in DEV but not in stage.

Is the Serde serialization lib the issue?

I'm not sure why it is set to org.apache.hadoop.hive.serde2.OpenCSVSerde in DEV and to org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe in the stage account.
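One way to confirm exactly how the two tables differ is to read each table definition through the Glue API and compare its `SerdeInfo` block. This is a minimal sketch, assuming you have a boto3 Glue client for each account (for example `boto3.Session(profile_name=...).client("glue")`, with profile names of your choosing). The helper names `serde_summary` and `diff_serde` are hypothetical, and the client is passed in so the comparison logic itself needs no AWS access.

```python
from typing import Any, Dict, Tuple

SerdeSummary = Tuple[str, Dict[str, str]]


def serde_summary(glue_client: Any, database: str, table: str) -> SerdeSummary:
    """Return (serialization library, SerDe parameters) for a Glue table.

    glue_client is assumed to be a boto3 Glue client; only get_table is used.
    """
    resp = glue_client.get_table(DatabaseName=database, Name=table)
    serde = resp["Table"]["StorageDescriptor"].get("SerdeInfo", {})
    return serde.get("SerializationLibrary", ""), dict(serde.get("Parameters", {}))


def diff_serde(a: SerdeSummary, b: SerdeSummary) -> Dict[str, Tuple[Any, Any]]:
    """Map every field that differs between two summaries to its (a, b) values."""
    diffs: Dict[str, Tuple[Any, Any]] = {}
    if a[0] != b[0]:
        diffs["SerializationLibrary"] = (a[0], b[0])
    # Compare the union of parameter keys; a missing key shows up as None.
    for key in sorted(set(a[1]) | set(b[1])):
        if a[1].get(key) != b[1].get(key):
            diffs[key] = (a[1].get(key), b[1].get(key))
    return diffs
```

Calling `diff_serde(serde_summary(dev, "db_rdsmydb", "tbl_csv_s_mytable"), serde_summary(stage, "db_rdsmydb", "tbl_csv_s_mytable"))` with the two clients would list the library difference together with the parameters present on only one side (quoteChar/separatorChar versus field.delim).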

Any ideas?

Thank you!

Jess
Asked 2 years ago, viewed 1125 times
1 answer

Based on the configuration you shared, the two crawlers are using different classifiers, which is why they behave differently.

In DEV you are probably using a custom CSV classifier; see this documentation page to understand how it was created and attached to the crawler. In the definition of the custom classifier you specify how the column delimiter is handled and how the column headers are identified.
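For reference, a custom CSV classifier like the one suspected in DEV can be registered with boto3's `create_classifier` call and its `CsvClassifier` argument. This is a minimal sketch: the classifier name is hypothetical, the delimiter and quote values mirror the DEV `separatorChar`/`quoteChar` above, and `ContainsHeader="PRESENT"` tells the crawler that the first row holds the column names. The optional `Serde` field pins the SerDe explicitly; it is an assumption that your Glue/boto3 version supports it, so pass `serde=None` to leave it out.

```python
from typing import Any, Dict, Optional


def csv_classifier_spec(name: str, delimiter: str = ",", quote: str = '"',
                        serde: Optional[str] = "OpenCSVSerDe") -> Dict[str, Any]:
    """Build the CsvClassifier argument for glue.create_classifier()."""
    spec: Dict[str, Any] = {
        "Name": name,
        "Delimiter": delimiter,       # column separator (DEV: separatorChar ,)
        "QuoteSymbol": quote,         # quote character (DEV: quoteChar ")
        "ContainsHeader": "PRESENT",  # first row holds the column names
    }
    if serde is not None:
        # Assumption: the Serde field is supported by your Glue/boto3 version.
        spec["Serde"] = serde
    return spec


def create_csv_classifier(glue_client: Any, name: str) -> None:
    """Register the classifier; glue_client is assumed to be boto3.client("glue")."""
    glue_client.create_classifier(CsvClassifier=csv_classifier_spec(name))
```

Running the same function in both accounts keeps the classifier definitions identical, which removes one source of drift between DEV and stage.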

In stage, by contrast, the crawler was created with the built-in classifier.

That is why the Serde serialization lib differs between the two accounts.
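To verify this, you can compare the `Classifiers` list that `get_crawler` returns in each account; an empty list means the crawler relies only on the built-in classifiers. This is a minimal sketch, assuming a boto3 Glue client. The helper `ensure_classifier` and the crawler and classifier names you pass to it are hypothetical. When the custom classifier is missing, the helper attaches it with `update_crawler`. Glue tries custom classifiers first, in the order listed in the crawler definition.

```python
from typing import Any, List


def ensure_classifier(glue_client: Any, crawler: str, classifier: str) -> List[str]:
    """Attach a custom classifier to a crawler if missing; return the final list.

    glue_client is assumed to be boto3.client("glue").
    """
    crawler_def = glue_client.get_crawler(Name=crawler)["Crawler"]
    current = list(crawler_def.get("Classifiers", []))
    if classifier in current:
        return current  # already attached, nothing to change
    # Custom classifiers are tried in list order, so put the CSV one first.
    updated = [classifier] + current
    glue_client.update_crawler(Name=crawler, Classifiers=updated)
    return updated
```

After attaching the classifier, re-run the stage crawler so it classifies the files again.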

Hope this helps,

AWS
Expert
Answered 2 years ago
