Glue crawler: Serde serialization lib differs between accounts


I created a Glue CSV crawler in my DEV account. The CSV files are crawled correctly and the tables have these properties:

Name	tbl_csv_s_mytable
Database	db_rdsmydb
Classification	csv
Location	s3://xxxxxx
Connection	Deprecated

Input format	org.apache.hadoop.mapred.TextInputFormat
Output format	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
**Serde serialization lib	org.apache.hadoop.hive.serde2.OpenCSVSerde**

**Serde parameters**
**quoteChar "**
**separatorChar ,**

I did exactly the same thing in the stage account, but the tables are not crawled correctly: I get col0, col1, ... instead of the column names:

Name	tbl_csv_s_mytable
Database	db_rdsmydb
Classification	csv
Location	  s3://xxxxx
Connection	 Deprecated	No

Input format	org.apache.hadoop.mapred.TextInputFormat
Output format	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
**Serde serialization lib	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe**

The Serde parameters show field.delim instead of quoteChar.

I used the same classifier configuration, so I'm not sure why it works in DEV but not in stage. I followed exactly the same steps in both accounts.

Is the Serde serialization lib the issue?

I'm not sure why it is set to org.apache.hadoop.hive.serde2.OpenCSVSerde in DEV

and to org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe in the stage account.

Any ideas?

thank you!!

Jess
asked 2 years ago, 1125 views
1 Answer

From the configuration you shared, the two crawlers are using different classifiers, which is why you get different behavior.

In Dev you are probably using a custom CSV classifier; see the documentation page on custom classifiers to understand how it was created and attached to the crawler. In the definition of the custom classifier you specified how the columns are delimited (the separatorChar and quoteChar values that appear in your Serde parameters).
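If it helps, here is a minimal sketch of how a custom CSV classifier like that could be created and attached with boto3 instead of the console. The classifier name `csv_quoted_classifier`, the crawler name in the comment, and the choice of `ContainsHeader="PRESENT"` are assumptions for illustration, not values taken from your accounts; the delimiter and quote symbol mirror the Serde parameters you posted for DEV.

```python
def build_csv_classifier(name, delimiter=",", quote_symbol='"',
                         contains_header="PRESENT"):
    """Build the CsvClassifier payload expected by glue.create_classifier.

    contains_header="PRESENT" is an assumption: it tells the classifier
    that the first row holds the column names.
    """
    allowed = {"UNKNOWN", "PRESENT", "ABSENT"}
    if contains_header not in allowed:
        raise ValueError(f"contains_header must be one of {sorted(allowed)}")
    return {
        "Name": name,
        "Delimiter": delimiter,
        "QuoteSymbol": quote_symbol,
        "ContainsHeader": contains_header,
    }


def attach_classifier(crawler_name, classifier, region_name=None):
    """Create the classifier and attach it to an existing crawler.

    Needs AWS credentials; boto3 is imported here so the payload builder
    above can be used without it.
    """
    import boto3

    glue = boto3.client("glue", region_name=region_name)
    glue.create_classifier(CsvClassifier=classifier)
    # Classifiers replaces the crawler's list of custom classifiers.
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier["Name"]])


if __name__ == "__main__":
    # Hypothetical names; replace with your own before calling
    # attach_classifier("my_stage_crawler", payload).
    payload = build_csv_classifier("csv_quoted_classifier")
    print(payload)
```

After attaching the classifier you would need to rerun the crawler, and possibly delete the existing table first, before the new schema shows up.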

In Stage, by contrast, the crawler was created with the built-in CSV classifier.

That is the difference you see in the Serde serialization lib.
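To confirm this from the Data Catalog rather than the console, something like the following sketch could compare the SerDe settings of the two tables. It assumes you have one AWS CLI profile per account; the profile names in the comment are hypothetical, while the database and table names are the ones from your question.

```python
def serde_summary(table):
    """Pull the SerDe library and parameters out of a Glue 'Table' dict."""
    serde = table.get("StorageDescriptor", {}).get("SerdeInfo", {})
    return {
        "lib": serde.get("SerializationLibrary"),
        "params": dict(serde.get("Parameters", {})),
    }


def diff_serde(dev_table, stage_table):
    """Return {setting: (dev_value, stage_value)} for every setting that differs."""
    dev, stage = serde_summary(dev_table), serde_summary(stage_table)
    diffs = {}
    if dev["lib"] != stage["lib"]:
        diffs["SerializationLibrary"] = (dev["lib"], stage["lib"])
    for key in sorted(set(dev["params"]) | set(stage["params"])):
        dev_value, stage_value = dev["params"].get(key), stage["params"].get(key)
        if dev_value != stage_value:
            diffs[key] = (dev_value, stage_value)
    return diffs


def fetch_table(database, name, profile_name=None):
    """Read one table definition from the Glue Data Catalog (needs credentials)."""
    import boto3

    session = boto3.Session(profile_name=profile_name)
    return session.client("glue").get_table(DatabaseName=database, Name=name)["Table"]


# Example usage, with hypothetical profile names "dev" and "stage":
# dev = fetch_table("db_rdsmydb", "tbl_csv_s_mytable", profile_name="dev")
# stage = fetch_table("db_rdsmydb", "tbl_csv_s_mytable", profile_name="stage")
# for setting, (dev_value, stage_value) in diff_serde(dev, stage).items():
#     print(setting, dev_value, stage_value)
```

Calling `get_crawler` on each crawler and comparing the `Classifiers` list in the response should also show whether a custom classifier is attached in one account but not the other.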

Hope this helps,

AWS
EXPERT
answered 2 years ago
