Glue crawler Serde serialization lib


I created a Glue CSV crawler in my DEV account. The CSV files are crawled correctly and the table has these properties:

Name	tbl_csv_s_mytable
Database	db_rdsmydb
Classification	csv
Location	s3://xxxxxx
Connection	 Deprecated
Input format	org.apache.hadoop.mapred.TextInputFormat
Output format	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
**Serde serialization lib	org.apache.hadoop.hive.serde2.OpenCSVSerde**

Serde parameters
**quoteChar	"**
**separatorChar	,**

I did exactly the same thing in the stage account, but there the table is not crawled correctly: I get col0, col1, ... instead of the column names:

Name	tbl_csv_s_mytable
Database	db_rdsmydb
Classification	csv
Location	  s3://xxxxx
Connection	 Deprecated	No

Input format	org.apache.hadoop.mapred.TextInputFormat
Output format	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
**Serde serialization lib	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe**

Serde parameters: field.delim instead of quoteChar

I used the same classifier configuration, so I am not sure why it works in DEV but not in stage. I followed exactly the same steps in both accounts.

Is the Serde serialization lib the issue?

I'm not sure why it is set to org.apache.hadoop.hive.serde2.OpenCSVSerde in DEV

and to org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe in the stage account.

Any ideas?

Thank you!

Jess
asked 2 years ago, 1125 views
1 Answer

From the configuration you shared, the two crawlers are using different classifiers, which is why you get different behavior.

In Dev you are probably using a custom CSV classifier; see the AWS Glue documentation on custom classifiers to understand how it is created and attached to the crawler. In the definition of the custom classifier you specified how the columns are separated, that is, the column delimiter.

In Stage, by contrast, the crawler was created with the native (built-in) classifier.
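If the stage crawler is indeed missing the custom classifier, one way to add it is with boto3. The following is only a minimal sketch: the classifier name `csv_with_header` and the crawler name `my-stage-crawler` are hypothetical, and the delimiter, quote symbol and header setting are assumptions taken from the DEV table properties above. Whether the resulting table ends up with OpenCSVSerde should be verified after re-running the crawler.

```python
def build_csv_classifier_request(name, delimiter=",", quote_symbol='"',
                                 contains_header="PRESENT"):
    """Build the keyword arguments for Glue's create_classifier call.

    The defaults mirror the DEV table (separatorChar "," and quoteChar '"');
    they are assumptions, not values read from the DEV classifier itself.
    """
    return {
        "CsvClassifier": {
            "Name": name,
            "Delimiter": delimiter,
            "QuoteSymbol": quote_symbol,
            # PRESENT tells Glue that the first row holds the column names.
            "ContainsHeader": contains_header,
        }
    }


def attach_custom_classifier(crawler_name, classifier_name):
    """Create the classifier and attach it to an existing crawler."""
    import boto3  # imported lazily so the builder above works without AWS access

    glue = boto3.client("glue")
    glue.create_classifier(**build_csv_classifier_request(classifier_name))
    # Custom classifiers listed here are tried before the built-in ones.
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier_name])


if __name__ == "__main__":
    # Hypothetical names; replace with the ones used in your stage account.
    print(build_csv_classifier_request("csv_with_header"))
    # attach_custom_classifier("my-stage-crawler", "csv_with_header")
```

After attaching the classifier, the crawler has to be run again for the table definition to be updated.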

That is the difference you see in the Serde serialization lib.
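To confirm the difference without clicking through both consoles, the Serde information of the two tables can be compared programmatically. This is a rough sketch: the profile names `dev` and `stage` are hypothetical, and the database and table names are the ones from the question.

```python
def serde_summary(table):
    """Extract the Serde library and parameters from a Glue table definition."""
    serde = table.get("StorageDescriptor", {}).get("SerdeInfo", {})
    return {
        "lib": serde.get("SerializationLibrary"),
        "params": serde.get("Parameters", {}),
    }


def serde_differences(dev_table, stage_table):
    """Return the Serde fields that differ as {field: (dev_value, stage_value)}."""
    dev, stage = serde_summary(dev_table), serde_summary(stage_table)
    return {key: (dev[key], stage[key]) for key in dev if dev[key] != stage[key]}


def fetch_table(database, name, profile_name):
    """Read a table definition from the Glue Data Catalog of one account."""
    import boto3  # imported lazily so the comparison works on plain dicts too

    session = boto3.Session(profile_name=profile_name)
    response = session.client("glue").get_table(DatabaseName=database, Name=name)
    return response["Table"]


if __name__ == "__main__":
    # Hypothetical profile names; requires credentials for both accounts.
    dev = fetch_table("db_rdsmydb", "tbl_csv_s_mytable", "dev")
    stage = fetch_table("db_rdsmydb", "tbl_csv_s_mytable", "stage")
    print(serde_differences(dev, stage))
```

An empty result would mean both tables use the same Serde library and parameters, in which case the cause lies elsewhere.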

Hope this helps.

AWS
EXPERT
answered 2 years ago
