AWS Glue: processing double quotes in source CSV data


I am using an AWS Glue crawler to process the following CSV dataset, but the values in the Name column contain double quotes and commas. This breaks the table that the Glue crawler produces. Example of the CSV dataset:

PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S

I tried using a classifier, but it did not resolve the problem. The Athena query still fails:

HIVE_BAD_DATA: Error Parsing a column in the table: Cannot convert value of type String to a DOUBLE value
This query ran against the "awsml-titanic" database, unless qualified by the query. Please post the error message on our forum  or contact customer support  with Query Id: 279cceef-2bcf-4097-8e1a-d10b9cb7a960

Any suggestion will be welcomed.

asked 10 months ago, 588 views
1 Answer

Could it be solved by making the "name" column a string when creating the table in Athena?
https://repost.aws/knowledge-center/athena-hive-bad-data-error-csv
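To see why the error mentions a DOUBLE column, here is a minimal Python sketch using the first row from the question. It assumes the crawler-created table reads the file with a SerDe that splits on every comma and ignores quotes (LazySimpleSerDe behaves this way); the variable names are only for illustration.

```python
import csv
import io

header = ("PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,"
          "Ticket,Fare,Cabin,Embarked").split(",")
row = '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S'

# Quote-unaware split: the comma inside the quoted name creates an extra
# field, so every later value shifts one column to the right.
naive = row.split(",")

# Quote-aware parse: the quoted name stays in a single field.
quoted = next(csv.reader(io.StringIO(row)))

fare = header.index("Fare")
print("fields:", len(header), len(naive), len(quoted))
print("Fare (naive): ", naive[fare])
print("Fare (quoted):", quoted[fare])
```

With the naive split, the Fare column receives the ticket text instead of a number, which is consistent with the "Cannot convert value of type String to a DOUBLE value" message.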

Another option is to change the AWS Glue table's SerDe to OpenCSVSerDe, which handles quoted fields.
The following document describes how to set it up:
https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html#schema-csv
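If you prefer to script the change rather than edit the table in the console, something like the following boto3 sketch might work. It is not taken from the linked document: the helper names and the table name "titanic" are hypothetical, and the list of copied keys is only a subset of what Glue's TableInput accepts, so check it against your table before relying on it.

```python
import copy

OPEN_CSV_SERDE = "org.apache.hadoop.hive.serde2.OpenCSVSerde"

# Subset of keys accepted by TableInput. get_table also returns read-only
# keys (for example CreateTime) that update_table does not accept.
TABLE_INPUT_KEYS = ("Name", "Description", "Owner", "Retention",
                    "StorageDescriptor", "PartitionKeys", "TableType",
                    "Parameters")


def with_open_csv_serde(table: dict) -> dict:
    """Return a TableInput copy of `table` that uses OpenCSVSerDe."""
    table_input = {k: copy.deepcopy(table[k])
                   for k in TABLE_INPUT_KEYS if k in table}
    serde = (table_input.setdefault("StorageDescriptor", {})
                        .setdefault("SerdeInfo", {}))
    serde["SerializationLibrary"] = OPEN_CSV_SERDE
    # Replaces any existing SerDe parameters with the OpenCSVSerDe ones.
    serde["Parameters"] = {"separatorChar": ",",
                           "quoteChar": '"',
                           "escapeChar": "\\"}
    return table_input


def apply_to_glue(database: str, table_name: str) -> None:
    """Fetch the table from the Glue Data Catalog and update its SerDe."""
    import boto3  # imported here so the helper above runs without AWS access
    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(DatabaseName=database,
                      TableInput=with_open_csv_serde(table))
```

You would call it as apply_to_glue("awsml-titanic", "titanic"), with your real table name in place of "titanic". Note that a later crawler run may overwrite the change, depending on the crawler's schema change policy.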

EXPERT, answered 10 months ago
EXPERT, reviewed a month ago
