AWS Glue: processing double quote in source CSV data


I am using an AWS Glue crawler to process the following CSV dataset, but the values in the Name column contain double quotes and commas, which breaks the table that the crawler produces. Example of the CSV dataset:

PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S

I tried using a classifier, but it did not resolve the problem. The Athena query still fails:

HIVE_BAD_DATA: Error Parsing a column in the table: Cannot convert value of type String to a DOUBLE value
This query ran against the "awsml-titanic" database, unless qualified by the query. Please post the error message on our forum  or contact customer support  with Query Id: 279cceef-2bcf-4097-8e1a-d10b9cb7a960

Any suggestions are welcome.

asked 10 months ago, 593 views
1 Answer

Could it be solved by making the "name" column a string when creating the table in Athena?
https://repost.aws/knowledge-center/athena-hive-bad-data-error-csv
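
To illustrate why the error shows up, here is a minimal Python sketch. It assumes the crawler-created table splits each line on every comma without honoring quotes (my understanding of the default LazySimpleSerDe behavior, so treat it as an assumption). The helper names are hypothetical; the header and row are taken from the sample in the question.

```python
import csv

HEADER = "PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked"
ROW = '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S'


def naive_split(line):
    """Split on every comma, ignoring quotes (the assumed failure mode)."""
    return line.split(",")


def quote_aware_split(line):
    """Parse one CSV line with the standard csv module, which honors quotes."""
    return next(csv.reader([line]))


columns = HEADER.split(",")
naive = dict(zip(columns, naive_split(ROW)))
aware = dict(zip(columns, quote_aware_split(ROW)))

# The comma inside the quoted name shifts every later field one column right,
# so the numeric Age column receives text.
print("naive Age:", repr(naive["Age"]))
print("quote-aware Age:", repr(aware["Age"]))
```

With the naive split, the Age column ends up holding a non-numeric string, which is consistent with the "Cannot convert value of type String to a DOUBLE value" message.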

The other solution would be to change the AWS Glue table's SerDe to OpenCSVSerDe, which can parse quoted fields.
Try following the instructions in this document to set it up:
https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html#schema-csv
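
As a rough illustration of the OpenCSVSerDe option, the Python sketch below rewrites a Glue table definition so that it uses that SerDe. It assumes boto3 is installed and AWS credentials are configured. The helper names `to_opencsv_table_input` and `apply_opencsv` are hypothetical, and the list of keys copied into `TableInput` is an assumed subset that should be enough for a crawler-created CSV table. Switching every column to string is also an assumption made here, to sidestep empty numeric fields such as the blank Age for PassengerId 6; you can cast in the query afterwards.

```python
import copy

OPENCSV_SERDE = "org.apache.hadoop.hive.serde2.OpenCSVSerde"

# get_table returns read-only fields (DatabaseName, CreateTime, ...) that
# update_table rejects, so only these keys are carried over (assumed subset).
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
)


def to_opencsv_table_input(table, all_strings=True):
    """Build a TableInput dict that reads the same data via OpenCSVSerDe."""
    table_input = {
        key: copy.deepcopy(table[key]) for key in TABLE_INPUT_KEYS if key in table
    }
    descriptor = table_input["StorageDescriptor"]
    descriptor["SerdeInfo"] = {
        "SerializationLibrary": OPENCSV_SERDE,
        "Parameters": {"separatorChar": ",", "quoteChar": '"', "escapeChar": "\\"},
    }
    if all_strings:
        # Assumption: treat every column as string and cast at query time.
        for column in descriptor.get("Columns", []):
            column["Type"] = "string"
    return table_input


def apply_opencsv(database, table_name):
    """Fetch the table from the Glue Data Catalog and update its SerDe."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database, TableInput=to_opencsv_table_input(table)
    )
```

A call might look like `apply_opencsv("awsml-titanic", "your_table_name")`, where the table name is a placeholder. If the crawler runs again it may overwrite the change, depending on its schema change settings.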

answered 10 months ago by an EXPERT
reviewed a month ago by an EXPERT
