AWS Glue: processing double quote in source CSV data


I am using an AWS Glue crawler to process the following CSV dataset. Values in the Name column contain double quotes and commas, which breaks the table that the crawler generates. CSV dataset example:

PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S

I tried using a classifier, but it did not resolve the problem. The Athena query still fails:

HIVE_BAD_DATA: Error Parsing a column in the table: Cannot convert value of type String to a DOUBLE value
This query ran against the "awsml-titanic" database, unless qualified by the query. Please post the error message on our forum  or contact customer support  with Query Id: 279cceef-2bcf-4097-8e1a-d10b9cb7a960

Any suggestion will be welcomed.

Asked 10 months ago, 588 views
1 answer

Could this be solved by declaring the "Name" column as a string when you create the table in Athena?
https://repost.aws/knowledge-center/athena-hive-bad-data-error-csv
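
To see why the error mentions a DOUBLE conversion, here is a minimal Python sketch (not Athena itself) comparing a naive comma split with a quote-aware parse of the first data row. It assumes the crawler-generated table uses a SerDe that treats every comma as a delimiter, which is what happens when quoted fields are not handled. The variable names are illustrative only.

```python
import csv
import io

HEADER = "PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked"
ROW = '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S'

columns = HEADER.split(",")
age_index = columns.index("Age")

# Naive split: every comma is a delimiter and quotes are ignored,
# so the Name value is cut in two and later fields shift right by one.
naive_fields = ROW.split(",")

# Quote-aware parse: the comma inside the quoted Name stays in one field.
quoted_fields = next(csv.reader(io.StringIO(ROW), quotechar='"'))

print(len(columns), len(naive_fields), len(quoted_fields))
print("Age (naive split):", naive_fields[age_index])
print("Age (quote-aware):", quoted_fields[age_index])
```

With the naive split the row has one field more than the header, and the Age position holds "male", a string that cannot be read as a number. That is consistent with the HIVE_BAD_DATA message in the question.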

The other solution would be to change the AWS Glue table's SerDe to OpenCSVSerDe, which handles fields enclosed in quotes.
The following document describes how to set it up:
https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html#schema-csv
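
As a rough sketch of that change done programmatically, the helper below builds a Glue `TableInput` that keeps the existing table definition but swaps in OpenCSVSerDe. The helper name `with_open_csv_serde`, the key whitelist, and the sample table dictionary are assumptions made for illustration. The commented boto3 calls at the end show how it might be applied; the table name there is a placeholder.

```python
import copy

OPEN_CSV_SERDE = "org.apache.hadoop.hive.serde2.OpenCSVSerde"

# Assumption: only these keys are carried over from get_table() output,
# because read-only fields such as CreateTime are not valid in TableInput.
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "Retention",
    "StorageDescriptor", "PartitionKeys", "TableType", "Parameters",
)


def with_open_csv_serde(table: dict) -> dict:
    """Return a TableInput dict for `table` that uses OpenCSVSerDe."""
    table_input = {
        key: copy.deepcopy(table[key]) for key in TABLE_INPUT_KEYS if key in table
    }
    storage = table_input.setdefault("StorageDescriptor", {})
    storage["SerdeInfo"] = {
        "SerializationLibrary": OPEN_CSV_SERDE,
        "Parameters": {
            "separatorChar": ",",
            "quoteChar": '"',
            "escapeChar": "\\",
        },
    }
    return table_input


# Hypothetical table definition, shaped like the "Table" value from get_table().
sample_table = {
    "Name": "titanic_csv",
    "CreateTime": "2023-01-01T00:00:00Z",
    "StorageDescriptor": {
        "Columns": [{"Name": "name", "Type": "string"}],
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
        },
    },
}
updated = with_open_csv_serde(sample_table)
print(updated["StorageDescriptor"]["SerdeInfo"]["SerializationLibrary"])

# Applying it with boto3 (needs AWS credentials; table name is a placeholder):
# import boto3
# glue = boto3.client("glue")
# table = glue.get_table(DatabaseName="awsml-titanic", Name="<table name>")["Table"]
# glue.update_table(DatabaseName="awsml-titanic", TableInput=with_open_csv_serde(table))
```

Depending on how the crawler is configured, a later crawler run may reset the table definition, so it is worth checking the crawler's schema change settings after making this change.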

Answered 10 months ago by an EXPERT
Reviewed a month ago by an EXPERT
