AWS Glue: processing double quote in source CSV data

I am using an AWS Glue crawler to process the following CSV dataset, but the values in the Name column contain double quotes and commas, which breaks the table that the crawler produces. An example of the CSV dataset:

PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S

I tried using a classifier, but it did not resolve the problem. The Athena query still fails with:

HIVE_BAD_DATA: Error Parsing a column in the table: Cannot convert value of type String to a DOUBLE value
This query ran against the "awsml-titanic" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 279cceef-2bcf-4097-8e1a-d10b9cb7a960

Any suggestions are welcome.

Asked 1 year ago · 944 views
1 Answer

Could it be solved by making the "name" column a string when creating the table in Athena?
https://repost.aws/knowledge-center/athena-hive-bad-data-error-csv
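
To see why the error mentions a String-to-DOUBLE conversion, here is a minimal local sketch (plain Python, not Glue or Athena code) using the first data row from the question. It assumes the table is being read by splitting on every comma, which is my guess at what is happening, not something confirmed by the error message. The function names are made up for illustration.

```python
import csv
import io

# Header and first data row copied from the question.
HEADER = "PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked"
ROW = '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S'


def naive_split(line):
    """Split on every comma, ignoring quotes (assumed failure mode)."""
    return line.split(",")


def quote_aware_split(line):
    """Split with a CSV parser that treats "..." as a single field."""
    return next(csv.reader(io.StringIO(line), quotechar='"'))


columns = HEADER.split(",")
fare_index = columns.index("Fare")

naive = naive_split(ROW)
aware = quote_aware_split(ROW)

# The comma inside the quoted name adds one extra field, so every
# later column shifts right and Fare receives the Ticket text.
print(len(columns), len(naive), len(aware))
print("naive Fare:", naive[fare_index])
print("quote-aware Fare:", aware[fare_index])
```

With the naive split, the ticket text lands in the Fare position, which cannot be read as a DOUBLE. A quote-aware parser keeps the name as one field and Fare stays numeric.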

Another option is to change the AWS Glue table's SerDe to OpenCSVSerDe, which is intended for CSV data with quoted values.
The following document describes how to set it up:
https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html#schema-csv
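
If you would rather script the change than edit the table in the console, the following is a rough sketch of one way to do it with boto3. It is not taken from the linked document. The helper names `with_open_csv_serde` and `apply_to_glue` are made up for illustration, the list of keys copied into `TableInput` is a subset I chose, and `skip.header.line.count` is added on the assumption that the file has a single header row, as in the question.

```python
import copy

OPEN_CSV_SERDE = "org.apache.hadoop.hive.serde2.OpenCSVSerde"

# Subset of keys accepted by TableInput; get_table also returns
# read-only fields (for example DatabaseName, CreateTime) that
# update_table rejects, so they are dropped here.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention",
    "StorageDescriptor", "PartitionKeys", "TableType", "Parameters",
}


def with_open_csv_serde(table):
    """Return a TableInput dict based on `table` that uses OpenCSVSerDe."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    storage = table_input.setdefault("StorageDescriptor", {})
    storage["SerdeInfo"] = {
        "SerializationLibrary": OPEN_CSV_SERDE,
        "Parameters": {
            "separatorChar": ",",
            "quoteChar": '"',
            "escapeChar": "\\",
        },
    }
    # Assumption: the first line of each file is a header row.
    table_input.setdefault("Parameters", {})["skip.header.line.count"] = "1"
    return table_input


def apply_to_glue(database, table_name):
    """Fetch the crawled table and write it back with the new SerDe."""
    import boto3  # imported here so the helper above runs without AWS access

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(DatabaseName=database, TableInput=with_open_csv_serde(table))
```

You would call something like `apply_to_glue("awsml-titanic", "your_table_name")`, where the table name is a placeholder for whatever the crawler created. Check the resulting table in the Glue console before querying it again from Athena.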

Expert · Answered 1 year ago
Expert · Reviewed 6 months ago