AWS Glue: processing double quotes in source CSV data

I am using an AWS Glue crawler to process the following CSV dataset, but values in the Name column contain double quotes and commas, which breaks the table that the crawler produces. Example of the CSV dataset:

PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S

I tried using a classifier, but it did not resolve the problem. The Athena query still fails:

HIVE_BAD_DATA: Error Parsing a column in the table: Cannot convert value of type String to a DOUBLE value
This query ran against the "awsml-titanic" database, unless qualified by the query. Please post the error message on our forum  or contact customer support  with Query Id: 279cceef-2bcf-4097-8e1a-d10b9cb7a960

Any suggestions are welcome.

Asked 1 year ago · 927 views
1 Answer

Could it be solved by making the "name" column a string when creating the table in Athena?
https://repost.aws/knowledge-center/athena-hive-bad-data-error-csv
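
For context, here is a minimal sketch of why the error can appear. It assumes the table is read by a parser that splits on every comma without honoring double quotes (an assumption about the setup, not something confirmed in the question). The helper names `naive_split` and `quote_aware_split` are hypothetical, and the row is the first data row from the question.

```python
import csv
import io

HEADER = "PassengerId,Survived,Pclass,Name,bio,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked"
ROW = '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S'


def naive_split(line):
    """Split on every comma, ignoring quotes (assumed behavior of a quote-unaware parser)."""
    return line.split(",")


def quote_aware_split(line):
    """Parse one CSV line with Python's csv module, which honors double quotes."""
    return next(csv.reader(io.StringIO(line)))


columns = HEADER.split(",")
age_index = columns.index("Age")

naive = naive_split(ROW)
aware = quote_aware_split(ROW)

# The quoted name contains a comma, so the naive split yields one extra field
# and every later column shifts right by one position.
print(len(columns), len(naive), len(aware))
print("Age (naive):", naive[age_index])
print("Age (quote-aware):", aware[age_index])
```

With the naive split, the value that lands in the Age position is the text from the neighboring column, which would be consistent with a "Cannot convert value of type String to a DOUBLE value" error.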

Another option is to change the AWS Glue table properties so that the table uses OpenCSVSerDe, which is intended for CSV values enclosed in double quotes.
Follow the instructions in this document to set it up:
https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html#schema-csv
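
As a rough sketch of what that change could look like in code, the function below takes a table's storage descriptor (as a plain dictionary) and returns a copy that points at OpenCSVSerDe. The function name `with_open_csv_serde`, the sample descriptor, and the bucket path are hypothetical. The sketch only builds the dictionary; sending it to Glue (for example through the boto3 Glue client's `update_table`) is left out and would need to be checked against the linked document.

```python
import copy

# Serialization library class name for OpenCSVSerDe.
OPEN_CSV_SERDE = "org.apache.hadoop.hive.serde2.OpenCSVSerde"


def with_open_csv_serde(storage_descriptor):
    """Return a copy of the descriptor whose SerdeInfo uses OpenCSVSerDe."""
    updated = copy.deepcopy(storage_descriptor)
    updated["SerdeInfo"] = {
        "SerializationLibrary": OPEN_CSV_SERDE,
        "Parameters": {
            "separatorChar": ",",   # field delimiter
            "quoteChar": '"',       # fields may be wrapped in double quotes
            "escapeChar": "\\",     # a single backslash
        },
    }
    return updated


# Hypothetical descriptor, loosely shaped like what a crawler might create.
sample = {
    "Location": "s3://example-bucket/titanic/",
    "SerdeInfo": {
        "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
        "Parameters": {"field.delim": ","},
    },
}

result = with_open_csv_serde(sample)
print(result["SerdeInfo"]["SerializationLibrary"])
```

The input dictionary is deep-copied so the original descriptor stays unchanged, which makes it easy to compare the before and after settings.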

Answered 1 year ago by an Expert
Reviewed 6 months ago by an Expert
