Bug: SageMaker Canvas can't import Parquet files with numpy.nan/None/pandas.NA as first row value

I'm trying to create a tabular dataset in SageMaker Canvas Data Wrangler by importing a local Parquet file created with the pandas Python library. The file loads and I can preview it. However, after I press "Create dataset", the Status column displays "Processing" for a few seconds and then changes to "Failed" with the following message:

"Canvas can't properly load the dataset preview because of an issue on the Canvas server. Try again in a few minutes, or contact your administrator and share the details below to resolve the issue. If you're an administrator or an individual user, contact AWS support and provide the following code: <RETRACTED> to resolve the issue."

When previewing the data before pressing "Create dataset", I noticed that some columns are missing. When I check the file locally, every one of these missing columns has NaN as its first-row entry. These are numpy.nan (float64) values. If I convert the Parquet file to a CSV file and upload that instead, the import succeeds without issue. But I don't want to have to do this for all my Parquet files.
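For anyone who needs the CSV workaround in the meantime, the conversion can at least be scripted. This is only a minimal sketch, assuming pandas is installed together with a Parquet engine such as pyarrow; the function name `convert_parquet_dir_to_csv` is hypothetical and not part of any AWS tooling:

```python
from pathlib import Path

import pandas as pd


def convert_parquet_dir_to_csv(src_dir):
    """Write a .csv next to every .parquet file in src_dir; return the CSV paths."""
    csv_paths = []
    for parquet_path in sorted(Path(src_dir).glob("*.parquet")):
        df = pd.read_parquet(parquet_path)  # needs pyarrow or fastparquet
        csv_path = parquet_path.with_suffix(".csv")
        # Missing values (None, pandas.NA, numpy.nan) are written as empty fields.
        df.to_csv(csv_path, index=False)
        csv_paths.append(csv_path)
    return csv_paths
```

Calling `convert_parquet_dir_to_csv("data")` on a hypothetical `data` folder would produce one CSV per Parquet file. Note that CSV does not preserve column types, so this is a stopgap rather than a fix.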

I replicated the issue with a simple pandas DataFrame with five columns, "A" to "E", and a single row. Columns "A", "B" and "C" hold the values None, pandas.NA and numpy.nan respectively, while "D" and "E" hold a regular string and a regular float. I saved the DataFrame in both Parquet and CSV format. When I load the Parquet file, columns A, B and C are missing and I get the same error as above. When I load the CSV file, columns A, B and C are blank and the dataset loads successfully.
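The reproduction described above can be sketched roughly as follows. The values in "D" and "E", the file stem `repro`, and both function names are placeholders chosen for illustration, not the exact ones from my test; writing Parquet assumes pyarrow or fastparquet is installed:

```python
import numpy as np
import pandas as pd


def build_repro_frame():
    """One-row frame: A, B, C hold missing values; D and E hold regular values."""
    return pd.DataFrame(
        {
            "A": [None],
            "B": [pd.NA],
            "C": [np.nan],
            "D": ["text"],  # placeholder string value
            "E": [1.5],     # placeholder float value
        }
    )


def save_repro(df, stem="repro"):
    """Save the frame as <stem>.csv and, if an engine is installed, <stem>.parquet."""
    df.to_csv(f"{stem}.csv", index=False)
    try:
        df.to_parquet(f"{stem}.parquet", index=False)
    except ImportError:
        print("No Parquet engine found; install pyarrow or fastparquet.")


df = build_repro_frame()
print(df.dtypes)  # shows which dtype pandas inferred for each column
```

Calling `save_repro(df)` writes both files to the current directory, and those two files can then be uploaded to Canvas to compare the behaviour.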

Is this a known bug?

1 Answer

Hi,

re:Post is not the right place to report bugs: AWS service teams are not expected to monitor this community site for bug reports.

The official process is to open a case via AWS Console > Support > Create case.

Best,

Didier

AWS Expert, answered 3 months ago
  • Hi, I'm on the Basic plan and can't create "Technical" cases. So your bug will cost me $29 to maybe resolve? Thanks.