Bug: SageMaker Canvas can't import Parquet files with numpy.nan/None/pandas.NA as first row value

I'm trying to create a tabular dataset in SageMaker Canvas Data Wrangler by importing a local Parquet file created with the pandas Python library. The file loads and I can preview it. However, after I press "Create dataset", the Status column displays "Processing" for a few seconds before changing to "Failed" with the following message:

"Canvas can't properly load the dataset preview because of an issue on the Canvas server. Try again in a few minutes, or contact your administrator and share the details below to resolve the issue. If you're an administrator or an individual user, contact AWS support and provide the following code: <RETRACTED> to resolve the issue."

When previewing the data before pressing "Create dataset", I noticed that some columns are missing. When I check locally, every one of these missing columns has a NaN as its first-row entry. These are numpy.nan (float64) values. If I convert the Parquet file to a CSV file and upload that instead, the import succeeds without issue. But I don't want to have to do this for all my Parquet files.
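For anyone who wants to run the same local check, here is a minimal sketch of how one might list the columns whose first-row value is missing. The helper name `columns_with_missing_first_row` and the sample frame are made up for illustration; they are not from my actual data.

```python
import numpy as np
import pandas as pd


def columns_with_missing_first_row(df: pd.DataFrame) -> list:
    """Return the names of columns whose first-row value is missing."""
    if df.empty:
        return []
    # isna() treats None, numpy.nan and pandas.NA alike as missing.
    first_row_missing = df.iloc[0].isna()
    return [col for col, missing in first_row_missing.items() if missing]


# Hypothetical sample: "x" and "z" start with a missing value, "y" does not.
sample = pd.DataFrame({"x": [np.nan, 1.0], "y": [2.0, np.nan], "z": [None, "a"]})
print(columns_with_missing_first_row(sample))  # ['x', 'z']
```

On a real file, the frame would come from something like `pd.read_parquet("my_file.parquet")` (placeholder path; this needs a Parquet engine such as pyarrow installed), and the returned names can then be compared with the columns missing from the Canvas preview.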

I replicated this issue by creating a simple pandas dataframe with five columns, "A"-"E", one row each, where the three columns "A", "B" and "C" had values "None", "pandas.NA" and "numpy.nan" respectively, while "D" and "E" had regular strings and floats as values. I saved the dataframe both in parquet and csv format. When trying to load this parquet file, the columns A, B and C are missing and I get the same error as above. When reading it as a csv the columns A, B and C are blank and the dataset loads successfully.
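The reproduction described above can be sketched roughly as follows. The concrete values in "D" and "E", the file stem `repro`, and the function names are placeholders chosen for illustration, and the Parquet engine is an assumption (pandas needs pyarrow or fastparquet installed for `to_parquet`).

```python
import numpy as np
import pandas as pd


def build_repro_frame() -> pd.DataFrame:
    # One row: A, B and C hold a missing value, D and E hold regular data.
    return pd.DataFrame(
        {
            "A": [None],
            "B": [pd.NA],
            "C": [np.nan],
            "D": ["text"],  # placeholder string value
            "E": [1.5],     # placeholder float value
        }
    )


def write_repro_files(df: pd.DataFrame, stem: str = "repro") -> None:
    # Writes the same frame in both formats for upload to Canvas.
    # to_parquet requires a Parquet engine (pyarrow or fastparquet).
    df.to_parquet(f"{stem}.parquet", index=False)
    df.to_csv(f"{stem}.csv", index=False)


df = build_repro_frame()
# Confirm which columns are missing in the single row.
print(df.isna().iloc[0].tolist())  # [True, True, True, False, False]
```

Calling `write_repro_files(df)` then produces `repro.parquet` and `repro.csv`, which are the two files to upload when comparing the Parquet and CSV behavior.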

Is this a known bug?

1 Answer

Hi,

re:Post is not the right place to report bugs: AWS service teams are not expected to monitor this community site for bug reports.

The official process for reporting an issue is to open a case via AWS Console > Support > Create case.

Best,

Didier

AWS Expert, answered 3 months ago
  • Hi, I'm on the Basic plan and can't create "Technical" cases. So your bug will cost me $29 to maybe resolve? Thanks.
