Bug: SageMaker Canvas can't import Parquet files with numpy.nan/None/pandas.NA as the first-row value


I'm trying to create a tabular dataset in SageMaker Canvas Data Wrangler by importing a local Parquet file created with the pandas Python library. The file loads and I can preview it. However, after I press "Create dataset", the Status column displays "Processing" for a few seconds and then changes to "Failed" with the following message:

"Canvas can't properly load the dataset preview because of an issue on the Canvas server. Try again in a few minutes, or contact your administrator and share the details below to resolve the issue. If you're an administrator or an individual user, contact AWS support and provide the following code: <RETRACTED> to resolve the issue."

When previewing the data before pressing "Create dataset", I notice that some columns are missing. All of the missing columns have a NaN value in the first row when I check the file locally; these are numpy.nan (float64) values. If I convert the Parquet file to CSV and upload that instead, the dataset is created without issue, but I don't want to have to do this for all my Parquet files.
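A minimal sketch for checking which columns match the pattern described above (a missing value in the first row) and for automating the Parquet-to-CSV workaround across a folder. It assumes pandas plus a Parquet engine such as pyarrow is installed. Both function names are hypothetical, and the first-row rule only reflects what was observed in this thread, not a confirmed description of how Canvas reads Parquet files.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd


def columns_with_missing_first_value(df: pd.DataFrame) -> list[str]:
    """Return the names of columns whose first row is None, numpy.nan or pandas.NA.

    pandas' isna() treats all three as missing, so one check covers them.
    (Hypothetical helper based on the pattern observed in this thread.)
    """
    if df.empty:
        return []
    first_row_missing = df.iloc[0].isna()
    # zip() keeps this correct even if column names are duplicated
    return [str(col) for col, missing in zip(df.columns, first_row_missing) if missing]


def convert_parquet_dir_to_csv(src_dir: str, dst_dir: str) -> list[Path]:
    """Hypothetical helper: write a CSV copy of every *.parquet file in src_dir.

    Requires a Parquet engine (pyarrow or fastparquet) for pd.read_parquet.
    """
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for parquet_path in sorted(Path(src_dir).glob("*.parquet")):
        df = pd.read_parquet(parquet_path)
        suspects = columns_with_missing_first_value(df)
        if suspects:
            # Report the columns that matched the "missing first value" pattern
            print(f"{parquet_path.name}: missing first-row values in {suspects}")
        csv_path = out_dir / (parquet_path.stem + ".csv")
        df.to_csv(csv_path, index=False)  # missing values become empty fields
        written.append(csv_path)
    return written
```

The check runs on an in-memory DataFrame, so it can also be used before calling `to_parquet` to flag files that are likely to hit the problem.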

I reproduced the issue with a simple pandas DataFrame that has five columns, "A" through "E", and a single row. Columns "A", "B" and "C" hold None, pandas.NA and numpy.nan respectively, while "D" and "E" hold an ordinary string and an ordinary float. I saved the DataFrame in both Parquet and CSV format. When I load the Parquet file, columns A, B and C are missing and I get the same error as above. When I load the CSV file, columns A, B and C are blank and the dataset loads successfully.
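The reproduction described above can be sketched as follows. This is a reconstruction, not the original script: the values in "D" and "E" and the output file names are placeholders, because the post does not give them, and writing Parquet assumes pyarrow or fastparquet is installed.

```python
from __future__ import annotations

from pathlib import Path

import numpy as np
import pandas as pd


def make_repro_frame() -> pd.DataFrame:
    """Build the one-row, five-column frame described in the question."""
    return pd.DataFrame(
        {
            "A": [None],     # Python None (object dtype)
            "B": [pd.NA],    # pandas missing-value sentinel (object dtype)
            "C": [np.nan],   # numpy NaN (float64)
            "D": ["hello"],  # ordinary string (placeholder value)
            "E": [3.14],     # ordinary float (placeholder value)
        }
    )


def write_repro_files(out_dir: str) -> tuple[Path, Path]:
    """Write the frame as both Parquet and CSV (hypothetical file names)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    df = make_repro_frame()
    parquet_path = out / "repro.parquet"
    csv_path = out / "repro.csv"
    df.to_parquet(parquet_path, index=False)  # needs pyarrow or fastparquet
    df.to_csv(csv_path, index=False)          # missing values are written as empty fields
    return parquet_path, csv_path
```

Calling `write_repro_files("repro_out")` produces one file of each format to upload to Canvas. Because `to_csv` writes all three kinds of missing value as empty fields by default, the CSV version ends up with blank A, B and C columns, which matches what the CSV import shows.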

Is this a known bug?

1 Answer

Hi,

re:Post is not the proper place to report bugs: AWS service teams are not expected to monitor this community site for bug reports.

The official way to report an issue is through the AWS Console: Support > Create case.

Best,

Didier

AWS EXPERT
answered 3 months ago
  • Hi, I'm on the Basic plan and can't create "Technical" cases. So reporting your bug will cost me $29, and it may not even get resolved? Thanks.
