Bug: SageMaker Canvas can't import Parquet files with numpy.nan/None/pandas.NA as the first-row value


I'm trying to create a tabular dataset in SageMaker Canvas Data Wrangler by importing a local Parquet file created with the pandas Python library. The file loads and I can preview it. However, after I press "Create dataset", the Status column displays "Processing" for a few seconds and then changes to "Failed" with the following message:

"Canvas can't properly load the dataset preview because of an issue on the Canvas server. Try again in a few minutes, or contact your administrator and share the details below to resolve the issue. If you're an administrator or an individual user, contact AWS support and provide the following code: <RETRACTED> to resolve the issue."

When previewing the data before pressing "Create dataset", I noticed that some columns are missing. When I check the file locally, every one of these missing columns has a NaN value in its first row. These are numpy.nan (float64) values. If I convert the Parquet file to CSV and upload that instead, the import succeeds without issue. But I don't want to have to do this for all my Parquet files.
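Until the Parquet import works, the CSV conversion described above could be scripted rather than done file by file. This is only a minimal sketch, assuming the files sit in one local directory and that pandas has a Parquet engine such as pyarrow or fastparquet installed. The function names `csv_path_for` and `convert_directory` are hypothetical helpers, not part of any AWS tooling:

```python
from pathlib import Path

import pandas as pd


def csv_path_for(parquet_path: Path) -> Path:
    """Return the sibling .csv path for a .parquet file (hypothetical helper)."""
    return parquet_path.with_suffix(".csv")


def convert_directory(directory: str) -> list[Path]:
    """Convert every *.parquet file in `directory` to CSV; return the paths written."""
    written = []
    for parquet_path in sorted(Path(directory).glob("*.parquet")):
        df = pd.read_parquet(parquet_path)  # needs pyarrow or fastparquet
        target = csv_path_for(parquet_path)
        df.to_csv(target, index=False)  # missing values are written as empty fields
        written.append(target)
    return written
```

Calling `convert_directory("path/to/files")` leaves the original Parquet files untouched and writes a CSV next to each one. Note that CSV does not carry column types, so Canvas will infer them again on import.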

I reproduced the issue with a simple pandas DataFrame of five columns, "A" to "E", and a single row. Columns "A", "B" and "C" hold None, pandas.NA and numpy.nan respectively, while "D" and "E" hold a regular string and a regular float. I saved the DataFrame in both Parquet and CSV format. When I load the Parquet file, columns A, B and C are missing and I get the same error as above. When I load the CSV file, columns A, B and C are blank and the dataset loads successfully.
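For anyone who wants to try this, the reproduction could look roughly like the sketch below. The values in "D" and "E", the file stem, and the function names are placeholders chosen for illustration, since the exact ones are not important; writing Parquet assumes pyarrow or fastparquet is installed:

```python
import numpy as np
import pandas as pd


def build_repro_frame() -> pd.DataFrame:
    """One-row frame: A, B, C hold missing values; D and E hold regular values."""
    return pd.DataFrame(
        {
            "A": [None],      # Python None
            "B": [pd.NA],     # pandas missing-value marker
            "C": [np.nan],    # float64 NaN
            "D": ["text"],    # placeholder string value
            "E": [1.5],       # placeholder float value
        }
    )


def write_repro_files(stem: str = "canvas_repro") -> None:
    """Save the same frame as Parquet and CSV for upload to Canvas."""
    df = build_repro_frame()
    df.to_parquet(f"{stem}.parquet", index=False)  # needs pyarrow or fastparquet
    df.to_csv(f"{stem}.csv", index=False)
```

Calling `write_repro_files()` produces `canvas_repro.parquet` and `canvas_repro.csv`, which can then be uploaded to Canvas one after the other to compare the behaviour.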

Is this a known bug?

1 Answer

Hi,

re:Post is not the right place to report bugs: AWS service teams are not expected to monitor this community site for bug reports.

The official process for reporting an issue is to open a case via AWS Console > Support > Create case.

Best,

Didier

AWS Expert, answered 3 months ago
  • Hi, I'm on the Basic plan and can't create "Technical" cases. So your bug will cost me $29 to maybe resolve? Thanks.
