Batch prediction fails with training dataset in Canvas with "The data contains infinite or NaN values"

Hi, I built a Canvas model (numeric prediction), and the build completes successfully (I can see performance metrics etc. in the Analyze tab). However, when I try to generate batch predictions using the training dataset, the task fails with:

"The data contains infinite or NaN values, which cannot be processed. Please check your data for such values and handle them appropriately before proceeding."

I checked that there are no such values in any of the key variables. Some other columns will inevitably contain them, but I assumed the model could handle this; after all, it was able to build the model with them. Any thoughts/tips?

asked a month ago · 14 views
1 Answer
This is an unusual situation since your model built successfully but batch predictions are failing. Here are some key points to consider:

When making batch predictions in Canvas, the schema of your prediction dataset must exactly match the schema of the dataset used to train the model. This includes matching column names (case-sensitive), data types, and the number of columns. Schema mismatches can cause prediction failures.
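
If you can export both the training file and the prediction file as CSV, a rough local comparison can catch the most common mismatches before you start a batch prediction job. The sketch below is a minimal, hypothetical helper (schema_diff and friends are not part of Canvas or any AWS SDK). It assumes comma-separated files with a header row, and its type guess ("numeric" if every non-blank cell parses as a number, otherwise "text") is far cruder than Canvas's own type inference, so treat a clean result as a sanity check rather than a guarantee.

```python
import csv
import io

# Hypothetical helpers: a rough local schema check, not a Canvas API.


def infer_type(values):
    """Crude guess: 'numeric' if every non-blank cell parses as a float, else 'text'."""
    for value in values:
        if value.strip() == "":
            continue  # blanks do not decide the type
        try:
            float(value)
        except ValueError:
            return "text"
    return "numeric"


def read_schema(text):
    """Return [(column_name, guessed_type), ...] for CSV text with a header row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    # Transpose rows into columns; missing trailing cells are treated as blank.
    columns = [[row[i] if i < len(row) else "" for row in rows] for i in range(len(header))]
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


def schema_diff(train_text, predict_text):
    """List human-readable differences between two CSV schemas."""
    train, predict = read_schema(train_text), read_schema(predict_text)
    problems = []
    if len(train) != len(predict):
        problems.append(f"column count differs: {len(train)} vs {len(predict)}")
    for (t_name, t_type), (p_name, p_type) in zip(train, predict):
        if t_name != p_name:  # column names are compared case-sensitively
            problems.append(f"name mismatch: {t_name!r} vs {p_name!r}")
        elif t_type != p_type:
            problems.append(f"type mismatch in {t_name!r}: {t_type} vs {p_type}")
    return problems
```

An empty list means the two headers line up and the crude type guesses agree; anything else is worth fixing before retrying the batch prediction.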

During model building, Canvas automatically handles missing values by inferring them using adjacent values present in the dataset. However, the batch prediction path may not apply the same handling, so values that were tolerated during training can still be rejected at prediction time.

Since you're using the same training dataset for batch predictions, a few things to investigate:

  1. Data validation: Canvas performs data validation checks before building models. Even though your model built successfully, ensure all rows in your dataset are complete and that missing values have been properly handled.

  2. Column consistency: Check whether any columns were dropped during the model building process. If they were, the full training dataset you are now using for batch predictions may no longer match the schema the model expects.

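  3. Data types: Double-check that all numeric columns truly contain only numeric values and that there are no hidden infinite or NaN values that might have been introduced through data processing (for example, literal text such as "inf" or "NaN", or the result of a division by zero in an upstream calculation).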
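
For point 3, it helps to scan every column, not just the key variables. Python's float() accepts strings such as "NaN", "inf" and "-Infinity", and an overflowing literal such as "1e999" parses as infinity, so values like these can pass a visual check yet still count as non-finite once a tool parses them as numbers. The function below is a hypothetical sketch that assumes a CSV export with a header row. Whether Canvas treats blank cells like NaN at prediction time is an assumption on my part, so blanks are only reported when you ask for them.

```python
import csv
import io
import math


def find_non_finite(text, treat_blank_as_nan=False):
    """Hypothetical helper: map column name -> 1-based data row numbers holding NaN/inf.

    The header row is not counted. Blank cells are reported only when
    treat_blank_as_nan is True (an assumption about how they may be treated).
    """
    reader = csv.DictReader(io.StringIO(text))
    hits = {}
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            if column is None or value is None:
                continue  # ragged row: extra or missing fields
            if value.strip() == "":
                if treat_blank_as_nan:
                    hits.setdefault(column, []).append(row_number)
                continue
            try:
                number = float(value)  # accepts "nan", "inf", "1e999" (-> inf), ...
            except ValueError:
                continue  # ordinary text, e.g. a category label
            if not math.isfinite(number):
                hits.setdefault(column, []).append(row_number)
    return hits
```

Running it over the exact file you upload for batch prediction shows which non-key columns contain the offending values, which is usually enough to decide whether to clean, impute, or drop them.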

Since the model built successfully, the issue likely stems from how the data is being processed during the batch prediction phase versus the training phase. You may need to explicitly clean the dataset to remove or replace any infinite or NaN values in all columns before attempting batch predictions again.
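
If the scan does turn up non-finite values, one option is to impute them rather than drop the rows, so that every row still receives a prediction. The sketch below replaces NaN/inf cells with the median of the finite values in the same column, and with a blank cell when a column has no finite values at all. The function name and the median strategy are illustrative assumptions, not something Canvas requires.

```python
import csv
import io
import math
import statistics


def _parse(value):
    """Return a float for numeric-looking cells, or None for text/blank cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_non_finite(text):
    """Hypothetical helper: replace NaN/inf cells with their column's median of finite values."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [list(row) for row in reader]

    # Median of the finite numeric values in each column (None if there are none).
    medians = {}
    for i in range(len(header)):
        finite = []
        for row in rows:
            if i < len(row):
                number = _parse(row[i])
                if number is not None and math.isfinite(number):
                    finite.append(number)
        medians[i] = statistics.median(finite) if finite else None

    # Overwrite only the non-finite cells; everything else is left untouched.
    for row in rows:
        for i, value in enumerate(row):
            number = _parse(value)
            if number is not None and not math.isfinite(number):
                median = medians.get(i)
                row[i] = "" if median is None else repr(median)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()
```

The median is used here only because it is robust to outliers and easy to reason about; dropping the affected rows or using a domain-specific fill value may suit your data better.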
Sources
Data validation - Amazon SageMaker
Batch prediction dataset requirements - Amazon SageMaker AI
Preview your model - Amazon SageMaker

answered a month ago
EXPERT reviewed a month ago