
There is no zero or constant value option for "Method to aggregate numeric values" because that setting is intended to aggregate values within a time series, not to impute missing values.

When SageMaker Data Wrangler resamples a time series, it needs to determine how to represent the value for each interval based on the values that fall within it. Common options are:

- Sum - Add up all values
- Average - Take the mean of all values
- Minimum - Use the minimum value
- Maximum - Use the maximum value

A zero or constant value would not make sense in this context, as it does not represent an aggregation of existing values.
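To illustrate what aggregating within an interval means, here is a minimal sketch in plain Python. It is not Data Wrangler's implementation; the `readings` data and the `aggregate_hourly` helper are hypothetical names used only for this example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical irregular readings as (timestamp, value) pairs.
readings = [
    (datetime(2024, 1, 1, 0, 5), 2.0),
    (datetime(2024, 1, 1, 0, 40), 4.0),
    (datetime(2024, 1, 1, 2, 15), 10.0),
]

def aggregate_hourly(rows, method):
    """Group values by the hour they fall in, then reduce each group with `method`."""
    buckets = defaultdict(list)
    for ts, value in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: method(values) for hour, values in sorted(buckets.items())}

hourly_sum = aggregate_hourly(readings, sum)    # Sum of the values in each hour
hourly_mean = aggregate_hourly(readings, mean)  # Average of the values in each hour
```

In this sketch the 01:00 interval contains no readings, so no aggregation method produces a value for it. That gap is what imputation, rather than aggregation, is meant to address.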

The imputation of missing values happens separately, before the time series is resampled. At that stage, options like zero can be used to fill in missing intervals.
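Conceptually, filling missing intervals is a separate operation from aggregation. The following is a minimal sketch in plain Python of zero-filling gaps on a regular hourly grid. It does not show how Data Wrangler performs this step or where the option lives in its UI; the `hourly` data and the `fill_missing_intervals` helper are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical hourly values with the 01:00 interval missing.
hourly = {
    datetime(2024, 1, 1, 0): 6.0,
    datetime(2024, 1, 1, 2): 10.0,
}

def fill_missing_intervals(series, step, fill_value=0.0):
    """Return a copy of `series` on a regular grid, using `fill_value` for gaps."""
    start, end = min(series), max(series)
    filled = {}
    current = start
    while current <= end:
        # Keep the existing value when present, otherwise impute the constant.
        filled[current] = series.get(current, fill_value)
        current += step
    return filled

regular = fill_missing_intervals(hourly, timedelta(hours=1))
```

Here only the empty 01:00 interval receives the constant; intervals that already hold data keep their values.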

To aggregate numeric values to a constant like zero when resampling time series data in SageMaker Data Wrangler:

1. Import your time series data into SageMaker Data Wrangler.
2. Add a "Resample Time Series" step and select the appropriate time interval for resampling (e.g. hourly, daily, etc.).
3. For the "Method to aggregate numeric values" option, select "Custom function".
4. In the custom function text box, write a lambda function to return a constant value of 0, such as:

lambda x: 0

This will cause all numeric values within each resampled interval to be aggregated to a constant value of 0, effectively imputing missing values with 0. You can then proceed with further transformations and analysis on the resampled time series data.

Hello,

Thank you for using Amazon SageMaker and reaching out to us. I understand that you want to resample the time series with zeroes.

By resampling time series data, you can establish regular intervals for the observations in your time series dataset. This is particularly useful when working with time series data containing irregularly spaced observations. For instance, you can use resampling to transform a dataset with observations recorded at one-hour, two-hour, and three-hour intervals into one with a regular one-hour interval between observations. Forecasting algorithms require the observations to be taken at regular intervals. You might have missing data for different reasons. The reason for your missing data might inform how you want Canvas to impute it.

Please refer to the following AWS documentation on resampling time series data and handling missing values.

[+] Resample time series data - https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-prepare-data.html#canvas-prepare-data-resample

[+] Handling missing values - https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-time-series.html#canvas-time-series-missing

For more detailed assistance, kindly reach out to AWS Support [1] (SageMaker) with a detailed description of your issue or use case and the relevant AWS resource names, and we will be happy to troubleshoot accordingly.


Hi Giovanni,

When resampling, how do I get Data Wrangler to impute missing values instead of aggregating numeric values?

You mention "The imputation of missing values happens separately, before the time series is resampled. At that stage, options like zero can be used to fill in missing intervals." How do I access this option? Which step do I select in order to perform this?

I really appreciate your rapid response here, but you have offered a solution that does not exist. As you can see in my previously shared screenshot, which I will share again below, "Custom function" is not an option in the "Method to aggregate numeric values" list.