Resample the Time Series with Zeros with Data Wrangler in SageMaker Canvas


While creating a data flow in Data Wrangler in SageMaker Canvas, I noticed that the "Resample the Time Series" transform is required before I can use the lag features that are also available. The method of imputation for numeric values has a zero option, which is what we want. Instead, the "method to aggregate numeric values" is used, and it has no zero option. Is there a reason that no zero or constant option is available there? A screenshot is shown below.

[Screenshot: Resample options within Canvas]

I understand if my request is misguided and I should come at this from another angle, but I notice that the "handle missing time series" option does not accept date types; it only accepts vector or array types.

For context, our data only has rows for dates on which there was a harvest (an occurrence), so there are gaps in the dates present.

asked a month ago · 89 views
2 Answers

There is no zero or constant value option for the "method to aggregate numeric values" because that setting is intended to aggregate values within a time series, not to impute missing values.

When resampling a time series in SageMaker Data Wrangler, it needs to determine how to represent the value for an interval based on the values that fall within it. Common options are:

  • Sum: add up all values
  • Average: take the mean of all values
  • Minimum: use the minimum value
  • Maximum: use the maximum value

A zero or constant value would not make sense in this context, as it does not represent an aggregation of existing values.
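To make the aggregation idea concrete, here is a minimal pandas sketch of what these four methods do to values that fall in the same interval. It runs outside Data Wrangler and only illustrates the concept; the sample timestamps and values are made up.

```python
import pandas as pd

# Hypothetical readings: two fall on 2024-03-01, one on 2024-03-02.
values = pd.Series(
    [2.0, 4.0, 9.0],
    index=pd.to_datetime(
        ["2024-03-01 08:00", "2024-03-01 17:00", "2024-03-02 09:00"]
    ),
)

# Resample to daily intervals and apply each aggregation method
# to the values inside every interval.
per_day = values.resample("D").agg(["sum", "mean", "min", "max"])
print(per_day)
```

Each column is computed from values that already exist in the interval, which is why a constant does not fit into this list.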

The imputation of missing values happens separately, before the time series is resampled. At that stage, options like zero can be used to fill in missing intervals.
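As a rough illustration of the difference between aggregating and imputing, here is a minimal pandas sketch. It is not the Canvas implementation and runs outside the Data Wrangler UI; the column names `date` and `harvest_kg` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical data: rows exist only on dates with a harvest.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-04"]),
        "harvest_kg": [5.0, 3.0, 7.0],
    }
)

# Aggregation: combine values that share a day.
# min_count=1 leaves days that have no rows as NaN instead of 0.
daily = df.set_index("date")["harvest_kg"].resample("D").sum(min_count=1)

# Imputation: fill the days that had no rows with zero.
daily_filled = daily.fillna(0.0)
print(daily_filled)
```

In this sketch the zero enters only in the imputation step; the aggregation step never invents a value for a day with data.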

To aggregate numeric values to a constant like zero when resampling time series data in SageMaker Data Wrangler:

  1. Import your time series data into SageMaker Data Wrangler.
  2. Add a "Resample Time Series" step and select the appropriate time interval for resampling (e.g. hourly, daily, etc.).
  3. For the "Method to aggregate numeric values" option, select "Custom function".
  4. In the custom function text box, write a lambda function that returns a constant value of 0, such as:

       lambda x: 0

This will cause all numeric values within each resampled interval to be aggregated to a constant value of 0, effectively imputing missing values with 0. You can then proceed with further transformations and analysis on the resampled time series data.

answered a month ago
  • Hi Giovanni,

    When resampling, how do I get Data Wrangler to impute missing values instead of aggregating numeric values?

    You mention "The imputation of missing values happens separately, before the time series is resampled. At that stage, options like zero can be used to fill in missing intervals." - how do I access this option? Which step do I select in order to perform this?

    I really appreciate your rapid response here, but you have offered a solution that does not exist. You can see it in my previously shared screenshot, which I will share again below: "Custom function" is not an option in the "Method to aggregate numeric values" list.



Thank you for using Amazon SageMaker and reaching out to us. I understand that you want to resample the time series with zeros.

By resampling time series data, you can establish regular intervals for the observations in your time series dataset. This is particularly useful when working with time series data containing irregularly spaced observations. For instance, you can use resampling to transform a dataset with observations recorded at one-hour, two-hour, and three-hour intervals into one with a regular one-hour interval between observations. Forecasting algorithms require the observations to be taken at regular intervals.

You might have missing data for different reasons. The reason for your missing data might inform how you want Canvas to impute it.
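The one-, two-, and three-hour example above can be sketched in pandas as follows. This is only an illustration under assumed data, not what Canvas runs internally; the observation values are made up, and zero fill and forward fill are shown simply as two possible imputation choices.

```python
import pandas as pd

# Hypothetical observations spaced 1, 2, and 3 hours apart.
obs = pd.Series(
    [10.0, 12.0, 15.0, 11.0],
    index=pd.to_datetime(
        [
            "2024-01-01 00:00",
            "2024-01-01 01:00",  # 1 hour after the previous observation
            "2024-01-01 03:00",  # 2 hours after
            "2024-01-01 06:00",  # 3 hours after
        ]
    ),
)

# Put the series on a regular one-hour grid; hours with no
# observation come out as NaN.
regular = obs.resample("60min").mean()

# Two possible ways to impute the gaps, depending on why data is missing.
zero_filled = regular.fillna(0.0)  # nothing happened in that hour
carried_forward = regular.ffill()  # last known value still holds
print(pd.DataFrame({"zero": zero_filled, "ffill": carried_forward}))
```

Zero fill suits data such as harvest counts, where a missing row means nothing occurred; forward fill suits measurements that persist between readings.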

Please refer to the following AWS documentation on resampling time series data and handling missing values.

[+] Resample time series data -

[+] Handling missing values -

For more detailed assistance, kindly reach out to AWS Support [1] (SageMaker) with a detailed description of your issue and use case, and share the relevant AWS resource names. We will be happy to troubleshoot accordingly.


answered 25 days ago
