Setting up data for DeepAR: targets and categories for simultaneous time series?


I would like to try DeepAR on an engineering problem for which I have some sensor datasets, but I am unsure how to format the data for ingestion into DeepAR to get a predictive model.

The data is essentially the positions, orientations, and a few other time-series sensor readings of an assortment of objects (animals, in this case, actually) over time. The data is both noisy and sometimes missing.

So, in this case, there are N individuals, and for each individual there are Z variables of interest. None of the variables are "static" (color, size, etc.); all of them are expected to vary over time on the same time scale. Ultimately, I would like to try to predict all Z targets for all N individuals.

How do I set up the time series to feed into DeepAR? The premise is that all these individuals implicitly interact in the observed space, so all the target values are interdependent, and I would like to see whether DeepAR can take that interdependence into account when making predictions.

Should I be using a category vector of length 2, where the first cat variable corresponds to the individual and the second corresponds to one of the variables associated with that individual? Then there would be N*Z targets in my input dataset, each with cat = [ n , z ], with N distinct values of n and Z distinct values of z?

1 Answer

"Yes, but..."

I agree it sounds like all your N*Z time series are prediction targets: you don't know their future values, so you can't provide them as dynamic_feat. Creating each one as a target record with a 2D cat encoding the individual n and the variable z would probably be the "right" way to submit this data to SageMaker DeepAR per the algorithm docs.
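A minimal sketch of that layout, assuming the sensor data has already been resampled onto a common time grid as a NumPy array of shape (N, Z, T), with NaN marking missing readings. The function name build_deepar_records and the start timestamp are hypothetical placeholders; the record fields (start, target, cat) follow the SageMaker DeepAR JSON Lines format, which accepts missing target values as null.

```python
import json
import math

import numpy as np


def build_deepar_records(data, start="2021-01-01 00:00:00"):
    """Turn an (N, Z, T) array into N*Z DeepAR JSON Lines records.

    Each record is one target series tagged with cat = [n, z]:
    n indexes the individual, z indexes the variable.
    """
    n_individuals, n_vars, _ = data.shape
    lines = []
    for n in range(n_individuals):
        for z in range(n_vars):
            # Missing readings (NaN) become JSON null, which DeepAR treats as missing
            target = [None if math.isnan(v) else float(v) for v in data[n, z]]
            record = {"start": start, "target": target, "cat": [n, z]}
            lines.append(json.dumps(record))
    return "\n".join(lines)


# Toy usage (hypothetical data): N=3 individuals, Z=2 variables, T=5 time steps
rng = np.random.default_rng(0)
toy = rng.normal(size=(3, 2, 5))
toy[0, 1, 2] = np.nan  # simulate a dropped sensor reading
jsonl = build_deepar_records(toy)
print(jsonl.splitlines()[1])
```

Each line of the output is one training series, so the file ends up with N*Z lines and the cat values run over 0..N-1 and 0..Z-1.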

But I'm not overly optimistic about the success of the DeepAR algorithm for this task... As described in the original paper, it's mainly an auto-regressive model from past target and dynamic_feats to output, with some global conditioning/encoding on the cat features. It's true there's scope in there to learn correlations/interactions between series, but I'm not sure whether there'll be enough bandwidth/capacity in that coupling to learn your inter-individual interactions if those are really dominant.

To illustrate what I mean at a high level:

  1. If we were talking about the global position of migratory birds over months, I'd guess (only a guess) that basic auto-regression on the time of year and on where bird A usually hangs out in March could already tell you quite a lot... Maybe bird A and bird B always travel together; this also seems like a manageable pattern for a primarily forecasting-based model.
  2. If we were watching pigs bounce around in a small pen over seconds, I'd guess the position and direction of each pig would be tightly coupled. Lots of information about the current location of other pigs would be important for deducing where pig X goes next. Seems more like the domain of dynamic system identification to me.
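To make the "dynamic system identification" idea in the pig example concrete, here is one of the simplest possible baselines: a linear vector autoregression, x[t+1] ≈ A x[t], fitted by least squares. This is a swapped-in illustration, not part of DeepAR and not something the answer prescribes; the function name fit_linear_dynamics and the simulated data are hypothetical. The point is that the off-diagonal entries of A directly expose how much one animal's next state depends on another's current state.

```python
import numpy as np


def fit_linear_dynamics(states):
    """Least-squares fit of x[t+1] ≈ A @ x[t] for states of shape (T, D).

    Off-diagonal entries of the returned A measure how strongly the next value
    of one series depends on the current value of another series.
    """
    past, future = states[:-1], states[1:]
    # Solve past @ A.T ≈ future in the least-squares sense
    a_t, *_ = np.linalg.lstsq(past, future, rcond=None)
    return a_t.T


# Toy usage: two coupled 1-D "positions" with a known coupling matrix
rng = np.random.default_rng(0)
true_a = np.array([[0.9, 0.3], [-0.2, 0.8]])
x = np.zeros((500, 2))
x[0] = [1.0, -1.0]
for t in range(499):
    x[t + 1] = true_a @ x[t] + 0.1 * rng.normal(size=2)
est = fit_linear_dynamics(x)
print(np.round(est, 2))
```

Real animal interactions are unlikely to be linear, but a baseline like this gives a quick check of whether cross-individual coupling carries much predictive signal before committing to a heavier model.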

Of course, absolutes are hard to come by in ML, and DeepAR+ is pre-built, so it could still be worth trying. You might even consider a custom domain model in Amazon Forecast to see whether the automatic ensembling and extra algorithms it offers could help... But there may be other modelling approaches better suited to the problem if the forecasting angle doesn't produce the results you're looking for.

Alex_T (AWS Expert), answered 2 years ago
  • Thanks for the thorough answer.

    With some tinkering, I think I have the data (well, toy data) reshaped into the right format, and SageMaker will ingest it and train on it:

    • N*Z lines in the input JSON file
    • 2-vector for cat, different for every line
    • target on each line is shape [T,], where T is the total timeseries length
    • dynamic_feat on each line is shape [N*Z-1, T]

    as shown in this example: https://gist.github.com/apullin/425cd32eb339ca5ce0d10d8aac6dcc18
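A minimal sketch of the record layout described in the bullets above, written independently of the linked gist (which is not reproduced here). It assumes a gap-free NumPy array of shape (N, Z, T); the function name build_records_with_peer_features and the start timestamp are hypothetical. One caveat from the DeepAR docs: at inference time dynamic_feat must extend prediction_length steps beyond the target, so this layout needs future values of the other series, which is exactly the limitation the answer raised.

```python
import json

import numpy as np


def build_records_with_peer_features(data, start="2021-01-01 00:00:00"):
    """One record per (n, z) series; dynamic_feat holds the other N*Z - 1 series.

    Assumes `data` has shape (N, Z, T) and contains no missing values.
    """
    n_individuals, n_vars, length = data.shape
    flat = data.reshape(n_individuals * n_vars, length)  # row index = n * Z + z
    records = []
    for i in range(flat.shape[0]):
        n, z = divmod(i, n_vars)  # recover (individual, variable) from the row index
        peers = np.delete(flat, i, axis=0)  # every other series, shape (N*Z - 1, T)
        records.append({
            "start": start,
            "target": flat[i].tolist(),
            "cat": [n, z],
            "dynamic_feat": peers.tolist(),
        })
    return "\n".join(json.dumps(r) for r in records)


# Toy usage (hypothetical data): N=3, Z=2, T=4
toy = np.arange(3 * 2 * 4, dtype=float).reshape(3, 2, 4)
jsonl = build_records_with_peer_features(toy)
first = json.loads(jsonl.splitlines()[0])
print(first["cat"], len(first["dynamic_feat"]))
```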
