Question about using AWS Forecast

Does anyone know the details of using AWS Forecast? I have the following two questions:

  1. When I use AWS Forecast via boto3 with training data stored in S3, I create the dataset group, create the dataset, and then import the dataset. If my data in S3 is updated and I want to reimport it, do I need to repeat the whole process (create dataset group, create dataset, import dataset)? Or do I just need to import the dataset again (assuming the URL to the data in S3 hasn't changed)?

  2. When using the DeepAR algorithm with the CUSTOM data type, if my dataset has dimension columns in addition to item_id, will the algorithm recognize that items with the same dimensions have a similar pattern? Or do I need to put item_id and the dimensions in metadata so that the DeepAR algorithm can recognize that?

Thanks!

ywei
asked 4 years ago
1 Answer
  1. All you need to do is import the new data: Datasets > Create dataset import > import new data, without changing the dataset group or even the dataset itself. Cost-wise, you probably want to delete the old dataset import, since it adds to Forecast's internal S3 storage costs (which are tiny, at $0.024/GB/month). You also don't need to worry about whether the S3 path changed; it doesn't matter, since you specify the path in each import job. If you decide to keep multiple dataset imports, then under Predictors > Train new predictor, Forecast will automatically select the newest data import.
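
    For the boto3 route, the reimport is a single create_dataset_import_job call against the existing dataset ARN; here's a minimal sketch (the ARNs, names, and S3 path below are placeholders you would swap for your own):

    import boto3

    forecast = boto3.client("forecast")

    # Reuse the existing dataset -- no need to recreate the dataset group
    # or the dataset itself.
    response = forecast.create_dataset_import_job(
        DatasetImportJobName="reimport_after_s3_update",
        DatasetArn="arn:aws:forecast:us-east-1:123456789012:dataset/my_dataset",
        DataSource={
            "S3Config": {
                "Path": "s3://my-bucket/training-data.csv",
                "RoleArn": "arn:aws:iam::123456789012:role/ForecastS3AccessRole",
            }
        },
        TimestampFormat="yyyy-MM-dd HH:mm:ss",
    )
    print(response["DatasetImportJobArn"])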

  2. Schemas map Amazon Forecast's expected inputs to columns in your .csv file. Depending on which schema type you pick, there are certain “reserved words”, and you have to rename your columns to match. See https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html
    For example, suppose you chose the “Custom”-type schema. The documentation says the following fields are required:
    item_id (string)
    timestamp (timestamp)
    target_value (floating-point)

Your input .csv file must include the required columns. You may arrange the columns in any order and include additional columns beyond the required minimum, but every column must be accounted for in the schema, and the schema's attribute order must exactly match the column order in your .csv file. With the “Custom”-type schema, your JSON might look like this:
{
  "Attributes": [
    { "AttributeName": "item_id", "AttributeType": "string" },
    { "AttributeName": "store_id", "AttributeType": "string" },
    { "AttributeName": "timestamp", "AttributeType": "timestamp" },
    { "AttributeName": "target_value", "AttributeType": "float" }
  ]
}
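
For illustration, a matching .csv would order its columns exactly as in the schema; the rows below are made-up values (shown without a header row, since it is the schema that names the columns):

widget_a,store_1,2020-01-01 00:00:00,12.0
widget_a,store_2,2020-01-01 00:00:00,7.5
widget_b,store_1,2020-01-01 00:00:00,3.0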

Note that with the "custom" schema there is one required forecast dimension, "item_id" (time is always an implicit dimension). In the example above, the "custom" schema is extended to have two forecast dimensions, "item_id" and "store_id". I'm not sure I follow your question about further dimensions in metadata, but metadata is for data that does not vary with time, such as product hierarchy info, and metadata is not considered a forecast dimension. See the metadata doc: https://docs.aws.amazon.com/forecast/latest/dg/item-metadata-datasets.html
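
Via boto3, that schema plugs straight into create_dataset; a minimal sketch (the dataset name and data frequency are placeholders):

import boto3

forecast = boto3.client("forecast")

# "store_id" rides along as an extra forecast dimension next to the
# required "item_id"; the attribute order must match the .csv columns.
response = forecast.create_dataset(
    DatasetName="my_custom_dataset",
    Domain="CUSTOM",
    DatasetType="TARGET_TIME_SERIES",
    DataFrequency="D",  # e.g. daily data
    Schema={
        "Attributes": [
            {"AttributeName": "item_id", "AttributeType": "string"},
            {"AttributeName": "store_id", "AttributeType": "string"},
            {"AttributeName": "timestamp", "AttributeType": "timestamp"},
            {"AttributeName": "target_value", "AttributeType": "float"},
        ]
    },
)
print(response["DatasetArn"])

If I remember right, when you later train a predictor through boto3, extra dimensions such as "store_id" are also listed under ForecastDimensions in the FeaturizationConfig.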

answered 4 years ago
