Question about using AWS Forecast


Does anyone know the details of using AWS Forecast? I have the following two questions:

  1. I use AWS Forecast via boto3 with my training data stored in S3: I create the dataset group, create the dataset, then import the dataset. If my dataset in S3 is updated and I want to reimport the new data, do I need to repeat the whole process (create the dataset group, create the dataset, then import the dataset)? Or do I just need to import the dataset again (assuming the S3 URL of the data hasn't changed)?

  2. When using the DeepAR algorithm with the CUSTOM data type, if my dataset has dimension columns describing each item_id, will the algorithm recognize that item_ids with the same dimension values have similar patterns? Or do I need to put item_id and the dimensions in metadata so that the DeepAR algorithm can recognize that?

Thanks!

ywei
Asked 4 years ago · Viewed 479 times
1 Answer
  1. All you need to do is import the new data: Datasets > Create dataset import > import new data. You don't need to change the dataset group or even the dataset itself. Cost-wise, you probably want to delete the old data import, since it adds to Forecast's internal S3 storage costs (which are tiny, at $0.024/GB/month). It also doesn't matter whether the S3 path has changed, because you specify the path in each import job anyway. If you decide to keep both data imports, Forecast will automatically select the newest one when you go to Predictors > Train new predictor.
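For anyone doing this through boto3 instead of the console, here is a minimal sketch of the same idea, assuming the dataset already exists and you have an IAM role ARN that Forecast can use to read the bucket. It calls only `create_dataset_import_job`, `describe_dataset_import_job`, `list_dataset_import_jobs`, and `delete_dataset_import_job` on a Forecast client; the job-naming scheme, the polling interval, and the helper names (`build_import_job_request`, `wait_until_active`, `reimport`) are hypothetical choices for illustration, and pagination of the job list is ignored.

```python
import time
from datetime import datetime, timezone


def build_import_job_request(dataset_arn, s3_path, role_arn, job_name=None):
    """Keyword arguments for create_dataset_import_job against an EXISTING dataset."""
    if job_name is None:
        # Hypothetical naming scheme: a UTC timestamp keeps each job name unique.
        job_name = "reimport_" + datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return {
        "DatasetImportJobName": job_name,
        "DatasetArn": dataset_arn,
        "DataSource": {"S3Config": {"Path": s3_path, "RoleArn": role_arn}},
    }


def wait_until_active(forecast_client, job_arn, poll_seconds=60, sleep=time.sleep):
    """Poll describe_dataset_import_job until the import finishes or fails."""
    while True:
        status = forecast_client.describe_dataset_import_job(
            DatasetImportJobArn=job_arn
        )["Status"]
        if status == "ACTIVE":
            return
        if status in ("CREATE_FAILED", "CREATE_STOPPED"):
            raise RuntimeError(f"Import job {job_arn} ended with status {status}")
        sleep(poll_seconds)


def reimport(forecast_client, dataset_arn, s3_path, role_arn,
             delete_old=False, sleep=time.sleep):
    """Import fresh S3 data into an existing dataset (no new dataset group or dataset)."""
    # Record the existing jobs first so the new job is never deleted below.
    # (NextToken pagination is ignored to keep the sketch short.)
    old_jobs = forecast_client.list_dataset_import_jobs(
        Filters=[{"Key": "DatasetArn", "Value": dataset_arn, "Condition": "IS"}]
    )["DatasetImportJobs"]

    new_arn = forecast_client.create_dataset_import_job(
        **build_import_job_request(dataset_arn, s3_path, role_arn)
    )["DatasetImportJobArn"]

    # Only remove the old imports (to save storage costs) once the new data is in.
    wait_until_active(forecast_client, new_arn, sleep=sleep)
    if delete_old:
        for job in old_jobs:
            forecast_client.delete_dataset_import_job(
                DatasetImportJobArn=job["DatasetImportJobArn"]
            )
    return new_arn
```

Usage would look something like `reimport(boto3.client("forecast"), dataset_arn, "s3://your-bucket/train.csv", role_arn, delete_old=True)`, where the bucket path and both ARNs are placeholders for your own values. The client is passed in rather than created inside the helpers so the logic can be exercised without AWS credentials.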

  2. Schemas map the inputs Amazon Forecast expects to the columns in your .csv file. Depending on which schema type you pick, there will be certain “reserved words”, and you have to rename your columns to match them. See https://docs.aws.amazon.com/forecast/latest/dg/howitworks-domains-ds-types.html
     For example, suppose you chose the “Custom” schema type. The documentation says the following fields are required:
     - item_id (string)
     - timestamp (timestamp)
     - target_value (float)

Your input .csv file must include the required columns. You may have additional columns beyond the required ones, but every column must be accounted for in the schema. The columns can be in any order in the file, but the attributes in the schema JSON must appear in exactly the same order as the columns in your .csv file. With a “Custom”-type schema, your JSON might look like this:
```json
{
  "Attributes": [
    {
      "AttributeName": "item_id",
      "AttributeType": "string"
    },
    {
      "AttributeName": "store_id",
      "AttributeType": "string"
    },
    {
      "AttributeName": "timestamp",
      "AttributeType": "timestamp"
    },
    {
      "AttributeName": "target_value",
      "AttributeType": "float"
    }
  ]
}
```
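Because the schema has to mirror the file's column order and account for every column, one way to avoid mismatches is to generate the schema from the column list instead of typing it by hand. This is a minimal sketch, assuming you already know the column order of your .csv file and supply a name-to-type mapping yourself; `schema_for_columns` is a hypothetical helper, and the required-field check only covers the three “Custom” fields listed above.

```python
REQUIRED_CUSTOM_FIELDS = ("item_id", "timestamp", "target_value")


def schema_for_columns(columns, attribute_types):
    """Build a Forecast schema whose attribute order matches the file's column order."""
    missing = [f for f in REQUIRED_CUSTOM_FIELDS if f not in columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    unmapped = [c for c in columns if c not in attribute_types]
    if unmapped:
        # Every column in the file must be accounted for in the schema.
        raise ValueError(f"no attribute type for column(s): {unmapped}")
    return {
        "Attributes": [
            {"AttributeName": c, "AttributeType": attribute_types[c]}
            for c in columns  # keep the file's column order
        ]
    }
```

For the four-column example, `schema_for_columns(["item_id", "store_id", "timestamp", "target_value"], types)` reproduces the JSON above. The result can then be passed as `Schema=` to boto3's `create_dataset`, together with `Domain="CUSTOM"`, `DatasetType="TARGET_TIME_SERIES"`, and a `DataFrequency` matching your data; the dataset name and frequency are up to you.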

Note that with the "custom" schema there is one required dimension, "item_id" (time is always an implicit dimension). In the example above, the "custom" schema has been extended to have two forecast dimensions, "item_id" and "store_id". I'm not sure I follow your question about putting further dimensions in metadata. Metadata is for data that does not vary with time, such as product hierarchy information, and it is not considered a forecast dimension. See the metadata docs: https://docs.aws.amazon.com/forecast/latest/dg/item-metadata-datasets.html
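To make the dimension-versus-metadata distinction concrete, here is a minimal sketch of how the two show up in API calls, assuming the legacy `create_predictor` API with DeepAR+ (which is what Forecast calls its DeepAR algorithm). An extra dimension such as "store_id" is declared in `FeaturizationConfig["ForecastDimensions"]`, while time-invariant item attributes go into a separate ITEM_METADATA dataset in the same dataset group. The "category" attribute, the predictor name, horizon, and frequency are hypothetical, and this sketch does not settle how strongly DeepAR+ groups items with similar attributes.

```python
DEEP_AR_PLUS_ARN = "arn:aws:forecast:::algorithm/Deep_AR_Plus"

# Item metadata: one row per item_id with attributes that do not change over time.
# "category" is a hypothetical attribute used only for illustration.
ITEM_METADATA_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "category", "AttributeType": "string"},
    ]
}


def build_predictor_request(name, dataset_group_arn, horizon, frequency,
                            extra_dimensions=()):
    """Keyword arguments for the legacy create_predictor call using DeepAR+."""
    featurization = {"ForecastFrequency": frequency}
    if extra_dimensions:
        # item_id is always a forecast dimension; list only the additional ones.
        featurization["ForecastDimensions"] = list(extra_dimensions)
    return {
        "PredictorName": name,
        "AlgorithmArn": DEEP_AR_PLUS_ARN,
        "ForecastHorizon": horizon,
        "PerformAutoML": False,
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        "FeaturizationConfig": featurization,
    }
```

With the example schema, `build_predictor_request("my_predictor", dataset_group_arn, 14, "D", ["store_id"])` would forecast per (item_id, store_id) pair, and `ITEM_METADATA_SCHEMA` would be used with `create_dataset(..., DatasetType="ITEM_METADATA")` if you want item attributes available as features.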

Answered 4 years ago
