Forecast - Using arrays of related item_id as part of metadata

I have a number of item prices that I want to predict. My data set consists of:

  • Target time series - Main 3 fields, item_id, timestamp, target_value (which is the average sale price on a day)
  • Related time series - Additional fields relating to the item_id, such as sales volumes, high and low sale prices etc
  • Metadata - Relating the item_id to item_name, item_groups and categories etc. But it also has a 'related_item_id' field, which I currently write as a string-enclosed array. With pd.to_csv(..), for example, some values are [123], or "[123,456,789]" when the value contains a comma. Standard behaviour.

Metadata:

item_id,item_name,item_group,item_category,market_group,meta_group,related_item_ids
582,Bantam,Frigate,Ship,Caldari,Tech I,"[34, 35, 36, 37]"
622,Stabber,Cruiser,Ship,Minmatar,Tech I,"[34, 35, 36, 37, 38, 39, 40]"
639,Tempest,Battleship,Ship,Minmatar,Tech I,"[34, 35, 36, 37, 38, 39, 40, 57478, 57479, 57486]"
482,Miner II,Mining Laser,Module,Mining Lasers,Tech II,"[11399, 3689, 483, 11541, 11689, 11482]"

From the output forecast, I don't see that the relationships are taken into consideration.

Question:

How do I link or associate items, so that they can influence other items? Would I have to do it like this:

item_id,item_name,item_group,item_category,market_group,meta_group,related_item_id
582,Bantam,Frigate,Ship,Caldari,Tech I,34
582,Bantam,Frigate,Ship,Caldari,Tech I,35
582,Bantam,Frigate,Ship,Caldari,Tech I,36
582,Bantam,Frigate,Ship,Caldari,Tech I,37
... etc
asked 6 months ago, 209 views
2 Answers
0

You have arranged the related_item_id field as an array. Forecast does not support array values in item metadata, so the input was treated as if the entire array were a single string value. That explains why the relationships do not appear to be taken into consideration. Splitting the array into multiple rows will not work either: each item is expected to have exactly one value for each item metadata column. At present, it is not possible to achieve this.
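To illustrate the "single value string" point, here is a minimal sketch using pandas on two rows taken from the metadata sample in the question. It does not show Forecast's internals; it only assumes that a categorical column is compared value by value, in which case two items that share most of their related ids still end up with two unrelated category values.

```python
import io
import pandas as pd

# Two rows copied from the metadata sample in the question (columns trimmed).
csv_text = '''item_id,item_group,related_item_ids
582,Frigate,"[34, 35, 36, 37]"
622,Cruiser,"[34, 35, 36, 37, 38, 39, 40]"
'''

meta = pd.read_csv(io.StringIO(csv_text))

# The quoted arrays are parsed as plain strings, not as lists of ids.
value_types = {type(v).__name__ for v in meta["related_item_ids"]}

# As a categorical feature, every distinct string is its own category, so the
# shared ids 34..37 are invisible: the two items simply have different values.
n_categories = meta["related_item_ids"].nunique()
same_category = meta.loc[0, "related_item_ids"] == meta.loc[1, "related_item_ids"]

print(value_types, n_categories, same_category)
```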

The best way forward would be to form groups of items such that all items in a group are related to each other. Each group can then be given a name, and that name becomes the value of an item metadata column for every item in the group. For example, "leopard print high heel shoes" could be the name of a group whose shared attributes, such as "Colour", describe the high heels it contains. In the item metadata, the row for "Shoe product 1" would then carry the group name "leopard print high heel shoes" in that column.

However, in the current dataset item 622 is linked to 34 and 40, while 582 is linked to 34 but not to 40. This needs to change, because as it stands these items cannot form a group. The goal is to be able to say that "622, 34, 40" are a group, meaning that these items are all linked to each other and not to any other items.
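One possible way to derive such groups is to treat the relationships as a graph and give every connected component one label. The sketch below uses a small union-find for that. The ids in `related` and the `g001`-style labels are hypothetical, and this is only one grouping strategy, not something Forecast prescribes. Note the trade-off: items that share any related item are merged into the same group, so heavily overlapping relationships can collapse into a few very large groups.

```python
from collections import defaultdict

# Hypothetical relationships: item id -> ids of the items it is related to.
related = {
    100: [111, 112],
    200: [112, 113],
    300: [114],
}

parent = {}

def find(x):
    """Return the representative of x's group, registering x if it is new."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
        x = parent[x]
    return x

def union(a, b):
    """Merge the groups of a and b; the smaller id becomes the representative."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

for item, others in related.items():
    for other in others:
        union(item, other)

# Collect the members of each group, then assign one label per group.
groups = defaultdict(list)
for item in list(parent):
    groups[find(item)].append(item)

labels = {}
for n, root in enumerate(sorted(groups), start=1):
    for item in groups[root]:
        labels[item] = f"g{n:03d}"

# Each item now has exactly one value for a single metadata column.
for item in sorted(labels):
    print(f"{item},{labels[item]}")
```

In this example 100 and 200 land in the same group because both are related to 112, which shows how a shared item pulls otherwise separate products together.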

I hope this helps.

AWS
answered 6 months ago
  • Comment part 1:

    Thanks for clarifying that you cannot have arrays. I understand that a reference group is required in this manner. The relationship is enclosed in a single group.

    However, I have multiple items that relate to each other: (eg, 100 relates to 111,112. 200 relates to 112,113) I cannot do this:

    item_id,related_item_group
    100,g001
    111,g001
    112,g001
    200,g002
    112,g002
    113,g002

    As 112 is related to both 100 and 200, can I have multiple metadata entries for that same item_id? Is it a requirement to be a single record?

  • This is essentially highlighting dependencies in a manufacturing process, where the materials (111,112,113) are required to make the products (100,200). As such, there is no way to fully enclose a material/product relationship in a group.

    Another thought was to establish a pseudo bit-mask, with one column per material named after that material, but with a 10 column limit this wouldn't work.

    I'm happy with a 'no, sorry, you can't do that' answer, but I hope you understand the problem and can advise an approach.

0

Time-series services such as Amazon SageMaker Autopilot, Canvas, and Amazon Forecast learn the relationships between items automatically: they convert all of the static variables into vectors and load them into models that support cross-learning.

The idea is to learn patterns across time series. Features should be categorical and low cardinality. This means item_name should not be part of item metadata, since it may have a near 1:1 ratio to item_id.
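A quick way to spot columns with a near 1:1 ratio to item_id is to compare the number of distinct values in each column with the number of items. The sketch below does this with pandas. The rows are hypothetical (the four sample rows in the question are too few to give meaningful ratios), and the 0.9 cut-off is an arbitrary assumption rather than a documented threshold.

```python
import io
import pandas as pd

# Hypothetical metadata; run the same check on the full metadata file.
csv_text = '''item_id,item_name,item_group,item_category
1,A1,Frigate,Ship
2,A2,Frigate,Ship
3,A3,Cruiser,Ship
4,A4,Cruiser,Ship
5,A5,Mining Laser,Module
6,A6,Mining Laser,Module
'''
meta = pd.read_csv(io.StringIO(csv_text))

# Distinct values per column divided by the number of items:
# close to 1.0 means the column is nearly unique per item.
features = meta.drop(columns="item_id")
ratio = features.nunique() / len(meta)

# Assumed cut-off for "too close to 1:1 with item_id".
high_cardinality = ratio[ratio > 0.9].index.tolist()

print(ratio)
print("candidates to drop:", high_cardinality)
```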

The other features all look like good candidates for item metadata -- note how they have "group" and "category" in them?

item_group,item_category,market_group,meta_group

This allows the time-series model to look for macro-level trends in groups: trending up, trending down, seasonality, etc.

You can review feature importance after the model is trained to see how important these features are: are they strong, weak, positive, negative? You should also do an A/B test to empirically measure how much the error changes with and without these features. As a note, covariates are typically more influential than static metadata.
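As a rough sketch of such an A/B comparison, the snippet below computes the weighted absolute percentage error (WAPE) for two forecasts over the same held-out period, one from a model trained with item metadata and one without. WAPE is only one possible error metric, and all numbers are hypothetical; in practice the actuals and forecasts would come from your own backtest exports.

```python
def wape(actual, forecast):
    """Weighted absolute percentage error: sum(|a - f|) / sum(|a|)."""
    total = sum(abs(a) for a in actual)
    if total == 0:
        raise ValueError("actuals sum to zero; WAPE is undefined")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total

# Hypothetical held-out actuals and two competing forecasts.
actual = [100.0, 120.0, 80.0, 100.0]
with_metadata = [95.0, 125.0, 85.0, 100.0]
without_metadata = [90.0, 130.0, 90.0, 110.0]

wape_with = wape(actual, with_metadata)
wape_without = wape(actual, without_metadata)

# A positive delta means the metadata reduced the error on this window.
delta = wape_without - wape_with

print(f"with metadata:    {wape_with:.4f}")
print(f"without metadata: {wape_without:.4f}")
print(f"improvement:      {delta:.4f}")
```

A single window can be misleading, so it is worth repeating the comparison over several backtest windows before drawing conclusions.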

AWS
answered 6 months ago
  • Thanks for the response. I don't quite understand how I can take this to a point of resolution. In terms of low cardinality, I'm led to believe that you cannot have duplicate item_id entries for metadata, and yes, item_name is obsolete, as you point out.

    I have a large set of supporting timeseries data, but feel as though one of the greatest relationships is this 'supply chain' dependency. Can you suggest a way that we can establish the relationship of 100 relates to 111,112 and 200 relates to 112,113 in a categorical fashion?
