How do I get correct recommendations when user or item metadata changes but the previous interactions were based on the old data?


Say we have some users and items in an online shop, and we keep their interaction data. Now say the metadata (a categorical list) of a user gets a small update, i.e. some values are added to the list. If we then run a full training, it will lead to incorrect recommendations, because that user's interactions happened while the user still had the old metadata.

One might say we should delete those interactions and the problem is solved. But then we lose interactions, which is not good. The metadata (categorical list) is only extended, i.e. values are added, not replaced, so deleting is not an option.

Let me give an example.

In our shop, each item/product has one or more prerequisite licenses. Only users who hold those licenses can buy and use the item. Users obtain licenses elsewhere, outside the scope of the shop, and they acquire them as they need them.

User Schema (LICENSES holds the user's currently available licenses):

{
  "type": "record",
  "name": "Users",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "USER_ID",
      "type": "string"
    },
    {
      "name": "LICENSES",
      "type": "string",
      "categorical": true
    }
  ],
  "version": "1.0"
}

Item Schema (PREREQUISITE_LICENSES holds the licenses a user needs to buy and use the item):

{
  "type": "record",
  "name": "Items",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "ITEM_ID",
      "type": "string"
    },
    {
      "name": "PREREQUISITE_LICENSES",
      "type": "string",
      "categorical": true
    }
  ],
  "version": "1.0"
}
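To make the eligibility rule concrete: a user can buy an item only if the user holds every prerequisite license. The sketch below expresses that rule over the two schemas, assuming (as is usual for multi-valued categorical fields in Personalize datasets) that the lists are stored as "|"-separated strings. The helper names parse_categorical and can_buy are hypothetical.

```python
# Hypothetical helpers: field names mirror the schemas above, and multi-valued
# categorical fields are assumed to be "|"-separated strings such as "L1|L2".


def parse_categorical(value: str) -> set[str]:
    """Split a '|'-separated categorical string into a set of values."""
    return {v.strip() for v in value.split("|") if v.strip()}


def can_buy(user_licenses: str, prerequisite_licenses: str) -> bool:
    """True if the user holds every license the item requires (subset test)."""
    return parse_categorical(prerequisite_licenses) <= parse_categorical(user_licenses)


if __name__ == "__main__":
    user = {"USER_ID": "u1", "LICENSES": "L1|L2"}
    items = [
        {"ITEM_ID": "i1", "PREREQUISITE_LICENSES": "L1"},
        {"ITEM_ID": "i2", "PREREQUISITE_LICENSES": "L1|L2"},
        {"ITEM_ID": "i3", "PREREQUISITE_LICENSES": "L3"},
    ]
    eligible = [
        item["ITEM_ID"]
        for item in items
        if can_buy(user["LICENSES"], item["PREREQUISITE_LICENSES"])
    ]
    print(eligible)  # ['i1', 'i2']
```

Because the rule is a subset test, adding L3 to this user makes i3 eligible without changing anything about i1 or i2, which is exactly why the old interactions look "incomplete" once the metadata is updated.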

Now say a user has licenses L1 and L2 and has bought and used some products that required L1, L2, or both. We have all of those interactions. If the user later acquires license L3 and we need to run a full training at that point, it will look as though a user with licenses L1, L2, and L3 has interacted only with L1/L2 items. In reality, the user interacted with those items because at the time he did not have license L3. This gives the model wrong information.

The same can happen for other users too, and recommendation accuracy will drop as a result. How should this situation be handled? Any ideas?

Asked 2 years ago, 234 views
1 Answer

A couple of different levels/timescales of impact to mention here:

First (and fastest): assuming you're using filter expressions to limit your Personalize recommendations to items the user is eligible for under their current licenses, the filtering should reflect an update to the user's (or item's) metadata within 15 minutes of sending it through Personalize's PutUsers or PutItems APIs, and usually much faster (seconds to minutes).

So that covers the basic state and dynamics. But yes, to actually train the significance of that update into the learned model, you'll also need to wait until the next retrain. I also understand that for these kinds of updates (user/item metadata changes that cut across the interaction history), a full retrain rather than just an incremental one could be beneficial.
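As a rough illustration of the filter approach, here is a sketch that assembles the keyword arguments for a personalize-runtime get_recommendations call with a dynamic filter. The filter expression, the $LICENSES placeholder, and the ARNs are hypothetical; the quoting convention for filterValues (each value in double quotes, comma-separated, keyed by the placeholder name without "$") reflects my understanding of the dynamic-filter documentation. Also note that an IN condition on a multi-valued item field may match items that share only some of the required licenses, so treat this as a coarse pre-filter rather than an exact "holds all prerequisites" check.

```python
# Hypothetical filter expression; it would be created once with the Personalize
# control-plane API and referenced later by its ARN. Nothing here calls AWS.
FILTER_EXPRESSION = "INCLUDE ItemID WHERE Items.PREREQUISITE_LICENSES IN ($LICENSES)"


def build_filter_values(licenses: list[str]) -> dict[str, str]:
    """Format current licenses for the $LICENSES placeholder.

    Each value is wrapped in double quotes and the values are comma-separated;
    the key is the placeholder name without the leading '$'.
    """
    return {"LICENSES": ",".join(f'"{lic}"' for lic in licenses)}


def build_get_recommendations_request(
    campaign_arn: str,
    filter_arn: str,
    user_id: str,
    licenses: list[str],
    num_results: int = 25,
) -> dict:
    """Keyword arguments for a personalize-runtime get_recommendations call."""
    return {
        "campaignArn": campaign_arn,
        "userId": user_id,
        "numResults": num_results,
        "filterArn": filter_arn,
        "filterValues": build_filter_values(licenses),
    }


if __name__ == "__main__":
    request = build_get_recommendations_request(
        "arn:aws:personalize:REGION:ACCOUNT:campaign/EXAMPLE",
        "arn:aws:personalize:REGION:ACCOUNT:filter/EXAMPLE",
        "u1",
        ["L1", "L2", "L3"],
    )
    print(request["filterValues"])  # {'LICENSES': '"L1","L2","L3"'}
    # With credentials configured, this could be sent with:
    # boto3.client("personalize-runtime").get_recommendations(**request)
```

Since the user's licenses are passed at request time, a newly acquired license takes effect on the next call without waiting for the user's metadata update or a retrain.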
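One way to act on this is to compare metadata snapshots before each scheduled retrain and request a full retrain only when existing users' licenses have actually changed. The sketch below builds the keyword arguments for boto3's personalize create_solution_version call, whose trainingMode accepts "FULL" or "UPDATE"; as far as I know UPDATE is only supported by some recipes, so check yours. The snapshot format and helper names are hypothetical.

```python
# Hypothetical helpers: snapshots map USER_ID -> '|'-separated LICENSES value.
# Nothing here calls AWS; the returned dict is meant as keyword arguments for
# boto3.client("personalize").create_solution_version(...).


def _as_set(value: str) -> set[str]:
    """Split a '|'-separated categorical value into a set of non-empty values."""
    return {v for v in value.split("|") if v}


def changed_users(previous: dict[str, str], current: dict[str, str]) -> set[str]:
    """Existing users whose licenses differ between the two snapshots.

    Users present in only one snapshot (new or deleted users) are ignored.
    Value order is ignored, so "L1|L2" and "L2|L1" count as unchanged.
    """
    return {
        user_id
        for user_id in previous.keys() & current.keys()
        if _as_set(previous[user_id]) != _as_set(current[user_id])
    }


def solution_version_request(
    solution_arn: str, previous: dict[str, str], current: dict[str, str]
) -> dict[str, str]:
    """Request FULL training if any existing user's metadata changed, else UPDATE."""
    mode = "FULL" if changed_users(previous, current) else "UPDATE"
    return {"solutionArn": solution_arn, "trainingMode": mode}


if __name__ == "__main__":
    before = {"u1": "L1|L2", "u2": "L1"}
    after = {"u1": "L1|L2|L3", "u2": "L1", "u3": "L2"}
    print(changed_users(before, after))  # {'u1'}
    request = solution_version_request(
        "arn:aws:personalize:REGION:ACCOUNT:solution/EXAMPLE", before, after
    )
    print(request["trainingMode"])  # FULL
```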

Finally, on the main body of your question: yes, as you correctly point out, Personalize schemas currently treat user and item metadata as static over the interaction history for the purpose of model training.

So, given that you can use rule-based filter expressions to restrict recommendations to currently eligible products at request time, the question becomes:

  1. To what extent might this simplifying assumption bias your model and reduce the quality of recommendations made after whatever additional filtering is applied? And
  2. Is there anything that can be done about it?

(1) is of course dataset-dependent and not straightforward to measure unless you have another model with time-varying item/user metadata to compare against. My general view is that while this isn't ideal from a purist perspective, in practice I've seen customers with similar challenges still get good results despite the simplification. Remember that the offline metrics Personalize calculates (recall, precision, etc.) are computed without post-filtering, and filtering may significantly improve real-world performance when strict eligibility criteria narrow the product list after the model has ranked it.

For potential mitigations (2), you could explore adding the user's point-in-time licenses as an interaction metadata field. For this you would back-calculate the available licenses at the time of each interaction in your historical interactions dataset, and also provide current licenses as context at the point of calling GetRecommendations.
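The back-calculation step could look roughly like the sketch below. It assumes you have a log of license acquisitions as (user_id, license, acquired_at) tuples and that licenses are only ever added, never revoked, as described in the question. The LICENSES interaction field and all helper names are hypothetical. Whether GetRecommendations accepts a "|"-separated value in its context map is the open question raised in the comment on this answer, so the context line is illustrative only.

```python
from bisect import bisect_right
from collections import defaultdict


def build_license_history(acquisitions):
    """Map user_id -> (sorted acquisition times, licenses in the same order)."""
    per_user = defaultdict(list)
    for user_id, license_id, acquired_at in acquisitions:
        per_user[user_id].append((acquired_at, license_id))
    history = {}
    for user_id, events in per_user.items():
        events.sort()  # chronological order
        history[user_id] = ([t for t, _ in events], [lic for _, lic in events])
    return history


def licenses_at(history, user_id, timestamp):
    """'|'-joined licenses the user held at `timestamp` (acquired at or before it).

    Assumes licenses are never revoked, so the held set only grows over time.
    """
    times, licenses = history.get(user_id, ([], []))
    held = licenses[: bisect_right(times, timestamp)]
    return "|".join(sorted(set(held)))


def annotate_interactions(interactions, history):
    """Copy each interaction row and add its point-in-time LICENSES value."""
    return [
        {**row, "LICENSES": licenses_at(history, row["USER_ID"], row["TIMESTAMP"])}
        for row in interactions
    ]


if __name__ == "__main__":
    acquisitions = [("u1", "L1", 100), ("u1", "L2", 150), ("u1", "L3", 400)]
    interactions = [
        {"USER_ID": "u1", "ITEM_ID": "i1", "TIMESTAMP": 200},
        {"USER_ID": "u1", "ITEM_ID": "i4", "TIMESTAMP": 500},
    ]
    history = build_license_history(acquisitions)
    for row in annotate_interactions(interactions, history):
        print(row["ITEM_ID"], row["LICENSES"])
    # i1 L1|L2
    # i4 L1|L2|L3
    # Current licenses, e.g. as a candidate `context` value for GetRecommendations.
    context = {"LICENSES": licenses_at(history, "u1", float("inf"))}
    print(context)  # {'LICENSES': 'L1|L2|L3'}
```

The bisect lookup keeps each annotation at O(log n) per interaction, which matters when back-filling a large historical interactions dataset.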

In this case you'd be directly training the model to account for the changing set of licenses at each user interaction, but you'd also be increasing the amount of information you're asking the model to learn by making the feature dynamic instead of static. So it's important to remember that:

  • It's not guaranteed to produce better results in practice, and
  • Since offline metrics are calculated without filtering rules, I'd avoid putting too much trust in comparing them to judge whether adding the interaction metadata works better for you. A live pilot/A/B test would be a much better measure, and sounds like a good idea to me if you're concerned that this effect could be important in your use case.
Answered 2 years ago by Alex_T (AWS Expert)
  • Thanks, Alex_T, for your answer. In our use case we aren't filtering, because we want to show all items to users even when they can't buy them for lack of the required licenses. But I like the idea of using licenses as contextual metadata. I have one question, though. The license data will be an array, and a user can hold any combination of all the licenses. We can store the array as a |-separated string in the interaction metadata, but I'm wondering whether we can pass such a |-separated string as context in GetRecommendations and whether it would work. The documentation never explicitly says that categorical data like this can be provided as context; all the examples use a single string, not a |-separated array of strings. I have created another question explaining this issue; please answer it there. https://repost.aws/questions/QUofI9JP9jQ_eyNdt2FweyZA/can-we-how-to-provide-an-array-as-contextual-metadata-on-get-recommendation
