How can we accurately define "better" for recommender metrics with CW Evidently?


I'm exploring using the new CloudWatch Evidently feature for measuring success of recommendation model deployments with Amazon Personalize.

In this context, assigning a user or session to a particular feature variation (baseline recommendations list vs Personalize campaign 1 vs Personalize campaign 2) might trigger one or more valuable "events":

  • Maybe a click/view for an individual item (What's this item's price? How far down the recommendation list was it?)
  • Maybe a checkout for a basket of products (with an overall total price or total margin)

If I understand right, Evidently experiment dashboards for "statistical significance" and "improvement" today look only at the values of the recorded events (their average and distribution), right? The number of data points is used for assessing "how significant" a difference is, but not "what's better"?

If so, this seems like a challenge for "optional" events. For example, what if one treatment gives a really high average basket value (because it only recommends expensive products), but very few users convert? I could see really high metrics for the new treatment even though its overall value was very poor.

Do I understand correctly here? And if so, how might you recommend defining Evidently metrics for these kinds of use cases?

For example maybe we'd need to find a way of generating zero-value metric events when a session is abandoned?
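
To make the concern concrete, here is a small sketch of the arithmetic using made-up numbers (the session values below are purely illustrative, not real data). It compares the average over recorded checkout events only with the average when each abandoned session contributes a zero-value event:

```python
from statistics import mean

# Hypothetical basket values per session; None marks an abandoned session.
baseline = [30.0, 30.0, 30.0, 30.0, 30.0, None, None, None, None, None]
treatment = [120.0, None, None, None, None, None, None, None, None, None]


def mean_of_recorded(sessions):
    """Average over checkout events only (abandoned sessions send nothing)."""
    return mean(v for v in sessions if v is not None)


def mean_per_session(sessions):
    """Average when each abandoned session records a zero-value event."""
    return mean(0.0 if v is None else v for v in sessions)


for name, sessions in (("baseline", baseline), ("treatment", treatment)):
    print(name, mean_of_recorded(sessions), mean_per_session(sessions))
```

With these numbers the treatment looks four times better on checkout-only events (120 vs 30), but worse once abandoned sessions count as zero (12 vs 15 per session).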

1 Answer

Hi, the question of what is "best" depends on your KPIs. Amazon CloudWatch Evidently can record the events and assign them to the testing groups, but it is for your business to decide which metric works best.

You can send the metrics to Amazon CloudWatch Evidently at any point in the user journey that matches your use case. For instance, if you don't want to record metrics for abandoned sessions, only send the metrics to Amazon CloudWatch Evidently when the customer "checks out"/"purchases". These can be the metrics for all previous events in the session.

With Amazon CloudWatch Evidently you can decide what metric to track, because you can decide what to send to the service. You can control what you send to match your use-case.
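
As a rough sketch of what "deciding what to send" could look like, the snippet below builds custom events in the shape the PutProjectEvents API expects (a timestamp, a type, and a JSON string as data). The helper name `build_basket_event`, the JSON keys `userDetails.userId` and `details.basketValue`, and the project name in the comment are assumptions for illustration; the keys would need to match the entity ID key and value key configured in your own metric definition. Whether to send a zero-value event for an abandoned session, or nothing at all, is the choice your KPI dictates.

```python
import json
from datetime import datetime, timezone


def build_basket_event(user_id, basket_value, when=None):
    """Build one custom event dict for PutProjectEvents (hypothetical helper)."""
    payload = {
        "userDetails": {"userId": user_id},        # assumed entity ID key
        "details": {"basketValue": basket_value},  # assumed metric value key
    }
    return {
        "timestamp": when or datetime.now(timezone.utc),
        "type": "aws.evidently.custom",
        "data": json.dumps(payload),  # the API takes the payload as a JSON string
    }


# One session that checked out, and one abandoned session recorded as zero.
checkout = build_basket_event("user-1", 120.0)
abandoned = build_basket_event("user-2", 0.0)

# Sending requires AWS credentials and an existing project, for example:
# boto3.client("evidently").put_project_events(
#     project="my-project", events=[checkout, abandoned])
```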


answered 2 years ago
