Questions tagged with Test & Optimize ML Models



How can we accurately define "better" for recommender metrics with CW Evidently?

I'm exploring the new [CloudWatch Evidently](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Evidently.html) feature for measuring the success of recommendation model deployments with [Amazon Personalize](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html).

In this context, assigning a user or session to a particular feature variation (baseline recommendations list vs Personalize campaign 1 vs Personalize campaign 2) **might** trigger one *or more* valuable "events":

- Maybe a click/view of an individual item (What's this item's price? How far down the recommendation list was it?)
- Maybe a checkout of a basket of products (with an overall total price or total margin)

If I understand correctly, Evidently experiment dashboards today assess "statistical significance" and "improvement" based only on the **distribution of values** of recorded events (e.g. their averages), and the number of data points feeds into "how significant" but not "which is better"?

If so, this seems like a challenge for "optional" events. For example, what if one treatment produces a really high basket value on average (by only recommending expensive products), but very few users convert? I could see really high metrics for the new treatment even though its overall value was very poor.

Do I understand correctly here? And if so, how would you recommend defining Evidently metrics for these kinds of use cases? For example, maybe we'd need to find a way to generate zero-value metric events when a session is abandoned (rough sketch below)?
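To make that last idea concrete, here's a rough sketch of what I'm imagining, using the standard boto3 Evidently client. The project name and the `userDetails.sessionId` / `details.basketValue` field paths are just placeholders and would have to match whatever `entityIdKey`/`valueKey` the custom metric is defined with:

```python
import json
from datetime import datetime, timezone

import boto3

# Hypothetical project name -- substitute your own Evidently project.
PROJECT = "recs-experiments"

evidently = boto3.client("evidently")


def record_basket_value(session_id: str, basket_value: float) -> None:
    """Send a custom Evidently event carrying the session's basket value.

    For abandoned sessions we'd call this with basket_value=0.0, so the
    treatment's average isn't computed only over sessions that converted.
    """
    evidently.put_project_events(
        project=PROJECT,
        events=[
            {
                "timestamp": datetime.now(timezone.utc),
                "type": "aws.evidently.custom",
                # The custom metric's entityIdKey/valueKey (JSONPath) must
                # point at these fields, e.g. entityIdKey="userDetails.sessionId",
                # valueKey="details.basketValue" (placeholder paths).
                "data": json.dumps(
                    {
                        "userDetails": {"sessionId": session_id},
                        "details": {"basketValue": basket_value},
                    }
                ),
            }
        ],
    )


# Converted session:
# record_basket_value("session-123", 84.50)
# Abandoned session -- explicit zero so it still counts toward the average:
# record_basket_value("session-456", 0.0)
```

The open question is whether emitting these explicit zero-value events is the intended way to handle non-converting sessions, or whether there's a better-supported pattern for this.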
1 answer · 0 votes · 30 views
EXPERT
Alex_T
asked a year ago