The "Score" in the result from a Custom Comprehend Classifier in Multi-Class Mode represents the confidence, or probability, that the input text belongs to a particular class. The classifier outputs one score per class, indicating how certain it is that the input belongs to that class.
The predicted class is the one with the highest score. [1] For example, if the scores for an input text are [0.2, 0.6, 0.1, 0.1] for classes A, B, C, and D respectively, the predicted class would be B since it has the highest score of 0.6.
The scores are based on the trained model and can be interpreted as probabilities or confidence levels. However, Amazon Comprehend does not provide a specific threshold or minimum confidence level to determine the presence of a class. It simply assigns the input text to the class with the highest score.
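As a minimal sketch of the "highest score wins" rule described above, the following Python snippet picks the predicted class from the example scores for classes A, B, C, and D. The helper name `predicted_class` is hypothetical and is not part of any Amazon Comprehend API; it only illustrates the selection step.

```python
def predicted_class(scores):
    """Return the (class name, score) pair with the highest score.

    `scores` maps class names to confidence scores. This is an
    illustrative helper, not an Amazon Comprehend function.
    """
    return max(scores.items(), key=lambda item: item[1])


# Example scores from the text above.
scores = {"A": 0.2, "B": 0.6, "C": 0.1, "D": 0.1}

name, score = predicted_class(scores)
print(name, score)  # B 0.6
```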
To evaluate the performance of the multi-class classifier, Amazon Comprehend provides metrics such as accuracy, precision, recall, and F1 score. [1] [2] [3] These metrics are calculated based on the predictions made on the test data during training.
Sources
[1] Custom classifier metrics - https://docs.aws.amazon.com/comprehend/latest/dg/cer-doc-class.html
[2] Multiclass Model Insights - https://docs.aws.amazon.com/machine-learning/latest/dg/multiclass-model-insights.html
[3] Multiclass Classification - https://docs.aws.amazon.com/machine-learning/latest/dg/multiclass-classification.html
For the given result:
    {
      "File": "RandomFile.txt",
      "Line": "0",
      "Classes": [
        { "Name": "Label 1", "Score": 0.6425 },
        { "Name": "Label 2", "Score": 0.1466 },
        { "Name": "Label 3", "Score": 0.0976 }
      ]
    }
The scores can be interpreted as follows:
The model predicts that RandomFile.txt has a 64.25% probability of belonging to the class "Label 1".
The model predicts that RandomFile.txt has a 14.66% probability of belonging to the class "Label 2".
The model predicts that RandomFile.txt has a 9.76% probability of belonging to the class "Label 3".
Since this is a single-label classification task, the predicted class for RandomFile.txt is "Label 1", as it has the highest score of 0.6425. The scores represent the model's confidence or probability that the input text belongs to each class, but Amazon Comprehend does not provide a specific threshold to determine the presence of a class. [4] [5]
If your application requires high precision, you can set a threshold (e.g. 0.99) and only accept predictions where the highest score exceeds that threshold. However, this may come at the cost of lower recall, as many predictions may be rejected for not meeting the high confidence threshold.
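The thresholding idea above could be applied on the client side roughly as follows. This is only a sketch: the function name `accept_prediction` and the threshold values are assumptions chosen for illustration, and the filtering happens in your own code, not inside Amazon Comprehend.

```python
import json

# The example result from above, as a single JSON document.
result_line = (
    '{"File": "RandomFile.txt", "Line": "0", "Classes": ['
    '{"Name": "Label 1", "Score": 0.6425}, '
    '{"Name": "Label 2", "Score": 0.1466}, '
    '{"Name": "Label 3", "Score": 0.0976}]}'
)


def accept_prediction(result, threshold):
    """Return the top class name if its score exceeds `threshold`, else None.

    Hypothetical helper: the threshold is applied by the caller,
    since the service itself does not enforce one.
    """
    top = max(result["Classes"], key=lambda c: c["Score"])
    if top["Score"] > threshold:
        return top["Name"]
    return None


result = json.loads(result_line)
print(accept_prediction(result, 0.5))   # Label 1
print(accept_prediction(result, 0.99))  # None
```

With the 0.99 threshold, the prediction for RandomFile.txt is rejected even though "Label 1" is the top class, which is the precision and recall trade-off described above.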
The appropriate threshold depends on your specific use case requirements and the trade-off between precision and recall that can be tolerated. You can evaluate the performance of the classifier using metrics like accuracy, precision, recall, and F1 score provided by Amazon Comprehend.
Sources
[4] Custom classifier metrics - https://docs.aws.amazon.com/comprehend/latest/dg/cer-doc-class.html
[5] Multiclass Model Insights - https://docs.aws.amazon.com/machine-learning/latest/dg/multiclass-model-insights.html
