
AWS Comprehend's Toxic Content Detection consistently returns false positives for SEXUAL content tag


I am encountering significant issues with AWS Comprehend's detect-toxic-content API, specifically regarding false positives in the SEXUAL content classification. The model is assigning unusually high confidence scores to completely innocuous text segments. Here are some concerning examples:

Test Cases:

"It is a good day for me…" --> SEXUAL score: 0.997 (99.7% confidence)

"first day back at school and it's a beautiful moment!" --> SEXUAL score: 0.990 (99% confidence)

"Tried tennis for the first time! 🎾 It was harder than I expected but so much fun!!" --> SEXUAL score: 0.456 (45.6% confidence)

"I got my test back and didn't do great but at least I passed 😃" --> SEXUAL score: 0.517 (51.7% confidence)

As you can see, the model is flagging everyday, non-sexual content with remarkably high confidence scores. This is particularly problematic for the first two examples, where completely innocent statements are being classified as sexual content with >99% confidence. Has anyone else encountered similar issues? This could be a significant problem for applications relying on this API for content moderation.
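For reference, the scores above were gathered with calls along the lines of the sketch below. This is a hedged reproduction sketch, not my exact code: the region, the `sexual_scores` helper, and the hard-coded segment list are illustrative choices, and running the `detect_toxic_content` call requires valid AWS credentials and a recent boto3.

```python
SEGMENTS = [
    "It is a good day for me…",
    "first day back at school and it's a beautiful moment!",
]


def sexual_scores(result_list):
    """Pull the SEXUAL label score out of each entry in the API's ResultList."""
    scores = []
    for result in result_list:
        score = next(
            (label["Score"] for label in result["Labels"] if label["Name"] == "SEXUAL"),
            None,
        )
        scores.append(score)
    return scores


def run_detection(texts, region="us-east-1"):
    # boto3 is imported here so the parsing helper above stays dependency-free.
    import boto3

    client = boto3.client("comprehend", region_name=region)
    response = client.detect_toxic_content(
        TextSegments=[{"Text": t} for t in texts],
        LanguageCode="en",
    )
    return sexual_scores(response["ResultList"])


if __name__ == "__main__":
    for text, score in zip(SEGMENTS, run_detection(SEGMENTS)):
        print(f"{score:.3f}  {text}")
```

The `ResultList` shape (one entry per input segment, each with a `Labels` list of name/score pairs) follows the API reference linked in the sources below; `sexual_scores` just flattens that into one number per segment.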

asked a year ago
1 Answer

Thank you for bringing this issue to our attention. Based on the AWS documentation and published examples, the behavior you're describing from AWS Comprehend's Toxic Content Detection API does not align with its expected performance.

The Toxic Content Detection feature is designed to identify potentially harmful content across seven categories, including sexual content. However, the examples you've provided - which contain everyday, non-sexual language - should not typically trigger high confidence scores for the SEXUAL category.

For instance, phrases like "It is a good day for me" or "first day back at school" are neutral statements that don't contain any sexual connotations. The high confidence scores (99.7% and 99% respectively) for these innocuous phrases are indeed concerning and suggest a potential issue with the model's classification.

Machine learning models can sometimes produce unexpected results, but the consistency and severity of the false positives you're experiencing are unusual. As you've pointed out, this behavior could impact applications relying on the API for content moderation.

Given these observations, it would be advisable to:

  1. Double-check that you're using the most up-to-date version of the AWS SDK or CLI.
  2. Verify that the API calls are being made correctly, with all parameters set as intended.
  3. Test with a wider range of inputs to see if the issue persists across different types of content.
  4. If the problem continues, it would be best to contact AWS Support directly. They would be able to investigate whether there's an underlying issue with the model or if there have been any recent changes that might explain this behavior.
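For step 3, one way to widen the test is to run a batch of known-benign control texts through the API and flag any segment where a toxicity label scores above a cutoff. The helper below is a minimal sketch under assumptions: the seven label names match the current API reference, and the 0.5 cutoff is an arbitrary choice on my part, not AWS guidance. The API call itself (via boto3's `detect_toxic_content`) is omitted; the function only inspects a `ResultList`-shaped response.

```python
# Label names as listed in the Comprehend toxicity documentation (assumption:
# these match the version of the API you are calling).
TOXIC_LABELS = (
    "GRAPHIC", "HARASSMENT_OR_ABUSE", "HATE_SPEECH",
    "INSULT", "PROFANITY", "SEXUAL", "VIOLENCE_OR_THREAT",
)


def flag_false_positives(texts, results, threshold=0.5):
    """Pair each benign control text with any labels scoring at or above threshold.

    `results` is the ResultList from detect_toxic_content, in the same order
    as `texts`. Returns a list of (text, {label: score}) for suspicious hits.
    """
    flagged = []
    for text, result in zip(texts, results):
        hot = {
            label["Name"]: label["Score"]
            for label in result["Labels"]
            if label["Score"] >= threshold
        }
        if hot:
            flagged.append((text, hot))
    return flagged
```

Running this over a few dozen clearly innocuous sentences would show whether the problem is confined to certain phrasings or is systemic, which is exactly the kind of evidence AWS Support would want attached to a case.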

Remember, the Toxic Content Detection feature is designed to help build safer environments for end users, so accurate classification is crucial. AWS typically maintains high standards for their services, so they would likely want to address any significant discrepancies in the model's performance.
Sources
New for Amazon Comprehend – Toxicity Detection | AWS News Blog
Trust and safety - Amazon Comprehend
ToxicContent - Amazon Comprehend API Reference
Amazon Comprehend - Trust and Safety
Flag harmful content using Amazon Comprehend toxicity detection | AWS Machine Learning Blog

answered a year ago

