Multi-label - Insufficient data.

0

When I upload a CSV for training with a multi-label classifier, AWS reports:

Insufficient data. Required: At least 10 examples for each label. Consider adding more training data.

I do have at least 10 examples for each label; in fact, far more than that.

natted
Asked 4 years ago · 500 views
9 Answers
0

I have had no luck here.

Does multi-label require over ten examples for each "combination" of labels?

For example, my training data has ten categories with many thousands of training examples.

LABEL1,"document"
LABEL2,"document"
LABEL3,"document"
LABEL2|LABEL3,"document"

When labels are combined, there are situations where ten examples of a given combination may not exist. Is this what would cause the training to fail?

I can add more training data, but that seems a waste of time when the requirement is not even clear. Amazon should provide better documentation on the training files.

natted
Answered 4 years ago
0

Hi,

When I ran into this issue, I resolved it by making sure my CSV contained at least 10 examples of each label:

Label1, Label2, Label3, etc. should each have 10 or more occurrences in the CSV.

If any one of your labels has fewer than 10 occurrences, training fails with the Insufficient data message (example below).

Fail (Label2 appears in only 4 rows):
Label1|Label2
Label1|Label2
Label1|Label2
Label1|Label2
Label1
Label1
Label1
Label1
Label1
Label1
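A quick way to check a file against this rule before uploading is to count how often each label occurs. The following is only a minimal sketch: it assumes the labels sit in the first CSV column and are separated by `|`, as in the examples in this thread, and the function names are hypothetical. It also reports counts per label combination, since the thread does not confirm which of the two the service actually checks.

```python
import csv
import io
from collections import Counter

MIN_EXAMPLES = 10  # threshold quoted in the "Insufficient data" message


def count_labels(rows, delimiter="|"):
    """Return (per_label, per_combination) counters for multi-label rows.

    Assumes the first column of each row holds the labels.
    """
    per_label = Counter()
    per_combination = Counter()
    for row in rows:
        if not row:
            continue  # skip blank lines
        # De-duplicate and sort so "B|A" and "A|B" count as one combination.
        labels = sorted({part.strip() for part in row[0].split(delimiter) if part.strip()})
        per_label.update(labels)
        per_combination[delimiter.join(labels)] += 1
    return per_label, per_combination


def underrepresented(per_label, minimum=MIN_EXAMPLES):
    """Labels that occur in fewer than `minimum` rows."""
    return sorted(label for label, n in per_label.items() if n < minimum)


# Same shape as the failing example above: 4 combined rows, 6 single-label rows.
sample = "Label1|Label2,doc\n" * 4 + "Label1,doc\n" * 6
per_label, per_combination = count_labels(csv.reader(io.StringIO(sample)))

print(dict(per_label))              # {'Label1': 10, 'Label2': 4}
print(underrepresented(per_label))  # ['Label2']
```

To run it on a real training file, pass `csv.reader(open(path, newline="", encoding="utf-8"))` instead of the in-memory sample, where `path` is your own file.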

prav1
Answered 4 years ago
0

Thanks. I finally found a rogue character in my file. Now resolved.

natted
Answered 4 years ago
0

Hi,

I am from the Comprehend Engineering team. Could you please PM your account ID, region, and the name of the classifier that encountered the issue? We would like to improve the customer experience by detecting characters that trip up our training and telling the user where to look.

Thanks!
Seema Suresh

Answered 4 years ago
0

I'm having a similar issue - I have enough labels, but the classifier training fails. What character was it that was causing an issue in the end, so I can check for that and remove it?
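The thread never says which character caused the failure, so the following is only a guess at how to look for likely candidates. It is a minimal sketch that flags control characters, invisible format characters (such as a byte-order mark or zero-width space), non-breaking spaces, and the Unicode replacement character. The function name and the choice of what counts as "suspicious" are assumptions, not something the service documents.

```python
import unicodedata

# Ordinary whitespace controls that are expected in a CSV file.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}
# Extra characters worth a look: non-breaking space and the replacement character.
EXTRA_SUSPECTS = {"\u00a0", "\ufffd"}


def find_suspicious_characters(text):
    """Return a list of (line_number, column, description) tuples, 1-indexed."""
    hits = []
    # Split on "\n" only; str.splitlines() would also split on some of the
    # unusual characters this function is meant to find.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ALLOWED_CONTROLS:
                continue
            # Unicode categories starting with "C" cover control, format,
            # private-use, and unassigned code points.
            if unicodedata.category(ch).startswith("C") or ch in EXTRA_SUSPECTS:
                name = unicodedata.name(ch, "UNNAMED")
                hits.append((line_no, col, f"U+{ord(ch):04X} {name}"))
    return hits


# Sample with a byte-order mark on line 1 and a zero-width space on line 2.
sample = "\ufeffLABEL1,first document\nLABEL2|LABEL3,second\u200b document\n"
hits = find_suspicious_characters(sample)
for hit in hits:
    print(hit)
```

For a real file, read it with `open(path, encoding="utf-8", newline="").read()`, where `path` is your own training file. Reading with `utf-8` rather than `utf-8-sig` keeps a leading byte-order mark visible so it shows up in the report.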

JDBaker
Answered 4 years ago
0

Hello,
I started creating my first labeling job (Crowd classifier, Multi-select) using the SageMaker console (workforce already set up). The input data is a CSV file with free text (chats from a Twitter data set). I added my own new labels. When I open the labeling tool for a preview before creating the job, it shows no error, but after I create the job and then open the labeling tool UI (Crowd source),
I get the following error:
(Element type CROWD-CLASSIFIER-MULTI-SELECT): attributes should have required property 'categories'

Details:
My CSV input file has two columns (text_id, text):
text_id text
1001 <text to be labelled by labelling job>

I added my own categories (labels) after creating the manifest file.

Any help with this issue is appreciated.
It looks like I am missing something basic here.

Answered 3 years ago
0

Hi,

I am from the Comprehend engineering team.

It sounds like your issue is with SageMaker Ground Truth and not with Comprehend. Is that right?

Thanks,
Seema Suresh

Answered 3 years ago
0

Hey,

as this question hasn't been answered in the forum yet: do we need 10 samples per combination for multi-label classification, or just 10 samples per label? For example, is it enough to have 10 samples for CLASS 1 and 10 for CLASS 2, or do I also need 10 samples for the combination of CLASS 1 & CLASS 2?

Thanks in advance

Answered 3 years ago
