Custom Entity Recognition Job Not Working with .txt files

0

I am trying to run a custom entity recognition job on a couple text files. I trained the recognizer using PDF annotation, but I am sending .txt files into the job.

So, I am getting this error: DOCUMENT_CORPUS_SIZE_LESS_THAN_MINIMUM: Document corpus size is less than the minimum requirement: 5120 bytes.

It is true, the text files in my input folder are only 1.2 KB, but I am not sure how to proceed from here. I tried changing the options to "ONE_DOC_PER_LINE", but that gave another error saying that is unsupported for semi-structured data.


UPDATE: I added extra text to the file, and it went through, but the entity recognition is very poor. Before this, I was submitting PDFs to the analysis job, and it was working well. However, I need to submit text documents because I want to eventually create an endpoint to use for this job. What do I do?

已提問 2 年前檢視次數 331 次
1 個回答
0

This was a bug in our validation logic which did not allow for a total input corpus size of less than 5kb for a semi-structured trained model. This issue is being addressed and should be deployed worldwide by 07/29.

AWS
已回答 2 年前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南