Custom Entity Recognition Job Not Working with .txt files


I am trying to run a custom entity recognition job on a couple of text files. I trained the recognizer using PDF annotations, but I am sending .txt files into the job.

When I run the job, I get this error: DOCUMENT_CORPUS_SIZE_LESS_THAN_MINIMUM: Document corpus size is less than the minimum requirement: 5120 bytes.

It is true that the text files in my input folder are only 1.2 KB, but I am not sure how to proceed from here. I tried changing the input format option to "ONE_DOC_PER_LINE", but that produced another error saying that format is unsupported for semi-structured data.
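For reference, here is a minimal sketch of checking the local input folder against the 5120-byte minimum quoted in the error, and of building a job request that keeps the default "ONE_DOC_PER_FILE" format (the alternative to the rejected "ONE_DOC_PER_LINE"). It assumes the job is started with boto3's Comprehend `start_entities_detection_job`; the helper names, S3 URIs, role ARN, and the "en" language code are hypothetical placeholders, not values from this post.

```python
from pathlib import Path

# Minimum total corpus size quoted in the DOCUMENT_CORPUS_SIZE_LESS_THAN_MINIMUM error.
MIN_CORPUS_BYTES = 5120


def corpus_size_bytes(folder: str, pattern: str = "*.txt") -> int:
    """Hypothetical helper: sum the sizes of the input files that will be uploaded."""
    return sum(p.stat().st_size for p in Path(folder).glob(pattern) if p.is_file())


def build_job_request(input_s3_uri: str, output_s3_uri: str, role_arn: str,
                      recognizer_arn: str, job_name: str) -> dict:
    """Hypothetical helper: keyword arguments for start_entities_detection_job.

    ONE_DOC_PER_LINE is rejected for semi-structured (PDF-trained) models,
    so each .txt file is treated as one document with ONE_DOC_PER_FILE.
    """
    return {
        "JobName": job_name,
        "EntityRecognizerArn": recognizer_arn,
        "LanguageCode": "en",  # assumption: the recognizer was trained on English
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
    }

# Usage (requires AWS credentials; all values are placeholders):
#   import boto3
#   if corpus_size_bytes("input_folder") < MIN_CORPUS_BYTES:
#       print("Corpus is below the 5120-byte minimum reported by the job.")
#   boto3.client("comprehend").start_entities_detection_job(
#       **build_job_request("s3://my-bucket/input/", "s3://my-bucket/output/",
#                           "arn:aws:iam::123456789012:role/MyRole",
#                           "arn:aws:comprehend:...", "my-job"))
```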

UPDATE: I added extra text to the file and the job went through, but the entity recognition is very poor. Before this, I was submitting PDFs to the analysis job, and it worked well. However, I need to submit text documents because I eventually want to create an endpoint to use for this job. What should I do?
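For the endpoint route, a minimal sketch of sending plain text to a custom recognizer endpoint, assuming boto3's Comprehend `detect_entities` call with `Text` and `EndpointArn`. The function name and the `min_score` filter are hypothetical; the client is passed in so the function can be tried with a stand-in object, and this does not address the accuracy difference between PDF and text input.

```python
from typing import Any, Dict, List


def detect_entities_with_endpoint(client: Any, endpoint_arn: str, text: str,
                                  min_score: float = 0.0) -> List[Dict[str, Any]]:
    """Hypothetical helper: run plain text through a custom recognizer endpoint.

    `client` is expected to be boto3.client("comprehend").
    """
    response = client.detect_entities(Text=text, EndpointArn=endpoint_arn)
    # Each returned entity includes Type, Text, Score, BeginOffset and EndOffset;
    # keep only those at or above the (assumed) confidence threshold.
    return [e for e in response["Entities"] if e["Score"] >= min_score]

# Usage (placeholder ARN):
#   import boto3
#   entities = detect_entities_with_endpoint(
#       boto3.client("comprehend"),
#       "arn:aws:comprehend:...:entity-recognizer-endpoint/...",
#       open("doc.txt", encoding="utf-8").read(),
#       min_score=0.5)
```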

asked 2 years ago
1 Answer

This was a bug in our validation logic: it rejected a total input corpus smaller than 5 KB (5120 bytes) for models trained on semi-structured documents. A fix is in progress and should be deployed worldwide by 07/29.

answered 2 years ago
