I am trying to run a custom entity recognition job on a couple of text files. I trained the recognizer using PDF annotations, but I am sending .txt files into the job.

So, I am getting this error: DOCUMENT_CORPUS_SIZE_LESS_THAN_MINIMUM: Document corpus size is less than the minimum requirement: 5120 bytes.

It is true that the text files in my input folder are only 1.2 KB, but I am not sure how to proceed from here. I tried changing the input format to "ONE_DOC_PER_LINE", but that gave another error saying that this option is unsupported for semi-structured data.

UPDATE: I added extra text to the file, and it went through, but the entity recognition is very poor. Before this, I was submitting PDFs to the analysis job, and it was working well. However, I need to submit text documents because I want to eventually create an endpoint to use for this job. What do I do?

This was a bug in our validation logic, which did not allow a total input corpus size of less than 5 KB for a model trained on semi-structured data. The issue is being addressed, and the fix should be deployed worldwide by 07/29.
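
Until the fix is live, a local check along these lines may help tell whether a batch of input files would trip the validation before you submit the job. This is only a minimal sketch: the 5120-byte threshold is taken from the error message quoted above, the function names and the "*.txt" pattern are hypothetical, and it assumes the limit applies to the combined size of all input files and that exactly 5120 bytes is accepted.

```python
from pathlib import Path

# Threshold taken from the error message in this thread (5120 bytes).
# Assumption: the service compares the combined size of all input files.
MIN_CORPUS_BYTES = 5120


def corpus_size_bytes(folder, pattern="*.txt"):
    """Return the combined size in bytes of files in `folder` matching `pattern`."""
    return sum(p.stat().st_size for p in Path(folder).glob(pattern) if p.is_file())


def meets_minimum(folder, minimum=MIN_CORPUS_BYTES):
    """Return True if the combined size of the input files reaches `minimum` bytes."""
    return corpus_size_bytes(folder) >= minimum
```

For example, calling meets_minimum on a local copy of the input folder before uploading it to S3 would flag the two 1.2 KB files described in the question as too small.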

