Custom Entity Recognition Job Not Working with .txt files


I am trying to run a custom entity recognition job on a couple of text files. I trained the recognizer using PDF annotations, but I am sending .txt files into the job.

So, I am getting this error: DOCUMENT_CORPUS_SIZE_LESS_THAN_MINIMUM: Document corpus size is less than the minimum requirement: 5120 bytes.

It is true that the text files in my input folder are only 1.2 KB, but I am not sure how to proceed from here. I tried changing the input format to "ONE_DOC_PER_LINE", but that produced another error saying this option is unsupported for semi-structured data.

UPDATE: I added extra text to the file, and it went through, but the entity recognition is very poor. Before this, I was submitting PDFs to the analysis job, and it was working well. However, I need to submit text documents because I want to eventually create an endpoint to use for this job. What do I do?

asked 2 years ago, 331 views
1 Answer

This was caused by a bug in our validation logic, which rejected any total input corpus smaller than 5 KB for a model trained on semi-structured data. The issue is being addressed, and the fix should be deployed worldwide by 07/29.
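
Until the fix reaches your region, a local check of the total input size can tell you in advance whether a job is likely to hit this validation. The following is only a minimal sketch, assuming the input files sit in a local folder before being uploaded. The function names and the `*.txt` pattern are hypothetical, and the 5120-byte threshold is taken from the error message quoted in the question rather than from any documented limit.

```python
from pathlib import Path

# Threshold quoted in the DOCUMENT_CORPUS_SIZE_LESS_THAN_MINIMUM error message.
MIN_CORPUS_BYTES = 5120


def corpus_size_bytes(folder, pattern="*.txt"):
    """Return the combined size in bytes of all matching files directly in folder."""
    return sum(p.stat().st_size for p in Path(folder).glob(pattern) if p.is_file())


def meets_minimum(folder, minimum=MIN_CORPUS_BYTES):
    """Return True if the combined size of the files reaches the minimum."""
    return corpus_size_bytes(folder) >= minimum
```

Calling `meets_minimum("input_docs")` on a folder of small files returns `False` when their combined size is under 5120 bytes, which matches the case described in the question. It only checks the size condition and says nothing about recognition quality.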

answered 2 years ago
