SageMaker error: "unexpected EOF"


We are trying to run a SageMaker batch transform job and we're getting some errors:

2022-01-18T23:29:00.980:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
2022-01-18T23:34:21.071:[sagemaker logs]: <<<path to csv file in s3>>>: Unable to get response from algorithm: unexpected EOF

We do not understand what "Unable to get response from algorithm: unexpected EOF" means. How can we get more details about this error?

It would help if we could get the full request and full response from the endpoint. Is this information recorded somewhere in SageMaker?

We have added extra logging in our Docker image and cannot find an issue on that end. We also tried logging the request and response, but those got truncated in CloudWatch.
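For reference, batch transform jobs write their container logs to CloudWatch under the `/aws/sagemaker/TransformJobs` log group, one stream per job. If the console view truncates long events, pulling the raw events with boto3 may help; a minimal sketch (the job name passed in is a placeholder for your own):

```python
# Sketch: fetch all CloudWatch log events for a batch transform job,
# avoiding the truncation sometimes seen in the console view.
LOG_GROUP = "/aws/sagemaker/TransformJobs"

def fetch_transform_logs(job_name):
    import boto3  # imported inside the helper so the rest of the module loads without it
    logs = boto3.client("logs")
    paginator = logs.get_paginator("filter_log_events")
    events = []
    # Each transform job writes log streams prefixed with its job name.
    for page in paginator.paginate(
        logGroupName=LOG_GROUP, logStreamNamePrefix=job_name
    ):
        events.extend(e["message"] for e in page["events"])
    return events
```

This returns every event message for the job rather than the abbreviated view, which can then be written to a local file for inspection.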

We would be grateful for any pointers that you can provide. Thanks.

Sebastien

  • First guess: maybe you have a single or double quote mark in your CSV that SageMaker interprets as an unmatched text-field delimiter, causing it to scan to the end of the file looking for the matching pair? Or your algorithm is returning something similar in its result... You could try temporarily reducing MaxPayloadInMB and/or setting the SINGLE_RECORD strategy to shrink the request/response payloads and make logging easier.
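That first guess is easy to check mechanically. A minimal sketch that flags CSV lines containing an odd number of quote characters (note: legitimate apostrophes in text fields will also be flagged, so treat hits as candidates, not proof):

```python
# Sketch: flag CSV lines with an unbalanced (odd) count of quote
# characters, which a strict parser could read as an unterminated
# field and keep consuming input until end-of-file.
def find_unbalanced_quotes(lines):
    """Return (line_number, line) pairs whose quote count is odd."""
    suspects = []
    for i, line in enumerate(lines, start=1):
        if line.count('"') % 2 == 1 or line.count("'") % 2 == 1:
            suspects.append((i, line.rstrip("\n")))
    return suspects
```

Usage would be something like `find_unbalanced_quotes(open("input.csv"))` over the same file handed to the transform job.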

  • Thank you for your comment.

    I do not have any single or double quotes. It's a very simple model for learning to use SageMaker.

    My input file is 175 MB. I wanted to let SageMaker "chunk" it.

    I get the same behavior with MaxPayloadInMB set to 4, as well as set to 1.

    I have tried SINGLE_RECORD and did not get any errors for that job (I stopped it early, after 24 minutes).

  • You need to post the details of the CreateTransformJobRequest you are sending. Of particular importance: the BatchStrategy, MaxConcurrentTransforms, TransformInput, SplitType, and ContentType arguments. Also verify that it works with a simple CSV file that you are 100% sure is formatted correctly.
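For comparison, a sketch of what such a request might look like with the settings quoted in this thread (the job name, model name, and S3 URIs below are placeholders, not taken from the job in question):

```python
# Sketch of a CreateTransformJob request body. All names and S3 URIs
# are placeholders; substitute your own before sending.
request = {
    "TransformJobName": "csv-batch-debug",          # placeholder
    "ModelName": "my-model",                        # placeholder
    "BatchStrategy": "MultiRecord",
    "MaxConcurrentTransforms": 1,
    "MaxPayloadInMB": 6,
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/input/",   # placeholder
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",
        "CompressionType": "None",
    },
    "TransformOutput": {"S3OutputPath": "s3://my-bucket/output/"},  # placeholder
    "TransformResources": {
        "InstanceType": "ml.m5.2xlarge",
        "InstanceCount": 1,
    },
}
# The request would then be sent with:
#   boto3.client("sagemaker").create_transform_job(**request)
```

Posting the real values for these fields (minus account-specific URIs) makes the configuration reproducible for anyone trying to help.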

  • I've tested with smaller files and those sometimes work. There's some kind of race condition with how SageMaker posts to our server, but there isn't enough information to know what the problem is. Can't SageMaker be made to write more logs?

    BatchStrategy=MultiRecord, MaxConcurrentTransforms=1, on an ml.m5.2xlarge. SplitType=Line, Compression=None, ContentType=text/csv.

    Edit: MaxConcurrentTransform=1.
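With SplitType=Line and BatchStrategy=MultiRecord, SageMaker splits the input on newlines and packs consecutive records into payloads bounded by MaxPayloadInMB. The exact packing policy is internal to SageMaker, but a rough greedy simulation (an assumption, not SageMaker's actual code) helps estimate how many requests the container will receive and how large each body is:

```python
# Rough simulation of MultiRecord batching: greedily pack newline-split
# records into payloads of at most max_payload_bytes each. This is an
# approximation; SageMaker's real packing logic is not public.
def batch_lines(lines, max_payload_bytes):
    batches, current, size = [], [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        if current and size + n > max_payload_bytes:
            batches.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        batches.append("".join(current))
    return batches
```

Running this over the actual input CSV with the configured payload limit (e.g. `6 * 1024 * 1024` bytes) gives a sense of the request sizes the container must answer without timing out.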

  • If it works sometimes, doesn't that strongly suggest the failure is data-related on your end? It's unlikely anyone can help unless you post an exact, minimal CSV file that you can consistently demonstrate is failing.

Asked 2 years ago · 129 views
No answers
