http.client.RemoteDisconnected at the end of a SageMaker training job


Hello. I'm running SageMaker training jobs through a library called ZenML. The library is just an abstraction layer: when a step returns artifacts, they are automatically saved to S3. The library itself works without problems, but when the files are larger, SageMaker fails to upload them to S3.

Specifically, after the long training run has finished (and I have been charged the full price for it), I get:

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

I can provide the full log if needed, but the failure is raised in requests/adapters.py.

From what I found online, this suggests that the server actively closed the connection, or that there might be a network error. That seems odd, because the artifacts never leave AWS: they only move from SageMaker to S3.
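A minimal, stdlib-only sketch of what the low-level error in the log means. A client sends a request, and the other end closes the TCP connection without writing any HTTP status line. The local server, port, and path below are hypothetical stand-ins. This does not reproduce ZenML, SageMaker, or S3, and it cannot tell which endpoint closed the connection in the real job.

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(server_sock: socket.socket) -> None:
    """Accept one connection, read the request headers, then close without replying."""
    conn, _ = server_sock.accept()
    with conn:
        data = b""
        # Read until the end of the HTTP headers so no unread bytes remain
        # (unread data would make the close look like a reset instead).
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        # Returning here closes the socket before any "HTTP/1.1 ..." line is sent.


def reproduce_remote_disconnected() -> str:
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # hypothetical local server on a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    worker = threading.Thread(target=serve_once_and_hang_up, args=(server_sock,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/artifact")  # hypothetical path
        client.getresponse()  # reads EOF instead of a status line
    except http.client.RemoteDisconnected as exc:
        return str(exc)
    finally:
        client.close()
        worker.join()
        server_sock.close()
    return "no error"


if __name__ == "__main__":
    print(reproduce_remote_disconnected())
```

In a `requests` call, urllib3 typically wraps this same exception as `('Connection aborted.', RemoteDisconnected(...))` and `requests` re-raises it as `ConnectionError`, which is consistent with the message shown above.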
