http.client.RemoteDisconnected at the end of a SageMaker training job


Hello. I'm running SageMaker training jobs through a library called ZenML. The library is only there as an abstraction layer, so that the artifacts I return are automatically saved to S3. The library itself works, no problem on that side, but when larger files are involved, SageMaker fails to upload them to S3.

In particular, after the long training run has finished, and I have been charged the full price for it, I get:

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

I can provide the full log if needed, but the failure occurs in requests/adapters.py.

From what I found online, this error suggests that the server actively terminated the connection, or that there might be a network error, even though the artifacts never leave AWS: they only move from SageMaker to S3.
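If the disconnect turns out to be transient, one possible stopgap is to retry the failing call with exponential backoff. The sketch below is only an illustration under assumptions: `retry_with_backoff` is a hypothetical helper, not part of ZenML, SageMaker, or requests, and it has not been confirmed that the upload step in this setup can be wrapped this way.

```python
import time
from http.client import RemoteDisconnected


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Hypothetical helper for illustration only.
    http.client.RemoteDisconnected is a subclass of the built-in
    ConnectionError, so the default retry_on covers it.
    requests.exceptions.ConnectionError is NOT a subclass of the built-in
    one, so it has to be passed explicitly through retry_on.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the original error propagate.
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The call that performs the upload would be passed in as `operation`, for example `retry_with_backoff(lambda: upload_artifact(path))`, where `upload_artifact` stands for whatever function does the transfer in a given setup (a placeholder name, not a real API). To cover the exact error above, `retry_on=(requests.exceptions.ConnectionError,)` would need to be passed as well.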
