http.client.RemoteDisconnected at the end of a SageMaker training job

Hello. I'm running SageMaker training jobs through a library called ZenML. The library is just there as an abstraction layer, so that when I return, the artifacts get saved to S3 automatically. The library itself works without problems, but when moving bigger files SageMaker fails to upload them to S3.

In particular, after the long training run has finished and I have been charged the full price, I get:

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

I can provide the full log if needed, but the failure is raised in requests/adapters.py.

From what I found online, this means either that the server actively closed the connection or that there was a network error, even though the artifacts never leave AWS: they only move from SageMaker to S3.
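If the cause really is a transient dropped connection, retrying the upload with exponential backoff might work around it. The following is only a minimal sketch under that assumption, not something ZenML or SageMaker provides: `retry_with_backoff` and `flaky_upload` are hypothetical names, and `flaky_upload` merely simulates the failure from the log instead of performing a real S3 upload.

```python
import http.client
import time


def retry_with_backoff(fn, attempts=5, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**i, then retry.

    OSError is the default because http.client.RemoteDisconnected and
    requests.exceptions.ConnectionError both derive from it.
    """
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** i)


# Hypothetical stand-in for the real upload: it fails twice with the
# same exception type as in the log, then succeeds.
calls = {"n": 0}


def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise http.client.RemoteDisconnected(
            "Remote end closed connection without response"
        )
    return "uploaded"


# Record the delays instead of sleeping so the demo finishes instantly.
delays = []
result = retry_with_backoff(flaky_upload, sleep=delays.append)
print(result, calls["n"], delays)
```

Whether a retry helps depends on where the connection is dropped. If the upload fails the same way on every attempt, the size of the artifact or a timeout on the receiving side is a more likely cause than a one-off network error.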
