Determining the "right" instance type for a Jupyter notebook in SageMaker when reading/writing a huge Parquet file?

I am unclear as to how to determine the "right" instance type for running a Jupyter notebook in SageMaker. Reading or writing a small Parquet file works without problems, but when I try to read or write a huge Parquet file, the program stops with this error: "Job aborted due to stage failure: Task 21 in stage 33.0 failed 1 times, most recent failure: Lost task 21.0 in stage 33.0 (TID 1755, localhost, executor driver". I would appreciate any insights. Thanks.

asked 2 years ago, 362 views
1 Answer
For notebook instances, choosing a size is mostly trial and error, at least for now. Once your model is ready to be deployed, SageMaker Inference Recommender can run automated load tests and recommend an instance size.

It's hard to recommend a notebook instance size because you might test a 100 MB dataset today and switch to a 500 GB dataset tomorrow, at which point the earlier recommendation is no longer valid.
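As a starting point for that trial and error, here is a rough sketch of how the answer changes with dataset size. The memory figures are the published sizes for a few ml.t3/ml.m5 types; the function name, the `expansion_factor` (how much a compressed Parquet file might grow once loaded in memory) and the `headroom` fraction are illustrative assumptions, not AWS guidance, and should be tuned to your own data.

```python
# Memory in GiB for a few instance types (not an exhaustive list).
INSTANCE_MEMORY_GIB = {
    "ml.t3.medium": 4,
    "ml.m5.xlarge": 16,
    "ml.m5.2xlarge": 32,
    "ml.m5.4xlarge": 64,
    "ml.m5.12xlarge": 192,
    "ml.m5.24xlarge": 384,
}


def smallest_fitting_instance(parquet_size_gib, expansion_factor=5.0, headroom=0.5):
    """Return the smallest listed instance that fits the estimate, or None.

    expansion_factor and headroom are assumptions: the on-disk size is
    multiplied by expansion_factor to guess the in-memory size, and only
    the headroom fraction of instance memory is treated as usable.
    """
    needed_gib = parquet_size_gib * expansion_factor
    by_memory = sorted(INSTANCE_MEMORY_GIB.items(), key=lambda kv: kv[1])
    for name, memory_gib in by_memory:
        if memory_gib * headroom >= needed_gib:
            return name
    return None  # nothing in the list is large enough for a single machine


print(smallest_fitting_instance(0.1))   # the 100 MB case
print(smallest_fitting_instance(500))   # the 500 GB case
```

With these assumed defaults, the 500 GB case returns `None`: no single instance in the list is large enough, which is where sampling or distributed processing comes in.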

Try experimenting with a smaller dataset sampled from the original big dataset. Once you are confident in the model training code, use distributed training to run it on the complete dataset.
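One way to draw such a sample without loading the whole file is reservoir sampling, which keeps a fixed-size uniform sample while streaming over the rows once. The sketch below uses only the standard library; the function name `reservoir_sample` and its parameters are hypothetical, and how the rows are streamed from the Parquet file is left to the caller.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k rows chosen uniformly at random from an iterable.

    Memory use is bounded by k rows, regardless of how many rows are streamed.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Stand-in for a stream of rows; any iterable works here.
small = reservoir_sample(range(1_000_000), k=5, seed=42)
print(len(small))
```

If pyarrow is installed (an assumption), the rows could come from `pyarrow.parquet.ParquetFile(path).iter_batches()`, flattening each batch with `to_pylist()`. In Spark, `DataFrame.sample` serves a similar purpose.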

AWS
S Lyu
answered 2 years ago
