Determining the "right" instance type for running a Jupyter notebook in SageMaker when reading/writing a huge Parquet file?

Question

I am unclear as to how to determine the "right" instance type for running a Jupyter notebook in SageMaker. Reading/writing a small Parquet file works without problems, but when I try to read/write a huge Parquet file, the program stops with the error "Job aborted due to stage failure: Task 21 in stage 33.0 failed 1 times, most recent failure: Lost task 21.0 in stage 33.0 (TID 1755, localhost, executor driver"
I would appreciate any insights please... thanks.

Answer

For a notebook instance, it's mostly trial and error, at least for now. Once your model is ready to be deployed, there is the [SageMaker Inference Recommender](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender.html), which can run automated load tests and give you a recommendation on the instance size.

It's hard to give a recommendation on the notebook instance because you might test a 100MB dataset today but choose to go with a 500GB dataset tomorrow, at which point the earlier recommendation is no longer valid.

You might want to try experimenting with a smaller dataset sampled from the original big dataset. Once you are confident in the model training code, use [distributed training](https://aws.amazon.com/sagemaker/distributed-training/) to run it on the complete dataset.
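As a rough illustration of the sampling idea, here is a minimal sketch that draws a fixed-size random subset from a stream of rows without loading the whole dataset into memory (reservoir sampling). It is not taken from the original question: the names `reservoir_sample` and `fake_rows` are hypothetical, and `fake_rows` only stands in for whatever iterator yields rows from your Parquet file (for example, batches read one at a time).

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k rows chosen uniformly at random from a stream.

    Only k rows are held in memory at any time, so the size of the
    source dataset does not matter.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = row
    return sample


def fake_rows(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for rows streamed from a large file."""
    for i in range(n):
        yield {"id": i, "value": i * 2}


if __name__ == "__main__":
    subset = reservoir_sample(fake_rows(100_000), k=1_000)
    print(len(subset))
```

The resulting subset is small enough to develop and debug against on a modest notebook instance. If you are already working in Spark, its built-in DataFrame sampling may be a more natural fit than hand-rolled code like this.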
