Determining the "right" instance type for running a Jupyter notebook in SageMaker when reading/writing a huge Parquet file?


I am unclear as to how to determine the "right" instance type for running a Jupyter notebook in SageMaker. Reading/writing a small Parquet file works without problems, but when I try to read/write a huge Parquet file, the program stops with this error: "Job aborted due to stage failure: Task 21 in stage 33.0 failed 1 times, most recent failure: Lost task 21.0 in stage 33.0 (TID 1755, localhost, executor driver". I would appreciate any insights. Thanks.

1 Answer

For notebook instances, it's mostly trial and error, at least for now. Once your model is ready to be deployed, the SageMaker Inference Recommender can run automated load tests and give you a recommendation on the instance size.

It's hard to give a recommendation for the notebook instance because you might test a 100 MB dataset today but choose to go with a 500 GB dataset tomorrow, at which point the earlier recommendation is no longer valid.

You might want to try experimenting with a smaller dataset sampled from the original big dataset. Once you are confident in the model training code, use distributed training to run it on the complete dataset.
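As a rough illustration of the sampling idea, here is a minimal sketch. It assumes you already have a SparkSession (the error in the question looks like Spark running in local mode) and uses PySpark's `DataFrame.sample`. The helper names `sample_fraction` and `read_sample`, the path, and the sizes are all hypothetical and only there for illustration. Note that Parquet is compressed on disk, so the in-memory size of the sample can be noticeably larger than the byte estimate suggests.

```python
def sample_fraction(total_bytes: int, target_bytes: int) -> float:
    """Return the fraction of rows to keep so the sample is roughly target_bytes.

    This is only an estimate: it assumes rows are about the same size.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Never ask for more than the whole dataset.
    return min(1.0, target_bytes / total_bytes)


def read_sample(spark, path: str, fraction: float, seed: int = 42):
    """Read a Parquet dataset and return a random sample of its rows.

    `spark` is an existing SparkSession (assumed to be created elsewhere).
    The sample size is approximate; Spark does not guarantee an exact row count.
    """
    df = spark.read.parquet(path)
    return df.sample(withReplacement=False, fraction=fraction, seed=seed)


# Hypothetical usage: shrink a ~500 GB dataset to roughly 1 GB for experiments.
# fraction = sample_fraction(total_bytes=500 * 1024**3, target_bytes=1 * 1024**3)
# small_df = read_sample(spark, "path/to/big_dataset.parquet", fraction)
```

If the sampled data fits comfortably on the current notebook instance, you can iterate on the training code there and only move to a larger instance or distributed training for the full run.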

answered 3 months ago
