Determining the "right" instance type for a Jupyter notebook in SageMaker when reading/writing a huge Parquet file?


I am unclear as to how to determine the "right" instance type for running a Jupyter notebook in SageMaker. Reading and writing a small Parquet file works without problems, but when I try to read or write a huge Parquet file, the program stops with the error "Job aborted due to stage failure: Task 21 in stage 33.0 failed 1 times, most recent failure: Lost task 21.0 in stage 33.0 (TID 1755, localhost, executor driver". I would appreciate any insights. Thanks.

Asked 2 years ago, viewed 362 times
1 answer

For notebook instances, sizing is mostly trial and error, at least for now. Once your model is ready to be deployed, SageMaker Inference Recommender can run automated load tests and recommend an instance size for inference.

It is hard to recommend a notebook instance size because the workload can change: you might test a 100 MB dataset today and switch to a 500 GB dataset tomorrow, at which point any earlier recommendation no longer applies.

You might want to experiment with a smaller dataset sampled from the original large one. Once you are confident in the model training code, use distributed training to run it on the complete dataset.
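As a rough illustration of the sampling idea, here is a minimal sketch that reads only a random subset of a Parquet file's row groups, so the notebook never has to hold the whole file in memory. It assumes `pyarrow` and `pandas` are installed, which the question does not state (the error message suggests Spark). The helper names `pick_row_groups` and `read_parquet_sample`, the default `fraction`, and the `seed` are hypothetical and chosen only for this example.

```python
import random


def pick_row_groups(num_row_groups, fraction, seed=0):
    """Return a sorted random subset of row-group indices.

    The subset holds roughly `fraction` of all row groups, and always
    at least one so that the sample is never empty.
    """
    if num_row_groups <= 0:
        raise ValueError("file has no row groups")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(num_row_groups * fraction))
    rng = random.Random(seed)  # seeded so the same sample can be re-read
    return sorted(rng.sample(range(num_row_groups), k))


def read_parquet_sample(path, fraction=0.01, seed=0):
    """Read only the chosen row groups into a pandas DataFrame."""
    # Imported lazily: assumes pyarrow (and pandas) are available.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    indices = pick_row_groups(parquet_file.num_row_groups, fraction, seed)
    return parquet_file.read_row_groups(indices).to_pandas()
```

Calling `read_parquet_sample("data.parquet", fraction=0.05)` (the file name is a placeholder) would load about 5% of the row groups. Sampling whole row groups is cheap but not a uniform row-level sample, so it may be biased if the file is sorted by some column.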

AWS
S Lyu
Answered 2 years ago
