Determining the "right" instance type for running a Jupyter notebook in SageMaker when reading/writing a huge Parquet file?


I am unclear as to how to determine the "right" instance type for running a Jupyter notebook in SageMaker. Reading/writing a small Parquet file works without problems, but when I try to read/write a huge Parquet file, the program stops with the error "Job aborted due to stage failure: Task 21 in stage 33.0 failed 1 times, most recent failure: Lost task 21.0 in stage 33.0 (TID 1755, localhost, executor driver". I would appreciate any insights. Thanks.

Asked 2 years ago, 362 views
1 Answer

For notebook instances, it's mostly trial and error, at least for now. Once your model is ready to be deployed, SageMaker Inference Recommender can run automated load tests and recommend an instance size for deployment.

It's hard to give a recommendation for the notebook instance because you might test a 100 MB dataset today but choose to go with a 500 GB dataset tomorrow, at which point the recommendation is no longer valid.

You might want to experiment with a smaller dataset sampled from the original big dataset. Once you are confident in the model training code, use distributed training to run it on the complete dataset.
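As a rough illustration of the sampling step suggested above, here is a minimal PySpark sketch. It assumes the data is read with Spark (which the error message in the question suggests) and that `pyspark` is installed on the notebook instance. The function names `sample_fraction` and `write_sample`, and the `target_rows` parameter, are made up for this example and are not part of any SageMaker or Spark API.

```python
def sample_fraction(total_rows: int, target_rows: int) -> float:
    """Return the fraction of rows to keep so the sample has roughly target_rows rows."""
    if total_rows <= 0 or target_rows <= 0:
        raise ValueError("total_rows and target_rows must be positive")
    # Never ask for more than the whole dataset.
    return min(1.0, target_rows / total_rows)


def write_sample(input_path: str, output_path: str, target_rows: int, seed: int = 42) -> None:
    """Write a random sample of a Parquet dataset to output_path (hypothetical helper)."""
    # Imported here so sample_fraction can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet(input_path)

    fraction = sample_fraction(df.count(), target_rows)

    # DataFrame.sample is approximate: the result has roughly, not exactly,
    # target_rows rows. The seed makes the sample repeatable.
    sampled = df.sample(withReplacement=False, fraction=fraction, seed=seed)
    sampled.write.mode("overwrite").parquet(output_path)
```

For example, `write_sample("s3://my-bucket/full/", "s3://my-bucket/sample/", 100_000)` (both paths are placeholders) would write a sample of about 100,000 rows that can be used to develop the code on a small notebook instance before running on the full dataset.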

AWS
S Lyu
Answered 2 years ago
