
Only one CPU used during parallel processing in SageMaker


I am using a SageMaker instance, a 24xlarge, to process a DataFrame of about 23 GB. I tried Pandarallel's parallel_apply() function on a pandas DataFrame with 48 workers.
However, when I run the top command from the terminal, I see only one process, at around 100% CPU utilization, and nothing else happens. Eventually the favicon changes to the 'Done' icon, which usually means the code has finished running, but CPU utilization stays high and the cell still shows the asterisk next to it.
I would expect either multiple processes each with high CPU utilization, or one process at close to 9600% utilization. The same thing happens when I try swifter and modin, which are also supposed to parallelize pandas operations. What's going on here, and how do I fix it?
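One way to narrow this down is to check, from inside the notebook kernel, how many CPUs the process is allowed to run on and whether plain `multiprocessing` spreads work across cores at all. The sketch below uses only the standard library; `busy_work` is a hypothetical CPU-bound stand-in for the real row function, and `os.sched_getaffinity` is Linux-only, hence the fallback to `os.cpu_count()`. If `top` shows several busy worker processes while it runs, the problem is more likely specific to how the parallel libraries handle the large DataFrame than to the instance itself.

```python
import os
import time
from multiprocessing import Pool


def usable_cpus() -> int:
    """Number of CPUs this process may be scheduled on."""
    try:
        # Linux-only: respects the process's CPU affinity mask.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total logical CPU count.
        return os.cpu_count() or 1


def busy_work(n: int) -> int:
    """Hypothetical CPU-bound stand-in for the real per-row function."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def smoke_test(workers: int, tasks: int = 8, n: int = 2_000_000) -> list:
    """Run busy_work in a process pool; watch `top` while this runs."""
    with Pool(processes=workers) as pool:
        return pool.map(busy_work, [n] * tasks)


if __name__ == "__main__":
    cpus = usable_cpus()
    print(f"usable CPUs: {cpus}, os.cpu_count(): {os.cpu_count()}")
    start = time.perf_counter()
    results = smoke_test(workers=min(48, cpus), tasks=min(48, cpus))
    print(f"{len(results)} tasks finished in {time.perf_counter() - start:.1f}s")
```

The pool size is capped at 48 to mirror the worker count from the question; this tests the instance, not pandarallel itself.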

asked 3 years ago · 725 views
1 Answer

Hello,

Thank you for using Amazon SageMaker.

From your description, I understand that you are processing a DataFrame on a SageMaker instance using Pandarallel's parallel_apply() function for parallel processing, but only one CPU is being used.

Please note that third-party libraries such as pandarallel, as well as code review, are outside the scope of AWS Support [1]. That said, I looked at the library's GitHub page and found some relevant issues:

https://github.com/nalepae/pandarallel/issues/52

https://github.com/nalepae/pandarallel/issues/131

https://github.com/nalepae/pandarallel/issues/122

I recommend going through the links above and checking whether upgrading pandarallel, or one of the suggested workarounds, helps in your case.
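Before and after upgrading, it can help to record which versions are actually installed in the notebook kernel, since the kernel's environment can differ from the one a terminal `pip` points at. The sketch below uses only the standard library. It also reports free space on `/dev/shm`: as far as I know, pandarallel can use a memory file system there to pass data to its workers (controlled by the `use_memory_fs` argument of `pandarallel.initialize()`), so with a ~23 GB DataFrame that mount's size seems worth checking. That connection is an assumption, not something confirmed by the linked issues.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def free_bytes(path: str) -> Optional[int]:
    """Free space on the filesystem holding `path`, or None if unavailable."""
    try:
        return shutil.disk_usage(path).free
    except OSError:
        return None


if __name__ == "__main__":
    # Distribution names as mentioned in the question; some may be absent.
    for name in ("pandas", "pandarallel", "swifter", "modin"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
    shm = free_bytes("/dev/shm")
    print("/dev/shm free:", "n/a" if shm is None else f"{shm / 2**30:.1f} GiB")
```

If the free space is far below the size of the data, one thing to try (if your installed version supports it) is `pandarallel.initialize(nb_workers=48, use_memory_fs=False)`; whether that avoids the hang in this setup is untested.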

Finally, you could also consider using Spark clusters for dealing with very large data. You can create an EMR-backed Notebook instance following this guide:

https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/

Because I have limited visibility into your setup and code, if you want to investigate further whether the issue lies on the SageMaker side, I recommend contacting AWS Premium Support by opening a support case [2] so that an engineer can investigate the root cause.

References:

[1] https://aws.amazon.com/premiumsupport/faqs/

[2] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create

AWS
answered 3 years ago
