Hello,
Thank you for using Amazon SageMaker.
From your question, I understand that you are using a SageMaker instance to process a DataFrame with pandarallel's parallel_apply() function, but only one CPU is being used during processing.
Please note that third-party libraries such as pandarallel, as well as code review, are outside the scope of AWS Support [1]. That said, I had a look at the library's GitHub page and found some relevant issues:
https://github.com/nalepae/pandarallel/issues/52
https://github.com/nalepae/pandarallel/issues/131
https://github.com/nalepae/pandarallel/issues/122
I recommend going through the links above and checking whether upgrading pandarallel, or one of the workarounds suggested there, helps in your case.
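Before trying those workarounds, it may help to confirm how many CPUs your notebook process is actually allowed to use. The following is only a minimal diagnostic sketch using the Python standard library; the helper name available_cpus is my own and is not part of pandarallel or SageMaker, and I have not verified that a restricted CPU set is the cause in your case.

```python
import os


def available_cpus():
    """Return (logical CPUs on the machine, CPUs this process may run on)."""
    total = os.cpu_count() or 1
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        # Assumption: with no affinity API, treat every logical CPU as usable.
        usable = total
    return total, usable


total, usable = available_cpus()
print(f"logical CPUs: {total}, usable by this process: {usable}")

# If more than one CPU is usable, the worker count can be passed explicitly
# when initializing pandarallel, for example:
#   from pandarallel import pandarallel
#   pandarallel.initialize(nb_workers=usable)
```

If the second number is 1, the process itself is restricted to a single core and changing pandarallel settings alone is unlikely to help. If it is greater than 1, the GitHub issues above are the more likely place to look.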
Finally, you could also consider using Spark clusters for dealing with very large data. You can create an EMR-backed Notebook instance following this guide:
I have limited visibility into your setup and the code you are using. To determine whether the issue lies on the SageMaker side, I recommend reaching out to AWS Premium Support by creating a support case [2] so that an engineer can investigate the root cause.
References:
[1] https://aws.amazon.com/premiumsupport/faqs/
[2] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create
