How do I set Spark configuration parameters in PySparkProcessor() for a SageMaker Processing job?


Hi folks, I'm trying to set the Spark executor instances and executor memory, set the driver memory, and switch off dynamic allocation. What is the correct way to do this?

1 Answer

Hi! You can do this by passing a configuration argument (a dictionary, or a list of dictionaries, in the EMR configuration format) to PySparkProcessor.run(). The example in the SageMaker Python SDK documentation shows exactly how: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html#configuration-override
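A minimal sketch of that approach, assuming SageMaker Python SDK v2, where PySparkProcessor is imported from sagemaker.spark.processing and run() accepts a configuration argument. The helper names build_spark_configuration and run_job, the job name, the framework version, the instance type and count, and the memory and core values are all illustrative assumptions; size them for your own instances. Property values are passed as strings, which is how EMR-style configuration properties are normally written.

```python
def build_spark_configuration(executor_memory, executor_instances,
                              executor_cores, driver_memory,
                              dynamic_allocation=False):
    """Build an EMR-style configuration list (hypothetical helper).

    Every value is converted to a string, since EMR-style properties
    are written as strings.
    """
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                "spark.executor.memory": executor_memory,
                "spark.executor.instances": str(executor_instances),
                "spark.executor.cores": str(executor_cores),
                "spark.driver.memory": driver_memory,
                # "false" disables dynamic allocation, so Spark uses the
                # fixed spark.executor.instances value instead.
                "spark.dynamicAllocation.enabled": str(dynamic_allocation).lower(),
            },
        }
    ]


def run_job(role_arn, script_path):
    """Submit a PySpark script with the configuration above (hypothetical helper)."""
    # Imported here so that building the configuration does not need the SDK.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(
        base_job_name="spark-config-demo",  # assumption: any job name prefix
        framework_version="3.1",            # assumption: use a version your SDK supports
        role=role_arn,
        instance_count=2,                   # assumption: a small two-node cluster
        instance_type="ml.m5.xlarge",       # assumption: 4 vCPU / 16 GiB per node
    )
    # The configuration goes to run(), not to the constructor.
    processor.run(
        submit_app=script_path,
        configuration=build_spark_configuration("4g", 2, 2, "4g"),
    )
```

You would call run_job() with your own execution role ARN and the path to your PySpark script. With dynamic allocation disabled, Spark should request the fixed number of executors given in spark.executor.instances, as long as the cluster has capacity for them.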

Happy coding!

Answered by AWS, 2 years ago. Reviewed by Tasio (AWS Expert), 2 years ago.
  • Hi! Thanks for the prompt response. I tried the approach above, and my configuration looks like this: configuration = [{"Classification": "spark-defaults", "Properties": {"spark.executor.memory": "45g", "spark.executor.instance": "45", "spark.executor.cores": "6", "spark.driver.memory": "30g", "spark.dynamicAllocation.enabled": "false"}}]. However, I couldn't change the number of executor instances, i.e., spark.executor.instances. Can you also confirm that passing values via "spark-defaults" is equivalent to using "--conf" with spark-submit on EMR?
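One likely cause, judging from the configuration quoted in this comment: the key is written as spark.executor.instance, but the Spark property is spark.executor.instances (plural), so Spark never applies that value. On the --conf question, Spark's configuration documentation gives this precedence: values set on SparkConf in code, then flags passed to spark-submit (such as --conf), then spark-defaults.conf. Assuming the SageMaker Spark container writes the spark-defaults classification into spark-defaults.conf the way EMR does, those values should behave like --conf settings unless something with higher precedence overrides them. The sketch below is a hypothetical pre-submission check (find_suspicious_keys and KNOWN_SPARK_KEYS are illustrative names, and the key set is a hand-picked subset, not Spark's full property list) that uses difflib to flag unrecognized property names.

```python
import difflib

# Hand-picked subset of Spark property names used in this thread.
# This is an assumption for illustration, not Spark's complete list.
KNOWN_SPARK_KEYS = {
    "spark.executor.memory",
    "spark.executor.instances",
    "spark.executor.cores",
    "spark.driver.memory",
    "spark.dynamicAllocation.enabled",
}


def find_suspicious_keys(configuration):
    """Map each unknown spark-defaults key to its closest known key (if any)."""
    suspicious = {}
    for entry in configuration:
        if entry.get("Classification") != "spark-defaults":
            continue
        for key in entry.get("Properties", {}):
            if key not in KNOWN_SPARK_KEYS:
                # Sorted so the suggestion order is deterministic.
                suspicious[key] = difflib.get_close_matches(
                    key, sorted(KNOWN_SPARK_KEYS), n=1
                )
    return suspicious


# The configuration from the comment, including the singular key.
original = [{
    "Classification": "spark-defaults",
    "Properties": {
        "spark.executor.memory": "45g",
        "spark.executor.instance": "45",
        "spark.executor.cores": "6",
        "spark.driver.memory": "30g",
        "spark.dynamicAllocation.enabled": "false",
    },
}]

print(find_suspicious_keys(original))
```

Here the check should report spark.executor.instance with spark.executor.instances as the suggested correction; renaming the key and resubmitting is the first thing to try.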
