How to set Spark configuration parameters in PySparkProcessor() in a SageMaker Processing job?

Hi folks, I'm trying to set the Spark executor instances and memory, set the driver memory, and switch off dynamic allocation. What is the correct way to do it?

1 Answer

Hi! You can achieve this by passing a "configuration" list of classification dictionaries to the run() method of the PySparkProcessor. The documentation has an example that shows exactly how to do this: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html#configuration-override
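A minimal sketch of that approach, in case it helps. The helper names (build_spark_defaults, run_job), the job name, the framework version, and the instance settings are placeholders picked for illustration, not values from this thread. The property values are the ones mentioned in the question and follow-up; note that Spark documents the executor-count property as spark.executor.instances (plural).

```python
def build_spark_defaults(properties):
    """Wrap Spark properties in the EMR-style classification structure."""
    return [{"Classification": "spark-defaults", "Properties": dict(properties)}]


# Values are strings, as in the linked documentation example.
configuration = build_spark_defaults({
    "spark.executor.memory": "45g",
    "spark.executor.instances": "45",
    "spark.executor.cores": "6",
    "spark.driver.memory": "30g",
    "spark.dynamicAllocation.enabled": "false",
})


def run_job(role_arn, submit_app):
    """Hypothetical wrapper: start a processing job with the override applied."""
    # Imported here so the configuration can be built without the SDK installed.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(
        base_job_name="spark-config-demo",  # placeholder name
        framework_version="3.1",            # assumption: pick the version you use
        role=role_arn,
        instance_count=2,                   # placeholder cluster size
        instance_type="ml.m5.xlarge",       # placeholder instance type
    )
    # The override is passed to run(), not to the constructor.
    processor.run(submit_app=submit_app, configuration=configuration)
```

Calling run_job("<your-role-arn>", "./preprocess.py") would submit the job; both arguments are placeholders you would replace with your own values.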

Happy coding!

answered 2 months ago
EXPERT
reviewed 2 months ago
  • Hi! Thanks for the prompt response. I tried the approach above, and here is how my configuration looks: configuration = [{"Classification": "spark-defaults", "Properties": {"spark.executor.memory": "45g", "spark.executor.instance": "45", "spark.executor.cores": "6", "spark.driver.memory": "30g", "spark.dynamicAllocation.enabled": "false"}}] However, the executor instances (spark.executor.instances) were not updated. To confirm: is passing values via "spark-defaults" equivalent to "--conf" on an EMR spark-submit job?
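One thing that may be worth checking: the configuration in the comment uses the key spark.executor.instance (singular), while Spark documents the property as spark.executor.instances, so the setting may simply not be recognized. On the equivalence question, Spark's documentation says flags passed to spark-submit take precedence over entries in spark-defaults.conf, so the two are related but not strictly identical. The sketch below is a rough illustration only; the function names and the short list of expected keys are hypothetical, and it assumes the "spark-defaults" classification ends up as default properties for the job.

```python
# Hypothetical allow-list: only the properties discussed in this thread.
EXPECTED_KEYS = {
    "spark.executor.memory",
    "spark.executor.instances",
    "spark.executor.cores",
    "spark.driver.memory",
    "spark.dynamicAllocation.enabled",
}


def to_conf_flags(configuration):
    """Render spark-defaults properties as spark-submit style --conf flags."""
    flags = []
    for entry in configuration:
        if entry.get("Classification") != "spark-defaults":
            continue
        for key, value in entry.get("Properties", {}).items():
            flags.extend(["--conf", f"{key}={value}"])
    return flags


def unexpected_keys(configuration, expected=EXPECTED_KEYS):
    """Return property names that are not in the expected set (possible typos)."""
    found = []
    for entry in configuration:
        for key in entry.get("Properties", {}):
            if key not in expected:
                found.append(key)
    return found


# The configuration exactly as posted in the comment above.
posted = [{
    "Classification": "spark-defaults",
    "Properties": {
        "spark.executor.memory": "45g",
        "spark.executor.instance": "45",
        "spark.executor.cores": "6",
        "spark.driver.memory": "30g",
        "spark.dynamicAllocation.enabled": "false",
    },
}]

suspicious = unexpected_keys(posted)
flags = to_conf_flags(posted)
```

Here suspicious holds the singular key from the posted configuration, and flags shows what the same settings would look like as --conf arguments for comparison with an EMR spark-submit command.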
