AWS Glue 3.0 Docker Image - can we increase Spark configurations?
My Spark code is running extremely slowly and either times out or does not run at all, even though I have just 4 records in the source S3 bucket that I am trying to process. Can anyone suggest whether we can increase the Spark resources in the Docker container to make it run faster?
I am using the AWS Glue 3.0 Docker image to set up my local environment. I am submitting jobs from a notebook and, being a newbie, I do not know how to change the spark-submit configuration in this environment.
Hello,
If you are using a Jupyter notebook with your Glue Docker image, you can use the Spark magic command %%configure to set the Spark driver and executor memory/vcores according to your system's available resources.
You can get a list of the available Spark magic commands by simply running %help in a cell.
The command looks something like this:
%%configure
{
    "driverMemory": "2000M",
    "executorMemory": "2000M"
}
You can get a list of configurable Spark parameters here.
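For a fuller example, a %%configure cell can also set executor cores and count, and pass arbitrary Spark properties through "conf". A sketch (the parameter names follow the Livy session API that sparkmagic uses; the values below are illustrative, not recommendations, and should be sized to your machine):

```
%%configure -f
{
    "driverMemory": "4G",
    "executorMemory": "4G",
    "executorCores": 2,
    "numExecutors": 2,
    "conf": {
        "spark.sql.shuffle.partitions": "8"
    }
}
```

The -f flag forces the current session to restart so the new settings take effect. Also note that with only 4 source records, lowering spark.sql.shuffle.partitions from its default of 200 may speed things up more than adding memory, since the job otherwise schedules 200 mostly empty shuffle tasks.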
This did not work for me