Hello,
Thank you for writing on re:Post.
I see that you want to know how you can improve the performance of your current EMR cluster running large datasets.
First, I suggest tuning your Spark memory parameters using the AWS EMR Best Practices Guide: [+] https://aws.github.io/aws-emr-best-practices/docs/bestpractices/Applications/Spark/troubleshooting_and_tuning/ It will help you size the driver and executor memory for the instance type you are using. The default executor memory of 8g looks low for large datasets.
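As a rough illustration of sizing executors to an instance type, here is a minimal sketch. It is a common rule of thumb, not the guide's exact procedure: it assumes about 5 cores per executor and Spark's default memory overhead of 10% of executor memory (minimum 384 MB). The function name `size_executors` and the sample node figures (120 GiB of YARN memory, 16 vCores) are hypothetical; substitute the values for your own instance type.

```python
def size_executors(yarn_memory_mb, vcores, cores_per_executor=5, overhead_fraction=0.10):
    """Split one node's YARN memory evenly across executors.

    Rule-of-thumb sketch only; the inputs are assumptions you must
    replace with the real YARN memory and vCores of your instance type.
    """
    cores = min(cores_per_executor, vcores)
    executors_per_node = max(1, vcores // cores)
    # Each executor runs in one YARN container: heap + memory overhead.
    container_mb = yarn_memory_mb // executors_per_node
    heap_mb = int(container_mb / (1 + overhead_fraction))
    # Spark never uses less than 384 MB of overhead per executor.
    overhead_mb = max(384, container_mb - heap_mb)
    heap_mb = container_mb - overhead_mb
    return {
        "executors_per_node": executors_per_node,
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{heap_mb}m",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
    }


# Hypothetical node: 120 GiB available to YARN, 16 vCores.
conf = size_executors(yarn_memory_mb=122880, vcores=16)
for key, value in conf.items():
    print(f"{key} = {value}")
```

The resulting values can be passed as `--conf` options to `spark-submit`. Treat them as a starting point and adjust after watching the job in the Spark UI.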
Secondly, I recommend reading about maximizeResourceAllocation. It lets the executors use the maximum resources possible on each node in the cluster. [+] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#emr-spark-maximizeresourceallocation
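For reference, maximizeResourceAllocation is enabled through the `spark` configuration classification at cluster creation. The sketch below only builds and prints that configuration; the variable name `spark_configurations` is my own choice.

```python
import json

# EMR configuration object that turns on maximizeResourceAllocation.
# It goes in the `spark` classification, not `spark-defaults`.
spark_configurations = [
    {
        "Classification": "spark",
        "Properties": {"maximizeResourceAllocation": "true"},
    }
]

# Print as JSON, e.g. to save to a file or paste into the console.
print(json.dumps(spark_configurations, indent=2))
```

The same list can be supplied as the `Configurations` argument when creating a cluster with boto3's `run_job_flow`, or as JSON in the software settings of the console. Because it is a cluster-level setting, it has to be provided when the cluster is created.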
Next, please check the cluster's CloudWatch metrics while the count or show is running, to see whether a shortage of resources is causing the delay. Look especially at ContainerPending and YARNMemoryAvailablePercentage. If they show high load (many pending containers, or little available YARN memory), you may need to increase the maximum cluster size in the Managed Scaling settings. Also note that the limits in Managed Scaling are expressed in units, which are not the same as the number of nodes. With instance fleets, every node has a weight that can be specified at cluster creation. Please check those weights and set the scaling limits accordingly.
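To make the units-versus-nodes point concrete, here is a small sketch of the arithmetic. The instance types, weights, and node counts are hypothetical examples, not values from your cluster; use the weighted capacities you set on your own instance fleet. The helper name `fleet_units` is also just for illustration.

```python
def fleet_units(node_counts, weights):
    """Total capacity units consumed by a set of nodes in an instance fleet."""
    return sum(weights[instance_type] * count
               for instance_type, count in node_counts.items())


# Hypothetical weights, as they might be set per instance type in the fleet.
weights = {"r5.4xlarge": 16, "r5.8xlarge": 32}

# Wanting up to 10 r5.4xlarge nodes means 160 units, not 10.
max_units = fleet_units({"r5.4xlarge": 10}, weights)
min_units = fleet_units({"r5.4xlarge": 2}, weights)

# Limits in the shape used by a managed scaling policy for instance fleets.
compute_limits = {
    "UnitType": "InstanceFleetUnits",
    "MinimumCapacityUnits": min_units,
    "MaximumCapacityUnits": max_units,
}
print(compute_limits)
```

With these example weights, a maximum of 10 in the scaling settings would not even cover a single r5.4xlarge node, which is one way a cluster can stay too small even though the limit looks generous.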
I hope I was able to address your query. Thanks and have a great day ahead!