Hello,
I understand that you are looking for recommendations on creating an EMR cluster to run your Spark jobs. Please see the suggestions below.
➤ First, note that an EMR cluster can only be created in one of the following ways [1]:
- An EMR cluster with a single primary node.
- A multi-primary cluster with exactly 3 primary nodes.
➤ Based on the overall job size of 2 GB that you mentioned, the 'm5.xlarge' instance type should be sufficient for this use case. [2]
➤ I would also recommend enabling the 'scaling' option for your EMR cluster, which resizes the cluster according to your workload. Please refer to the document below for more details about scaling. [3]
However, please note that without visibility into what your job actually computes, it is not possible to determine which instance type is optimal for your cluster. It is therefore recommended to run your use case in a test environment before deploying it to production.
➤ Additionally, I suggest going through the EMR best practices document below, which can help you determine the right infrastructure for your workloads. [4]
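As a rough illustration of the suggestions above, here is a minimal Python sketch that builds the request parameters for a test cluster with 'm5.xlarge' nodes and managed scaling. The helper name `build_emr_request`, the release label `emr-6.15.0`, the core node count, the scaling limits, and the default role names are all assumptions for illustration and are not taken from your setup, so please adjust them to your account and workload.

```python
def build_emr_request(name, primary_count=1, core_count=2, max_units=6,
                      instance_type="m5.xlarge", release_label="emr-6.15.0"):
    """Build keyword arguments for the boto3 EMR run_job_flow call.

    Hypothetical helper: the defaults are illustrative assumptions,
    not values recommended for any specific workload.
    """
    # A cluster has either a single primary node or exactly three.
    if primary_count not in (1, 3):
        raise ValueError("primary_count must be 1 or 3")
    if max_units < core_count:
        raise ValueError("max_units must be >= core_count")

    return {
        "Name": name,
        "ReleaseLabel": release_label,          # assumed release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": primary_count,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": core_count,
                },
            ],
            # Terminate the cluster once its steps finish (test runs).
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Managed scaling limits, expressed in instances (assumed values).
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "Instances",
                "MinimumCapacityUnits": core_count,
                "MaximumCapacityUnits": max_units,
            }
        },
        # Default EMR role names; these may differ in your account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_emr_request("spark-test-cluster")
```

With AWS credentials configured, the resulting dictionary could be passed to `boto3.client("emr").run_job_flow(**request)` in a test environment. Building the parameters separately from the API call makes it easy to review the configuration before anything is launched.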
References:
[2] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html
[3] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-scale-on-demand.html
[4] https://aws.github.io/aws-emr-best-practices/docs/bestpractices/Applications/Spark/best_practices/