AWS Glue - Glue Jobs - Glue 2.0: Worker Types

Hey guys, I've got several questions regarding Glue 2.0 worker types for AWS Glue Jobs. I have gone through this documentation https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html as I'm trying to figure out the type and number of workers I need for my jobs, as well as the pricing, and I still have some questions left.

  1. How many DPUs is 1 Standard worker equivalent to?
  2. How are the resources of a single Standard worker split between its two executors?
  3. What's the difference between having 2 executors (Standard) and 1 executor (G.1X)? In what situation should I use one over the other?
  4. Assuming 1 Standard worker = 1 DPU (question 1 might answer this one too), am I being charged the same as for a G.1X worker?
  5. The documentation mentions that, when using Glue 2.0, you need to specify a worker type and a number of workers. Does this mean that if I am using the Standard worker type, all workers (and executors) are going to be active during the execution? The docs specify that with Glue 1.0 you just need to provide a maximum number, so I'm assuming not all workers are necessarily active there.

Really appreciate the help, guys. Regards

1 Answer

Hi,

  1. 1 Standard node = 1 DPU (https://docs.aws.amazon.com/glue/latest/dg/add-job.html).
  2. A Standard node runs 2 executors. Each executor is assigned half the cores and half the memory (actually somewhat less, once you account for overhead).
  3. As mentioned in answer 2, the 2 executors have to split the cores and memory, so you run more tasks in parallel, but each task can handle a smaller amount of data (remember that Spark tries to run everything in memory). With a single executor you reduce the number of parallel tasks but increase the amount of data each task can handle. Starting from Glue 2.0, we advise using only the G.1X and G.2X worker types.
  4. Yes.
  5. Glue 2.0 does not use dynamic executor allocation, so you need to specify the exact number of workers. Glue 1.0 used dynamic executor allocation, so in theory the job itself determined how many executors were needed. Theoretically you are right, but unfortunately I am not sure whether that was reflected in the billing. (In any case, the current advice is to use Glue 2.0 or Glue 3.0, so I would not focus on Glue 1.0 configuration.)
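
To make answers 1 to 4 a bit more concrete, here is a minimal Python sketch of the per-executor split and the DPU count. The vCPU and memory figures are assumptions based on the worker sizes AWS documented at the time (Standard and G.1X: 4 vCPU and 16 GB, G.2X: 8 vCPU and 32 GB); the names WorkerSpec, per_executor_share and billed_dpus are made up for illustration, and the result ignores the overhead mentioned in answer 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSpec:
    dpus: int        # DPUs billed per worker
    vcpus: int       # total vCPUs on the worker
    memory_gb: int   # total memory on the worker
    executors: int   # Spark executors running on the worker


# ASSUMPTION: sizes as documented by AWS at the time; check the current docs.
WORKER_SPECS = {
    "Standard": WorkerSpec(dpus=1, vcpus=4, memory_gb=16, executors=2),
    "G.1X": WorkerSpec(dpus=1, vcpus=4, memory_gb=16, executors=1),
    "G.2X": WorkerSpec(dpus=2, vcpus=8, memory_gb=32, executors=1),
}


def per_executor_share(worker_type):
    """Return (vCPUs, memory in GB) per executor, before any overhead."""
    spec = WORKER_SPECS[worker_type]
    return spec.vcpus / spec.executors, spec.memory_gb / spec.executors


def billed_dpus(worker_type, number_of_workers):
    """Return the total DPUs for a fixed number of workers of one type."""
    return WORKER_SPECS[worker_type].dpus * number_of_workers


if __name__ == "__main__":
    for worker_type in WORKER_SPECS:
        cores, memory = per_executor_share(worker_type)
        print(worker_type, cores, "vCPU and", memory, "GB per executor,",
              billed_dpus(worker_type, 10), "DPUs for 10 workers")
```

Under these assumptions a Standard executor gets at most half of what a G.1X executor gets, while 10 Standard workers and 10 G.1X workers both add up to 10 DPUs.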

With Glue 3.0 we have launched autoscaling in preview. It lets you define a maximum number of workers again, but uses actual workload statistics to determine how many workers are active (and billed) at any time.
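
As a rough illustration of the difference between a fixed worker count and autoscaling, the sketch below builds the keyword arguments for boto3's Glue create_job call. The helper build_job_args and the job name, role ARN and script path are hypothetical placeholders. It also assumes that autoscaling is switched on with the --enable-auto-scaling job argument, in which case NumberOfWorkers acts as the maximum; please check the current Glue documentation before relying on that.

```python
def build_job_args(name, role_arn, script_location, glue_version,
                   worker_type, number_of_workers, auto_scaling=False):
    """Build keyword arguments for boto3's glue.create_job (sketch only)."""
    args = {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": glue_version,
        "WorkerType": worker_type,
        # Glue 2.0: exact number of workers. With autoscaling: the maximum.
        "NumberOfWorkers": number_of_workers,
    }
    if auto_scaling:
        # ASSUMPTION: job argument name as documented for Glue autoscaling.
        args["DefaultArguments"] = {"--enable-auto-scaling": "true"}
    return args


# Hypothetical placeholder values, not real resources.
fixed = build_job_args("my-job", "arn:aws:iam::123456789012:role/MyGlueRole",
                       "s3://my-bucket/scripts/job.py", "2.0", "G.1X", 10)
scaled = build_job_args("my-job", "arn:aws:iam::123456789012:role/MyGlueRole",
                        "s3://my-bucket/scripts/job.py", "3.0", "G.1X", 10,
                        auto_scaling=True)

# To actually create the job (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("glue").create_job(**fixed)
```

The dictionaries are built separately from the API call so they can be inspected without touching an AWS account.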

Hope this helps.

answered 7 months ago
  • Thank you so much for your reply, this was really helpful. There were a few key concepts I was missing, and it is clear now. Also, I wasn't aware of Glue 3.0 autoscaling; I will dig a bit more into it. Thanks again (:
