AWS Glue - Glue Jobs - Glue 2.0: Worker Types


Hey guys, I have several questions about Glue 2.0 worker types for AWS Glue jobs. I have gone through the documentation at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html while trying to work out the type and number of workers I need for my jobs, as well as the pricing, and I still have some questions.

  1. How many DPUs is 1 Standard worker type equivalent to?
  2. How are a single Standard worker's resources split between its two executors?
  3. What's the difference between having 2 executors (Standard) and 1 executor (G.1X)? In what situation should I use one over the other?
  4. Assuming 1 Standard worker = 1 DPU (question 1 might answer this one too), am I being charged the same as for a G.1X worker?
  5. The documentation mentions that with Glue 2.0 you need to specify a worker type and a number of workers. Does this mean that if I am using the Standard worker type, all workers (and executors) are going to be active during the execution? The docs say that with Glue 1.0 you only need to provide a maximum number, so I'm assuming not all of the workers are necessarily active there.

Really appreciate the help, guys. Regards

2 Answers

Hi,

  1. 1 Standard node = 1 DPU (https://docs.aws.amazon.com/glue/latest/dg/add-job.html).
  2. One Standard node has 2 executors, and each executor is assigned half the cores and half the memory (actually a bit less, since you have to account for overhead).
  3. As mentioned in answer 2, the 2 executors have to split the cores and memory, so you run more tasks in parallel but each task can only handle a smaller amount of data (remember that Spark tries to do everything in memory). With a single executor you reduce the number of parallel tasks but increase the amount of data each task can handle. Starting with Glue 2.0, we advise using only the G.1X and G.2X worker types.
  4. Yes.
  5. Glue 2.0 does not use dynamic executor allocation, so you need to specify the exact number of workers. Glue 1.0 used dynamic executor allocation, so in theory the job itself determined how many executors were needed. So theoretically you are right, but unfortunately I am not sure whether that was reflected in the billing. (In any case, the current advice is to use Glue 2.0 or Glue 3.0, so I would not focus on Glue 1.0 configuration.)
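To make the split in answers 2 and 3 concrete, here is a minimal sketch in Python. It uses the per-worker figures listed in the Glue Job API documentation for the Standard and G.1X worker types, and the names `WorkerSpec` and `per_executor` are made up for illustration. It ignores the overhead mentioned in answer 2, so treat the numbers as an upper bound per executor rather than what Spark will actually get.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSpec:
    vcpus: int
    memory_gb: int
    executors: int


# Per-worker figures as listed in the AWS Glue Job API documentation.
WORKERS = {
    "Standard": WorkerSpec(vcpus=4, memory_gb=16, executors=2),
    "G.1X": WorkerSpec(vcpus=4, memory_gb=16, executors=1),
}


def per_executor(spec):
    """Upper bound on (vCPUs, memory in GB) per executor, ignoring overhead."""
    return spec.vcpus / spec.executors, spec.memory_gb / spec.executors


for name, spec in WORKERS.items():
    cores, mem = per_executor(spec)
    print(f"{name}: {spec.executors} executor(s), "
          f"up to {cores:g} vCPU and {mem:g} GB each")
```

Both worker types have the same total resources; the only difference in this sketch is whether they are handed to one executor or divided between two.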

With Glue 3.0 we have launched autoscaling in preview. It lets you define a maximum number of workers again, but uses actual workload statistics to decide how many workers are active (and billed) at any time.
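As a rough sketch of what that could look like, the snippet below builds the keyword arguments for boto3's `create_job` call with the `--enable-auto-scaling` job parameter set. The job name, role ARN, and script location are hypothetical placeholders, and the actual API call is left commented out because it needs boto3 and AWS credentials.

```python
# Keyword arguments for boto3's glue.create_job().
# Name, Role and ScriptLocation are hypothetical placeholders.
job_kwargs = {
    "Name": "example-job",
    "Role": "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "Command": {
        "Name": "glueetl",
        "ScriptLocation": "s3://example-bucket/scripts/job.py",
        "PythonVersion": "3",
    },
    "GlueVersion": "3.0",
    "WorkerType": "G.1X",
    # With autoscaling enabled, this is treated as the maximum number of workers.
    "NumberOfWorkers": 10,
    "DefaultArguments": {"--enable-auto-scaling": "true"},
}

# To actually create the job (requires boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_job(**job_kwargs)

print(job_kwargs["WorkerType"], job_kwargs["NumberOfWorkers"])
```

Without the `--enable-auto-scaling` argument, the same `WorkerType` and `NumberOfWorkers` describe a fixed-size job, which is the Glue 2.0 behaviour described in answer 5.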

Hope this helps.

AWS
EXPERT
answered 2 years ago
  • Thank you so much for your reply, this was really helpful. There were a few key concepts I was missing, and it is clear now. Also, I wasn't aware of Glue 3.0 autoscaling, so I will dig a bit more into it. Thanks again (:


Please find below the description of the worker type: the type of predefined worker that is allocated when a job runs. It accepts a value of Standard, G.1X, G.2X, or G.025X.

For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory, and a 50 GB disk, with 2 executors per worker.

For the G.1X worker type, each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.

For the G.2X worker type, each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.

For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPU, 4 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for AWS Glue version 3.0 streaming jobs.
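Since billing is per DPU, the mapping above can be turned into a quick estimate. The sketch below is only illustrative: the function names are made up, the 1 DPU figure for Standard comes from the other answer in this thread, and the price per DPU-hour is something you pass in yourself (the 0.44 in the example is an assumed rate, so check the pricing page for your region). It also ignores any minimum billing duration.

```python
# DPUs per worker; Standard = 1 DPU is taken from the other answer in this thread.
DPU_PER_WORKER = {"Standard": 1.0, "G.1X": 1.0, "G.2X": 2.0, "G.025X": 0.25}


def dpu_hours(worker_type, num_workers, runtime_minutes):
    """DPU-hours consumed by a job with a fixed number of workers."""
    return DPU_PER_WORKER[worker_type] * num_workers * runtime_minutes / 60


def estimated_cost(worker_type, num_workers, runtime_minutes, price_per_dpu_hour):
    """Rough cost estimate; ignores minimum billing duration and rounding."""
    return dpu_hours(worker_type, num_workers, runtime_minutes) * price_per_dpu_hour


# Example: 10 workers running for 30 minutes at an assumed rate of 0.44 per DPU-hour.
for worker_type in ("Standard", "G.1X", "G.2X"):
    cost = estimated_cost(worker_type, 10, 30, price_per_dpu_hour=0.44)
    print(f"{worker_type}: {dpu_hours(worker_type, 10, 30):g} DPU-hours, about {cost:.2f}")
```

Under these assumptions, Standard and G.1X come out the same for an equal number of workers, and G.2X comes out at twice that.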

answered a year ago
