Hi,
- 1 Standard node = 1 DPU (https://docs.aws.amazon.com/glue/latest/dg/add-job.html).
- On one Standard node you have 2 executors. Each executor is assigned half the cores and half the memory (actually a bit less, since you have to account for overhead).
- As mentioned in answer 2, the 2 executors have to split the cores and memory, so you run more tasks in parallel but each task can handle a smaller amount of data (remember that Spark tries to keep everything in memory). With a single executor you run fewer tasks in parallel but increase the amount of data each task can handle. Starting from Glue 2.0, we advise using only the G.1X and G.2X worker types.
- yes
- Glue 2.0 does not use dynamic executor allocation, so you need to specify the exact number of workers. Glue 1.0 used dynamic executor allocation, so in theory the job itself would determine how many executors were needed. So theoretically you are right, but unfortunately I am not sure whether that matched the billing process. (Anyway, the current advice is to use Glue 2.0 and Glue 3.0, so I would not focus on Glue 1.0 configuration.)
With Glue 3.0 we have launched autoscaling in preview. It again lets you define a maximum number of workers, but it uses actual workload statistics to decide how many workers are active (and billed) at any time.
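To make the difference between a fixed worker count and autoscaling concrete, here is a minimal sketch of the job settings involved. The helper name `build_glue_job_args` and the example job name, role ARN, and script path are hypothetical. The dictionary keys follow the boto3 Glue `create_job` parameters, and `--enable-auto-scaling` is the job argument I understand turns autoscaling on; please check both against the current documentation before relying on them.

```python
def build_glue_job_args(name, role_arn, script_s3_path, num_workers,
                        worker_type="G.1X", autoscale=False):
    """Build keyword arguments for a Glue job definition (hypothetical helper).

    Without autoscaling, num_workers is the exact number of workers.
    With autoscaling, num_workers is treated as the maximum.
    """
    args = {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "3.0",
        "WorkerType": worker_type,
        "NumberOfWorkers": num_workers,
    }
    if autoscale:
        # Assumption: this job argument enables Glue autoscaling.
        args["DefaultArguments"] = {"--enable-auto-scaling": "true"}
    return args


# Placeholder values for illustration only.
job_args = build_glue_job_args(
    name="example-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_s3_path="s3://example-bucket/scripts/job.py",
    num_workers=10,
    autoscale=True,
)
# To actually create the job you would pass these to boto3, for example:
#   boto3.client("glue").create_job(**job_args)
```

The helper only builds the dictionary, so you can inspect the settings without touching your AWS account.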
Hope this helps.
Please find below the details for each worker type. The worker type is the type of predefined worker that is allocated when a job runs. It accepts a value of Standard, G.1X, G.2X, or G.025X.
For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory and a 50 GB disk, and 2 executors per worker.
For the G.1X worker type, each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
For the G.2X worker type, each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
For the G.025X worker type, each worker maps to 0.25 DPU (2 vCPU, 4 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for low volume streaming jobs. This worker type is only available for AWS Glue version 3.0 streaming jobs.
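As a rough illustration of how these numbers combine, the sketch below encodes the figures listed above and computes the resources available to each executor and the total DPUs for a job. The names `WORKER_TYPES`, `per_executor`, and `total_dpu` are made up for this example, the Standard worker is counted as 1 DPU as stated in the other answer, and the per-executor values are upper bounds because they ignore memory overhead.

```python
# Worker specifications as listed in this thread (disk size omitted).
WORKER_TYPES = {
    "Standard": {"dpu": 1.0, "vcpu": 4, "memory_gb": 16, "executors": 2},
    "G.1X": {"dpu": 1.0, "vcpu": 4, "memory_gb": 16, "executors": 1},
    "G.2X": {"dpu": 2.0, "vcpu": 8, "memory_gb": 32, "executors": 1},
    "G.025X": {"dpu": 0.25, "vcpu": 2, "memory_gb": 4, "executors": 1},
}


def per_executor(worker_type):
    """Split a worker's vCPUs and memory evenly across its executors."""
    spec = WORKER_TYPES[worker_type]
    n = spec["executors"]
    return {"vcpu": spec["vcpu"] / n, "memory_gb": spec["memory_gb"] / n}


def total_dpu(worker_type, num_workers):
    """Total DPUs for a job with the given number of workers."""
    return WORKER_TYPES[worker_type]["dpu"] * num_workers


print(per_executor("Standard"))  # {'vcpu': 2.0, 'memory_gb': 8.0}
print(per_executor("G.1X"))      # {'vcpu': 4.0, 'memory_gb': 16.0}
print(total_dpu("G.2X", 10))     # 20.0
```

So a Standard worker and a G.1X worker have the same total resources, but each Standard executor gets at most half of them.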
Thank you so much for your reply, this was really helpful. There were a few key concepts I was missing, and it is clear now. Also, I wasn't aware of Glue 3.0 autoscaling; I will dig a bit more into it. Thanks again (: