1 Answer
Hello,
AWS Glue ETL uses Apache Spark on the backend to process data in memory. Job, stage, and task are the terms Spark's distributed processing engine uses to organize work: a job is split into stages, and each stage is the physical unit of execution for a set of parallel tasks. Stage boundaries are determined by the two kinds of transformations:
- Narrow transformations: transformations that do not require a shuffle, so they can be pipelined and executed within a single stage.
Example: map() and filter()
- Wide transformations: transformations that require shuffling data across partitions, so Spark must create a new stage for the exchange between partitions (see the sketch after this list).
Example: reduceByKey()
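To make the stage split concrete, here is a minimal PySpark sketch (the app name and sample data are illustrative; in a real Glue job the SparkContext would come from GlueContext rather than a plain SparkSession). The narrow filter()/map() chain stays in one stage, while reduceByKey() introduces a shuffle and therefore a second stage:

```python
# Minimal sketch, assuming a local PySpark session rather than a Glue job.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stages-demo").getOrCreate()
sc = spark.sparkContext

words = sc.parallelize(["glue", "spark", "glue", "etl", "spark"], 2)

# Narrow transformations: each output partition depends on exactly one input
# partition, so filter() and map() are pipelined into a single stage.
pairs = words.filter(lambda w: len(w) > 3).map(lambda w: (w, 1))

# Wide transformation: reduceByKey() shuffles records with the same key to
# the same partition, creating a stage boundary.
counts = pairs.reduceByKey(lambda a, b: a + b)

# toDebugString() prints the RDD lineage; the ShuffledRDD entry marks the
# boundary between the two stages.
print(counts.toDebugString().decode("utf-8"))
print(sorted(counts.collect()))  # [('glue', 2), ('spark', 2)]

spark.stop()
```

You can see the same split in the Spark UI for a Glue job run: one stage covering the pipelined narrow transformations, and a second stage that starts after the shuffle.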
To get a deeper understanding of Spark internals, you can refer to the Apache Spark documentation or other Spark resources. [1]
answered 2 years ago