How to choose which Spark kernel to use in SageMaker Studio?

Available Amazon SageMaker Kernels lists the following Spark options:

  • PySpark (SparkMagic) with Python 3.7
  • Spark (SparkMagic) with Python 3.7
  • Spark Analytics 1.0
  • Spark Analytics 2.0

At re:Invent 2022 there was also an announcement that "SageMaker Studio now supports Glue Interactive Sessions," which lets you use "the built-in Glue PySpark or Glue Spark kernel for your Studio notebook to initialize interactive, serverless Spark sessions."

It seems like the benefits of using one of the Glue Spark kernels are that you can "quickly browse the Glue data catalog, run large queries, and interactively analyze and prepare data using Spark, right in your Studio notebook." But can't you already do all that with the existing two SageMaker kernels?

In other words, how do you choose whether to use one of the existing two SparkMagic kernels in SageMaker Studio notebooks or to use this new Glue Interactive Sessions feature?

  • I just looked up SparkMagic, and it looks like it's "a set of tools for interactively working with remote Spark clusters in Jupyter notebooks", meaning it's for running Spark on EMR from SageMaker? And does this announcement now make it possible to do the same thing, but with Glue?

AWS
Asked 1 year ago · Viewed 611 times
1 Answer

The difference is that with SparkMagic you need to provide a Spark cluster yourself and link to it through the SparkMagic configuration.
With Glue Interactive Sessions, all of that time-consuming work is taken care of for you: you can easily create and destroy Spark clusters as you need them.
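
To make the "link to it using SparkMagic configuration" part concrete, here is a minimal sketch of what that linking can look like. It assumes the cluster exposes a Livy endpoint on the default port 8998 with no authentication; the host name and the helper function `build_sparkmagic_config` are hypothetical and only there for illustration. The key names follow the example `config.json` that ships with SparkMagic.

```python
import json


def build_sparkmagic_config(livy_url: str, auth: str = "None") -> dict:
    """Build a minimal SparkMagic config that points both kernels at one Livy endpoint.

    `livy_url` is the address of the cluster you provisioned yourself (for
    example an EMR primary node running Livy). This helper is hypothetical.
    """
    credentials = {"username": "", "password": "", "url": livy_url, "auth": auth}
    return {
        # Used by the PySpark (SparkMagic) kernel
        "kernel_python_credentials": dict(credentials),
        # Used by the Spark (SparkMagic) Scala kernel
        "kernel_scala_credentials": dict(credentials),
    }


# Hypothetical endpoint; replace with the Livy address of your own cluster.
config = build_sparkmagic_config("http://emr-primary-node.example.internal:8998")

# SparkMagic conventionally reads this JSON from ~/.sparkmagic/config.json.
print(json.dumps(config, indent=2))
```

With Glue Interactive Sessions there is no endpoint to point at. You pick a Glue kernel and, if needed, set session options in the first cell with magics such as `%glue_version`, `%worker_type`, `%number_of_workers` and `%idle_timeout`; the session is started for you when the first Spark statement runs.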

AWS
Expert
Answered 1 year ago
