How to choose which Spark kernel to use in SageMaker Studio?

The available Amazon SageMaker kernels include the following Spark kernels:

  • PySpark (SparkMagic) with Python 3.7
  • Spark (SparkMagic) with Python 3.7
  • Spark Analytics 1.0
  • Spark Analytics 2.0

At re:Invent 2022 there was also an announcement that "SageMaker Studio now supports Glue Interactive Sessions." It describes using "the built-in Glue PySpark or Glue Spark kernel for your Studio notebook to initialize interactive, serverless Spark sessions."

The stated benefit of the Glue Spark kernels seems to be that you can "quickly browse the Glue data catalog, run large queries, and interactively analyze and prepare data using Spark, right in your Studio notebook." But can't you already do all of that with the existing SageMaker Spark kernels?

In other words, how do you choose whether to use one of the existing two SparkMagic kernels in SageMaker Studio notebooks or to use this new Glue Interactive Sessions feature?

  • I just looked up SparkMagic, and it looks like it is "a set of tools for interactively working with remote Spark clusters in Jupyter notebooks", meaning it is for executing Spark on EMR from SageMaker? And this announcement now makes it possible to do the same, but with Glue?

AWS
asked a year ago, 604 views
1 Answer

The difference is that with SparkMagic you need to provide a Spark cluster yourself and link to it through the SparkMagic configuration.
With Glue Interactive Sessions, all that time-consuming work is taken care of for you: you can easily create and destroy Spark clusters as you need them.
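
To make the "link to it using SparkMagic configuration" step concrete, here is a minimal sketch of what that wiring can look like. SparkMagic talks to the cluster through a Livy endpoint and reads its settings from `~/.sparkmagic/config.json`. The helper names and the host name below are hypothetical, for illustration only, and a real setup will usually need more than this (networking, security groups, authentication).

```python
import json
from pathlib import Path


def build_sparkmagic_config(livy_url: str) -> dict:
    """Return a minimal SparkMagic config that points the PySpark and
    Spark (Scala) kernels at the same Livy endpoint, with no authentication."""
    credentials = {"username": "", "password": "", "url": livy_url, "auth": "None"}
    return {
        "kernel_python_credentials": dict(credentials),
        "kernel_scala_credentials": dict(credentials),
    }


def write_sparkmagic_config(livy_url: str, path: Path) -> Path:
    """Write the config as JSON to `path`, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_sparkmagic_config(livy_url), indent=2))
    return path


if __name__ == "__main__":
    # Hypothetical address of a cluster node running Livy (default port 8998).
    config = build_sparkmagic_config("http://emr-primary.example.internal:8998")
    # Print only; to apply it you would save this to ~/.sparkmagic/config.json.
    print(json.dumps(config, indent=2))
```

With the Glue kernels there is no equivalent file to maintain: the session is started for you, and settings such as `%idle_timeout` or `%number_of_workers` are set with cell magics in the notebook itself.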

AWS
EXPERT
answered a year ago
