How to choose which Spark kernel to use in SageMaker Studio?


Available Amazon SageMaker kernels include the following Spark kernels:

  • PySpark (SparkMagic) with Python 3.7
  • Spark (SparkMagic) with Python 3.7
  • Spark Analytics 1.0
  • Spark Analytics 2.0

And at re:Invent 2022, AWS announced that "SageMaker Studio now supports Glue Interactive Sessions," and that you can use "the built-in Glue PySpark or Glue Spark kernel for your Studio notebook to initialize interactive, serverless Spark sessions."

It seems like the benefit of using one of the Glue Spark kernels is that you can "quickly browse the Glue data catalog, run large queries, and interactively analyze and prepare data using Spark, right in your Studio notebook." But can't you already do all that with the existing two SparkMagic kernels?

In other words, how do you choose whether to use one of the existing two SparkMagic kernels in SageMaker Studio notebooks or to use this new Glue Interactive Sessions feature?

  • I just looked up SparkMagic, and it looks like it's "a set of tools for interactively working with remote Spark clusters in Jupyter notebooks." Does that mean it's for executing Spark on EMR from SageMaker, and this announcement now makes it possible to do the same with Glue?
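To make the "remote cluster" part concrete: SparkMagic does not run Spark itself. It sends your cell code to an Apache Livy server running next to the cluster (on EMR, Livy listens on port 8998 by default), and the kernels find that server through `~/.sparkmagic/config.json`. The sketch below, assuming a plain Livy endpoint with no authentication, writes a minimal config of that shape. The hostname and the helper names `sparkmagic_config` / `write_config` are hypothetical; SageMaker Studio may also wire up the EMR connection through its own tooling instead of this file.

```python
import json
from pathlib import Path

# Hypothetical Livy endpoint on an EMR primary node; replace with your own.
LIVY_URL = "http://emr-primary-node.example.internal:8998"


def sparkmagic_config(livy_url: str) -> dict:
    """Build a minimal SparkMagic config that points both the PySpark and
    the Spark (Scala) kernels at a single Livy server, with no auth."""
    credentials = {"username": "", "password": "", "url": livy_url, "auth": "None"}
    return {
        # Used by the "PySpark (SparkMagic)" kernel.
        "kernel_python_credentials": dict(credentials),
        # Used by the "Spark (SparkMagic)" kernel.
        "kernel_scala_credentials": dict(credentials),
    }


def write_config(livy_url: str, path: Path) -> Path:
    """Write the config as JSON, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sparkmagic_config(livy_url), indent=2))
    return path
```

For example, `write_config(LIVY_URL, Path.home() / ".sparkmagic" / "config.json")`. Either way, the cluster behind that URL has to exist and stay reachable, which is the work the answer below refers to.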

asked 21 days ago · 25 views
1 Answer

The difference is that with SparkMagic you need to provide a Spark cluster yourself (for example, on EMR) and link to it through the SparkMagic configuration.
With Glue Interactive Sessions, all that time-consuming work is taken care of for you: you can easily create and tear down serverless Spark sessions as you need them.
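Inside Studio's Glue kernels you normally configure a session with cell magics (such as `%glue_version`, `%worker_type`, `%number_of_workers`, `%idle_timeout`) and the kernel creates it for you. The sketch below shows roughly the same create/wait/tear-down lifecycle through the boto3 Glue client's `create_session`, `get_session`, and `delete_session` calls, to illustrate that no cluster is provisioned by hand. The helper names `glue_session_request` and `glue_session`, the default worker settings, and the role ARN in the usage note are illustrative assumptions, not what the Glue kernel does internally.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


def glue_session_request(
    session_id: str,
    role_arn: str,
    worker_type: str = "G.1X",        # assumed default, tune for your workload
    number_of_workers: int = 2,       # assumed default
    idle_timeout_minutes: int = 30,   # session stops itself after this idle time
    glue_version: str = "3.0",
) -> Dict[str, Any]:
    """Build keyword arguments for glue_client.create_session(...)."""
    return {
        "Id": session_id,
        "Role": role_arn,
        # "glueetl" + Python 3 is the command used for PySpark sessions.
        "Command": {"Name": "glueetl", "PythonVersion": "3"},
        "GlueVersion": glue_version,
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
        "IdleTimeout": idle_timeout_minutes,
    }


@contextmanager
def glue_session(
    glue_client: Any,
    request: Dict[str, Any],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Create a Glue interactive session, wait until it is READY, yield its
    id, and always delete it afterwards (even if the body raises)."""
    session_id = request["Id"]
    glue_client.create_session(**request)
    try:
        while True:
            status = glue_client.get_session(Id=session_id)["Session"]["Status"]
            if status == "READY":
                break
            if status in ("FAILED", "TIMEOUT", "STOPPED"):
                raise RuntimeError(f"Glue session {session_id} ended in state {status}")
            sleep(poll_seconds)  # still PROVISIONING; poll again
        yield session_id
    finally:
        glue_client.delete_session(Id=session_id)
```

Usage would look like `with glue_session(boto3.client("glue"), glue_session_request("my-session", "arn:aws:iam::123456789012:role/MyGlueRole")) as sid: ...`, where the role ARN is a placeholder. Compare this with the SparkMagic route, where the equivalent first step is standing up and maintaining an EMR cluster.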

answered 18 days ago
