PySpark on SageMaker - force stop of execution for all cells


Hi, I am using a Jupyter notebook on SageMaker with the PySpark kernel. I want to implement some data checks so that the processing of the notebook is stopped if some condition is met. With the Python kernel one can simply use a raise statement, as in the example below. If the notebook is run using the "Run All" button, this code stops not only the execution of the failing cell but also prevents the following cells from running.

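# 'age' is assumed to be defined in an earlier cell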
if type(age) is not int:
    raise TypeError("Age must be an integer")
elif age < 0:
    raise ValueError("Sorry you can't be born in the future")

However, if we use the same piece of code with the PySpark kernel, execution of the failing cell is stopped, but the notebook still proceeds to execute the next cells (as "Run All" was used). So in fact the calculations would proceed even if age is not an integer or is < 0.

How can I force the PySpark kernel to stop executing the entire notebook on error, just as is the case for the Python kernel?
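One direction I have considered is running the check in the local IPython process using SparkMagic's %%local cell magic, so that the exception is raised locally rather than in the remote Spark session. A minimal sketch, assuming the value being checked is available locally (which may not hold for data that lives only in Spark):

%%local
# Run this cell in the local IPython kernel (SparkMagic's %%local magic)
# so that the exception is raised locally; assumes 'age' exists locally,
# e.g. it was collected from Spark in an earlier %%local cell.
if type(age) is not int:
    raise TypeError("Age must be an integer")
elif age < 0:
    raise ValueError("Sorry you can't be born in the future")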

Adding a screenshot of the output notebook after "Run All": [screenshot attached]

  • Are you using a notebook instance to connect to an EMR cluster and process …

Asked 2 years ago · Viewed 263 times
1 Answer

Hello

The functionality you have described is available when using the PySpark kernel within a SageMaker Jupyter notebook.

When an error is raised within a cell at run time, the notebook will not continue to run the following cells, even when using the "Run All" option.

When an error is encountered, the notebook outputs the details of that error and stops executing.

I have tested this in a SageMaker Jupyter notebook instance with a SparkMagic PySpark kernel and found the above to be true.
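For reference, a minimal sketch of the two cells I used for the test (the value of age is an illustrative assumption, chosen to trigger the check):

# Cell 1: a validation that raises under the SparkMagic PySpark kernel
age = -5  # illustrative value; in practice this would come from your data
if type(age) is not int:
    raise TypeError("Age must be an integer")
elif age < 0:
    raise ValueError("Sorry you can't be born in the future")

# Cell 2: with "Run All", this cell did not execute after the error above
print("this should not be reached")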

AWS
Caryn_S
Answered 2 years ago
  • Unfortunately, it looks different for me - I added a screenshot of my notebook to the original question post.
