PySpark on Sagemaker - force stop of the execution for all cells

Question

Hi,
 I am using Jupyter Notebook on SageMaker with PySpark kernel. I want to implement some data checks so that the processing of the notebook is stopped if some conditions is meet. For Python kernel one can simply use raise statement as in the example below. If the notebook was run using "Run All" button then the code below would stop not only the execution of the error cell, but it would **not proceed to the next cells**. 
```
if type(age) is not int:
    raise TypeError("Age must be an integer")
elif age < 0:
    raise ValueError("Sorry you can't be born in the future")
```
However, if we use the same piece of code with PySpark kernel, execution of the error cell would be stopped but the notebook would still proceed with the execution of the next cells (as "Run All" was used). So in fact calculations would be proceeded even if age is not integer or is < 0.

How to force PySpark kernel to stop executing the entire notebook on error, just as it is the case for Python kernel?

Adding a print screen with output notebook after "Run All": 
![Enter image description here](/media/postImages/original/IM4NPOObXJTXWuNvt7ZhL8aw)

Answer

Hello

The functionality you have explained here is available when using the PySpark Kernel within a SageMaker Jupyter Notebook.

When raising an error within a cell, if that error is flagged in the run time, the notebook will not continue to run the following cells even when using the Run All Option.

When an error is met, the notebook will output the information of this error and will stop executing.

I have tested this in a SageMaker Jupyter Notebook Instance with a SparkMagic PySpark Kernel and found the above to be true.

PySpark on Sagemaker - force stop of the execution for all cells

相關內容