PySpark on SageMaker - force stop of execution for all cells


Hi, I am using a Jupyter notebook on SageMaker with the PySpark kernel. I want to implement some data checks so that the processing of the notebook is stopped if certain conditions are met. With the Python kernel one can simply use a raise statement, as in the example below. If the notebook is run using the "Run All" button, the code below stops not only the execution of the failing cell, but also prevents the following cells from running.

if type(age) is not int:
    raise TypeError("Age must be an integer")
elif age < 0:
    raise ValueError("Sorry you can't be born in the future")

However, if we use the same piece of code with the PySpark kernel, execution of the failing cell is stopped, but the notebook still proceeds with the execution of the next cells (when "Run All" is used). So in fact the calculations would proceed even if age is not an integer or is < 0.
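For illustration, a possible workaround would be to set a flag in the validation cell and guard every downstream cell with it, roughly as in the sketch below (checks_passed is just a placeholder name used here):

# validation cell: record whether the data checks passed
checks_passed = isinstance(age, int) and age >= 0

# every downstream cell would then have to start with a guard like this,
# so that it fails fast instead of running its processing
if not checks_passed:
    raise RuntimeError("Data checks failed - not running this cell")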

How can I force the PySpark kernel to stop executing the entire notebook on error, just as the Python kernel does?

Adding a screenshot of the notebook output after "Run All": [screenshot attached]

  • Are you using a notebook instance to connect to an EMR cluster for processing?

Asked 2 years ago · 263 views
1 Answer

Hello

The behaviour you have described here is available when using the PySpark kernel within a SageMaker Jupyter notebook.

When an error is raised within a cell at runtime, the notebook will not continue to run the following cells, even when using the Run All option.

When an error is met, the notebook will output the information about this error and will stop executing.

I have tested this in a SageMaker Jupyter Notebook Instance with a SparkMagic PySpark Kernel and found the above to be true.
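As a minimal sketch of the kind of check that can be used to verify this behaviour (two consecutive cells; if the kernel stops on error, the second cell's print should never run):

# cell 1: deliberately raise an error
raise ValueError("intentional failure to test Run All behaviour")

# cell 2: should not execute if the notebook stops on error
print("this line should never be printed")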

AWS
Caryn_S
Answered 2 years ago
  • Unfortunately, it looks different for me - I added a screenshot of my notebook to the original question post.
