EMR 7.0.0 pyspark kernel doesn't work in notebook


When using EMR 7.0.0 on EMR Serverless (I have not tried EKS or EC2), after connecting to the application through an EMR Studio workspace, the pyspark kernel doesn't work in a notebook. It stays in status "unknown" and running any command hangs. The python3 kernel works fine.

Using EMR 6.15.0 the pyspark kernel works fine.

Is anyone else having this problem? I'm not sure if it's a bug in EMR 7.0.0.

tomups
asked 4 months ago, viewed 309 times
1 answer
Accepted answer

Hello,

I tested an EMR 7.0.0 Serverless application with an interactive workload, attached to a workspace in an EMR Studio notebook, and it worked without any issues. The PySpark and Spark kernels were available as expected.

Please make sure that your EMR Studio user role has the appropriate permissions for your selected compute type and that the interactive endpoint is enabled for the application. Also check whether a custom AMI is configured that could be causing the issue.
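
As a rough illustration of the checks above, the following Python sketch inspects an application description and lists settings worth a second look. The helper names are hypothetical, and the field names (`interactiveConfiguration.studioEnabled`, `imageConfiguration.imageUri`) reflect my understanding of the EMR Serverless GetApplication response, so please verify them against the API reference. It does not check IAM permissions on the Studio user role.

```python
def interactive_endpoint_problems(application: dict) -> list:
    """Return human-readable notes about settings that may block a Studio kernel.

    `application` is assumed to have the shape of the "application" object
    returned by the EMR Serverless GetApplication call.
    """
    notes = []

    # The interactive endpoint must be enabled for EMR Studio to attach.
    interactive = application.get("interactiveConfiguration") or {}
    if not interactive.get("studioEnabled", False):
        notes.append("interactive endpoint for EMR Studio is not enabled")

    # A custom image is not necessarily wrong, but it is worth ruling out.
    image = application.get("imageConfiguration") or {}
    if image.get("imageUri"):
        notes.append("custom image configured: " + image["imageUri"])

    return notes


def fetch_application(application_id: str) -> dict:
    """Fetch the application description (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the check above works without boto3

    client = boto3.client("emr-serverless")
    return client.get_application(applicationId=application_id)["application"]
```

With credentials in place, `interactive_endpoint_problems(fetch_application(app_id))` would return an empty list when neither condition is flagged; `app_id` here stands for your own application ID.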

AWS
Support Engineer
answered 4 months ago
  • I had my EMR Serverless vCPU quota limited to 16 vCPU, so I had to limit the EMR Serverless application size to that and set spark.dynamicAllocation.enabled to false. With those constraints, the PySpark kernel in EMR 7.0.0 did not work, but the one in EMR 6.15.0 did.

    I have now had a quota increase approved and have 512 vCPU available. I increased the resources of the EMR application and enabled dynamic allocation, and the PySpark kernel works fine in 7.0.0.

    So I think 7.0.0 somehow needs more resources to work properly? Could you try 7.0.0 without dynamicAllocation and limited to less than 16 vCPU, and see if the PySpark kernel stays in the "unknown" state for you too?

  • @tomtastic, good to hear that you were able to resolve the issue. I understand that the default limit was 16 vCPU and that this caused the slowness or hang in the notebook, which makes sense depending on what you executed with that capacity. It also depends on how much pre-initialized capacity is configured, if any. You can check the vCPU utilization for a given period of time on the Service Quotas page.
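
To make the capacity reasoning in the comments above more concrete, here is a minimal Python sketch that adds up the vCPUs a Spark session requests when dynamic allocation is off and compares the total with a quota. The function names and the example core counts are assumptions for illustration, not EMR defaults or values from this thread; only the 16 vCPU limit comes from the discussion.

```python
def fixed_allocation_vcpus(driver_cores: int, executor_cores: int,
                           executor_instances: int) -> int:
    """Total vCPUs one Spark session requests with a fixed executor count."""
    return driver_cores + executor_cores * executor_instances


def fits_quota(requested_vcpus: int, quota_vcpus: int,
               already_in_use: int = 0) -> bool:
    """True if the request fits next to capacity that is already in use,
    for example pre-initialized workers or another running session."""
    return requested_vcpus + already_in_use <= quota_vcpus


# Hypothetical session: one 4-core driver and three 4-core executors.
requested = fixed_allocation_vcpus(driver_cores=4, executor_cores=4,
                                   executor_instances=3)
print(requested, fits_quota(requested, quota_vcpus=16))
print(fits_quota(requested, quota_vcpus=16, already_in_use=4))
```

In this hypothetical case the session alone exactly fills a 16 vCPU quota, so anything else holding capacity would leave the kernel waiting for resources. Whether that is what happened here is not confirmed by the thread.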
