EMR 7.0.0 pyspark kernel doesn't work in notebook


When using EMR 7.0.0 in EMR Serverless (I have not tried EKS or EC2), after connecting to the application through an EMR Studio workspace, the pyspark kernel doesn't work in a notebook. It stays in status "unknown" and running any command hangs. The python3 kernel works fine.

With EMR 6.15.0 the pyspark kernel works fine.

Is anyone else having this problem? I'm not sure whether it's a bug in EMR 7.0.0.

tomups
asked 4 months ago · 274 views
1 Answer
Accepted Answer

Hello,

I tested an EMR 7.0.0 Serverless application with interactive workloads enabled, attached to a workspace in an EMR Studio notebook, and it worked without any issues. The PySpark and Spark kernels were available as expected.

Please make sure that your EMR Studio user role has the appropriate permissions for your selected compute type and that the interactive endpoint is enabled for the application. Also check whether a custom AMI is configured that could be causing the issue.
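One way to check the interactive endpoint setting programmatically is to inspect the application description. The following is a minimal sketch, assuming the `interactiveConfiguration` block returned by the EMR Serverless `GetApplication` API; the helper names `interactive_flags` and `fetch_application` and the sample dictionary are hypothetical and used only for illustration.

```python
from typing import Any, Dict


def interactive_flags(application: Dict[str, Any]) -> Dict[str, bool]:
    """Extract the interactive settings from an application description dict.

    Missing keys are treated as disabled.
    """
    config = application.get("interactiveConfiguration") or {}
    return {
        "studioEnabled": bool(config.get("studioEnabled", False)),
        "livyEndpointEnabled": bool(config.get("livyEndpointEnabled", False)),
    }


def fetch_application(application_id: str) -> Dict[str, Any]:
    """Fetch the application description (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so interactive_flags works without boto3

    client = boto3.client("emr-serverless")
    return client.get_application(applicationId=application_id)["application"]


# Hypothetical sample shaped like the "application" part of a GetApplication response.
sample = {
    "releaseLabel": "emr-7.0.0",
    "interactiveConfiguration": {"studioEnabled": True},
}
print(interactive_flags(sample))
```

Against a real application, `interactive_flags(fetch_application("<your-application-id>"))` would report the same two flags; if `studioEnabled` is false, the application cannot be attached to a Studio workspace.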

AWS
SUPPORT ENGINEER
answered 4 months ago
  • My EMR Serverless vCPU quota was limited to 16 vCPU, so I had to limit the EMR Serverless application size to that and set spark.dynamicAllocation.enabled to false. With those constraints, the PySpark kernel of EMR 7.0.0 did not work, but the one from EMR 6.15.0 did.

    I have now been approved for a quota increase and have 512 vCPU available. I increased the resources of the EMR application and enabled dynamic allocation, and the PySpark kernel works fine in 7.0.0.

    So I think somehow 7.0.0 needs more resources to work properly? Could you try 7.0.0 without dynamicAllocation and limiting to less than 16 vCPU and see if the PySpark kernel stays in "unknown" state for you too?

  • @tomtastic, good to hear that you were able to resolve the issue. I understand that the default limit was 16 vCPU and that this caused the slowness/hang in the notebook, which makes sense depending on what you executed with that capacity. It also depends on how much pre-initialized capacity is configured, if you specified any. You can check the vCPU utilization for a given period of time on the Service Quotas page.
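To make the capacity discussion above concrete, here is a rough sketch of how many vCPUs a Spark session asks for when dynamic allocation is disabled, compared against a quota. The property names are standard Spark settings, but the values below are illustrative assumptions, not confirmed EMR Serverless defaults, and the helper names are hypothetical. The estimate ignores pre-initialized capacity and any other jobs sharing the same quota.

```python
from typing import Dict


def static_vcpu_demand(conf: Dict[str, str]) -> int:
    """Estimate vCPUs requested with dynamic allocation disabled:
    one driver plus a fixed number of executors."""
    driver_cores = int(conf["spark.driver.cores"])
    executor_cores = int(conf["spark.executor.cores"])
    executor_instances = int(conf["spark.executor.instances"])
    return driver_cores + executor_cores * executor_instances


def fits_quota(conf: Dict[str, str], vcpu_quota: int) -> bool:
    """Return True if the estimated demand does not exceed the quota."""
    return static_vcpu_demand(conf) <= vcpu_quota


# Illustrative values only (assumed, not taken from the thread).
small = {
    "spark.dynamicAllocation.enabled": "false",
    "spark.driver.cores": "4",
    "spark.executor.cores": "4",
    "spark.executor.instances": "2",
}
large = dict(small, **{"spark.executor.instances": "4"})

print(static_vcpu_demand(small), fits_quota(small, 16))  # 12 True
print(static_vcpu_demand(large), fits_quota(large, 16))  # 20 False
```

If the estimated demand exceeds the quota or the application's maximum capacity, the session may be unable to obtain its driver or executors, which would be consistent with a kernel that never leaves the "unknown" state, though the thread does not confirm this as the cause.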
