EMR 7.0.0 pyspark kernel doesn't work in notebook

When using EMR 7.0.0 on EMR Serverless (I have not tried EKS or EC2), after connecting to the application through an EMR Studio workspace, the PySpark kernel doesn't work in a notebook. It stays in status "unknown" and running any command hangs. The python3 kernel works fine.

With EMR 6.15.0, the PySpark kernel works fine.

Is anyone else having this problem? I'm not sure whether it's a bug in EMR 7.0.0.

tomups
asked 4 months ago · 309 views
1 Answer
Accepted Answer

Hello,

I tried an EMR 7.0.0 Serverless application with an interactive workload, attached to a workspace in an EMR Studio notebook, and it works without any issues. The PySpark and Spark kernels are available as expected.

Please make sure that your EMR Studio user role has the appropriate permissions for your selected compute type and that the interactive endpoint is enabled for the application. Also check whether a custom AMI is configured that could be causing the issue.

AWS
Support Engineer
answered 4 months ago
  • I had my EMR Serverless vCPU quota limited to 16 vCPU, so I had to limit the EMR Serverless application size to that and set spark.dynamicAllocation.enabled to false. Under those limits, the PySpark kernel on EMR 7.0.0 did not work, but the one on EMR 6.15.0 did.

    I have since been approved for a quota increase and now have 512 vCPU available. I increased the resources of the EMR application and enabled dynamic allocation, and the PySpark kernel works fine on 7.0.0.

    So I think 7.0.0 somehow needs more resources to work properly? Could you try 7.0.0 without dynamic allocation and limited to less than 16 vCPU, and see if the PySpark kernel stays in the "unknown" state for you too?

  • @tomtastic, good to hear that you were able to resolve the issue. I understand that the default limit was 16 vCPU and that this caused the slowness or hang in the notebook, which makes sense depending on what you executed with that capacity. It also depends on how much pre-initialized capacity is configured for the application, if any. You can check vCPU utilization for a given period on the Service Quotas page.
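
For anyone working under a similar quota, a rough way to sanity-check a static (non-dynamic) Spark configuration against the vCPU limit is to add up the driver and executor cores. The sketch below is only an illustration: the property names are standard Spark settings, but the values, the `total_vcpus` and `fits_quota` helpers, and the 16 vCPU figure are assumptions for this example, and it ignores pre-initialized capacity and any other applications drawing on the same quota. It does not explain why 7.0.0 behaved differently from 6.15.0.

```python
def total_vcpus(conf):
    """Cores requested by one driver plus a fixed number of executors."""
    driver = int(conf["spark.driver.cores"])
    executors = int(conf["spark.executor.instances"])
    cores_per_executor = int(conf["spark.executor.cores"])
    return driver + executors * cores_per_executor


def fits_quota(conf, quota_vcpus):
    """True if the static request stays within the given vCPU quota."""
    return total_vcpus(conf) <= quota_vcpus


# Hypothetical settings, not taken from the thread.
conf = {
    "spark.dynamicAllocation.enabled": "false",
    "spark.driver.cores": "4",
    "spark.executor.cores": "4",
    "spark.executor.instances": "2",
}
QUOTA_VCPUS = 16  # the limit mentioned in the comment above

print(total_vcpus(conf), fits_quota(conf, QUOTA_VCPUS))
```

If the total comes out above the quota, the session may not get all the workers it asks for, which is one possible reason a kernel would sit waiting.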
