Why can't I connect my EMR notebook to the cluster?

I can't connect my Amazon EMR notebook to my EMR cluster.

Short description

When connecting an EMR notebook to the EMR cluster, you might receive errors similar to the following:

  • Unable to attach to cluster j-XXXXXXXXXXX. Reason: Attaching the workspace(notebook) failed. Internal error.
  • Notebook is not supported in the chosen Availability Zone. Please try using a cluster in another availability zone.
  • Attaching the workspace(notebook) failed. Invalid configuration.
  • Workspace(notebook) is stopped. Cluster j-XXXXXXXXXX does not have JupyterEnterpriseGateway application installed. Please retry with another cluster.
  • Workspace errors: Not able to attached EMR notebook to running cluster. Error starting kernel. HTTP 403: Forbidden (Workspace is not attached to cluster. Click 'Ok' to continue.)

Resolution

Verify that the attached cluster is compatible and meets all cluster requirements

Cluster requirements for EMR notebooks are as follows:

1.    Only clusters created using Amazon EMR release version 5.18.0 and later are supported.

2.    Clusters created using Amazon Elastic Compute Cloud (Amazon EC2) instances with AMD EPYC processors aren't supported. For example, m5a.* and r5a.* instance types aren't supported.

3.    EMR notebooks work only with clusters created with the VisibleToAllUsers parameter set to true. VisibleToAllUsers is set to true by default.

4.    The cluster must be launched within an Amazon Virtual Private Cloud (Amazon VPC). Public and private subnets are supported.

5.    EMR notebooks currently support Apache Spark clusters only.

6.    For EMR release versions 5.32.0 and later, or 6.2.0 and later, your cluster must be running the Jupyter Enterprise Gateway application.

7.    Clusters using Kerberos authentication aren't supported.

8.    Clusters integrated with AWS Lake Formation support the installation of notebook-scoped libraries only. Installing kernels and libraries on the cluster isn't supported.

9.    Clusters with multiple primary nodes aren't supported.

10.    Clusters using Amazon EC2 instances based on AWS Graviton2 aren't supported.

For more information, see Cluster requirements.
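As a quick pre-check of the version-dependent requirements above, the following shell sketch tests whether a release label meets the 5.18.0 minimum and whether it falls in a range that requires Jupyter Enterprise Gateway. The function names release_supported and needs_jupyter_gateway are hypothetical helpers, not AWS tooling, and the commented AWS CLI call assumes a configured AWS CLI and a CLUSTER_ID variable that holds your cluster ID.

```shell
# Hypothetical helpers; they only parse release labels such as emr-6.9.0.

# Succeeds if the release label is 5.18.0 or later.
release_supported() (
  version="${1#emr-}"        # drop the "emr-" prefix
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 18 ]; }
)

# Succeeds if the release requires the Jupyter Enterprise Gateway application
# (5.32.0 and later in the 5.x line, or 6.2.0 and later).
needs_jupyter_gateway() (
  version="${1#emr-}"
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  { [ "$major" -eq 5 ] && [ "$minor" -ge 32 ]; } ||
    { [ "$major" -eq 6 ] && [ "$minor" -ge 2 ]; } ||
    [ "$major" -ge 7 ]
)

# Example (assumes a configured AWS CLI and a CLUSTER_ID variable):
# aws emr describe-cluster --cluster-id "$CLUSTER_ID" \
#   --query 'Cluster.[ReleaseLabel, VisibleToAllUsers, Applications[].Name]'
```

The describe-cluster output gives you the release label to pass to the helpers, along with the VisibleToAllUsers value and the list of installed applications. Instance types, Kerberos, and Lake Formation settings still need to be checked separately.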

Error: Unable to attach to cluster j-XXXXXXXXXXX. Reason: Attaching the workspace(notebook) failed. Internal error

This error occurs on EMR clusters that have Apache Livy user impersonation turned on, that is, clusters where livy.impersonation.enabled is set to true. On Amazon EMR 6.4.0, Livy impersonation is set to true by default, but the HttpFS service (hadoop-httpfs) is turned off by default. EMR notebooks that use Livy user impersonation depend on HttpFS, so the notebook can't connect to a cluster that has Livy impersonation turned on while HttpFS isn't running. For more information, see Amazon EMR release 6.4.0.
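To see whether impersonation is explicitly turned on, you can inspect the Livy configuration on the primary node. The sketch below is a minimal check. The helper name livy_impersonation_enabled is hypothetical, and the default path /etc/livy/conf/livy.conf is an assumption about where your cluster keeps the file. The check detects only an explicit setting in the file, so a release default that isn't written to the file won't be reported.

```shell
# Succeeds if livy.impersonation.enabled is explicitly set to true in the given
# Livy config file. The default path is an assumption; pass another path if needed.
livy_impersonation_enabled() {
  grep -Eq '^[[:space:]]*livy\.impersonation\.enabled[[:space:]]*=?[[:space:]]*true[[:space:]]*$' \
    "${1:-/etc/livy/conf/livy.conf}"
}

# Example, run on the primary node:
# if livy_impersonation_enabled; then echo "Livy impersonation is turned on"; fi
```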

To avoid this problem, do one of the following:

Use an Amazon EMR release earlier or later than 6.4.0 in which the hadoop-httpfs service is running.

-or-

Start the hadoop-httpfs service on the cluster by doing the following:

1.    Use SSH to connect to the EMR primary node.

2.    Run the following command to start the hadoop-httpfs service:

sudo systemctl start hadoop-httpfs

Or, you can start the hadoop-httpfs service using an EMR step:

==========
JAR location: command-runner.jar
Main class: None
Arguments: bash -c "sudo systemctl start hadoop-httpfs"
Action on failure: Continue
==========

3.    Run the following command to verify the status of HttpFS:

$ sudo systemctl status hadoop-httpfs
  hadoop-httpfs.service - Hadoop httpfs
   Loaded: loaded (/etc/systemd/system/hadoop-httpfs.service; disabled; vendor preset: disabled)
   Active: active (running)...

4.    Reattach the notebook to the EMR cluster.
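If you prefer to submit the EMR step from the AWS CLI instead of the console, the step settings shown earlier can be expressed as a JSON step definition. This is a sketch under stated assumptions: write_httpfs_step is a hypothetical helper, the step name is arbitrary, and the commented command assumes a configured AWS CLI and a CLUSTER_ID variable.

```shell
# Writes a step definition that mirrors the settings above
# (command-runner.jar, bash -c "...", action on failure: Continue).
# write_httpfs_step is a hypothetical helper; the step name is arbitrary.
write_httpfs_step() {
  cat > "$1" <<'EOF'
[
  {
    "Type": "CUSTOM_JAR",
    "Name": "Start hadoop-httpfs",
    "ActionOnFailure": "CONTINUE",
    "Jar": "command-runner.jar",
    "Args": ["bash", "-c", "sudo systemctl start hadoop-httpfs"]
  }
]
EOF
}

# Example (assumes a configured AWS CLI and a CLUSTER_ID variable):
# write_httpfs_step step.json
# aws emr add-steps --cluster-id "$CLUSTER_ID" --steps file://step.json
```

Keeping the step in a JSON file avoids shell-quoting problems with the space-separated argument to bash -c.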

Error: Workspace errors

The following are common workspace errors when trying to connect your EMR cluster to an EMR notebook:

  • Not able to attached EMR notebook to running cluster.
  • Error Starting Kernel.
  • HTTP 403: Forbidden (Workspace is not attached to cluster. Choose 'Ok' to continue.)

These errors occur because the AWS account root user isn't authorized to attach EMR notebooks to EMR clusters. The root user is treated as an unauthorized user for starting kernels: if the value of KERNEL_USERNAME appears in the unauthorized_users list, the request to connect fails. For more information, see Security features.

To avoid workspace errors, create an AWS Identity and Access Management (IAM) user, and then use that user to attach the notebook to the cluster. For more information, see Creating an IAM user in your AWS account.
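If you aren't sure which identity you're working with, one way to check is to look at the caller's ARN, because root user ARNs end in :root. The helper is_root_arn below is hypothetical. The commented command assumes a configured AWS CLI and tells you only about the CLI credentials, which might differ from the identity you use in the console.

```shell
# Succeeds if the given ARN belongs to an account root user.
is_root_arn() {
  case "$1" in
    arn:aws*:iam::*:root) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (assumes a configured AWS CLI):
# arn=$(aws sts get-caller-identity --query Arn --output text)
# if is_root_arn "$arn"; then echo "Signed in as the root user; switch to an IAM user"; fi
```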


AWS OFFICIAL
Updated a year ago