How do I resolve the error ConnectTimeoutError when connecting to an Amazon EMR cluster from my Amazon SageMaker Studio notebook?

I want to troubleshoot the connection timeout error that I receive when trying to connect to an Amazon EMR cluster from my Amazon SageMaker Studio notebook.

Resolution

The connection timeout error might occur because of network configuration issues in the following components for SageMaker Studio or the Amazon EMR cluster:

  • Amazon Virtual Private Cloud (Amazon VPC)
  • Subnets
  • Security groups

Be sure that the following prerequisites for the connection are met:

  • SageMaker Studio is launched in VPC only mode.
  • The Amazon EMR cluster and SageMaker Studio notebook are launched in the same VPC. If they're in different VPCs, then the VPCs must be connected through a VPC peering connection. If you use VPC peering for a cross-Region use case, then you must manually configure the /etc/sparkmagic/config.json file because the Amazon EMR cluster discovery functionality doesn't currently support cross-Region connections. For more information, see Build Amazon SageMaker notebooks backed by Spark in Amazon EMR.
  • The Amazon EMR cluster is launched with Apache Spark and Apache Livy applications installed.
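For the cross-Region case above, the manual edit to the Sparkmagic config might look something like the following. This is a minimal sketch, not the documented procedure: it assumes the config follows the layout of Sparkmagic's example config, where each `kernel_*_credentials` section has a `url` field. The helper name `point_config_at_livy` and the IP address in the usage note are hypothetical.

```python
import copy

# Credential sections as they appear in Sparkmagic's example config (assumed layout).
CREDENTIAL_KEYS = (
    "kernel_python_credentials",
    "kernel_scala_credentials",
    "kernel_r_credentials",
)


def point_config_at_livy(config, master_private_ip, port=8998):
    """Return a copy of config whose credential sections target the Livy URL."""
    updated = copy.deepcopy(config)
    livy_url = f"http://{master_private_ip}:{port}"
    for key in CREDENTIAL_KEYS:
        # Keep any existing fields (username, auth, ...) and only set the URL.
        updated.setdefault(key, {})["url"] = livy_url
    return updated
```

For example, `point_config_at_livy(json.load(open("/etc/sparkmagic/config.json")), "10.0.1.25")` would return a dictionary that you could review and then write back to the same file. The function doesn't modify the file itself, so you can inspect the result first.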

To resolve connection timeout errors, do the following:

1.    Perform the following checks:

  • Be sure that the security groups or network access control lists (ACLs) are configured correctly to allow traffic on port 8998. Perform this check for both the SageMaker Studio notebook and Amazon EMR cluster.
  • Be sure that the security group for SageMaker Studio has an inbound rule to allow NFS traffic over port 2049 between the domain and Amazon Elastic File System (Amazon EFS) volume.
  • Be sure that the EMR cluster master node security group has an inbound rule for Custom TCP over port 8998. This rule can either specify the Studio's security group or a CIDR that includes the Studio's subnet.
    Note: If you're using VPC peering, then see Update your security groups to reference peer security groups. This documentation provides information on how to specify a security group from a cross-account VPC. If a VPC peering connection links the Amazon EMR and Studio subnets, then the route tables for both subnets must route traffic to each other. If they don't route the traffic properly, then you get the ConnectTimeoutError.
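The inbound rule check on the EMR master node security group can be reasoned about offline. The following is a minimal sketch, assuming that the rules are in the `IpPermissions` shape that the EC2 `DescribeSecurityGroups` API returns and that the Studio subnet uses an IPv4 CIDR. The function names and the sample values in the usage note are hypothetical, and network ACLs and route tables aren't evaluated.

```python
import ipaddress

LIVY_PORT = 8998


def rule_covers_port(rule, port):
    """Return True if the rule's protocol and port range include the TCP port."""
    protocol = rule.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols and all ports
        return True
    if protocol != "tcp":
        return False
    return rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)


def allows_livy_from_studio(ip_permissions, studio_cidr, studio_sg_id=None):
    """Check inbound rules for TCP 8998 from the Studio subnet or security group."""
    studio_net = ipaddress.ip_network(studio_cidr)
    for rule in ip_permissions:
        if not rule_covers_port(rule, LIVY_PORT):
            continue
        # CIDR-based rules: the Studio subnet must sit inside the allowed range.
        for ip_range in rule.get("IpRanges", []):
            if studio_net.subnet_of(ipaddress.ip_network(ip_range["CidrIp"])):
                return True
        # Security-group-based rules: the Studio security group is referenced directly.
        for pair in rule.get("UserIdGroupPairs", []):
            if studio_sg_id and pair.get("GroupId") == studio_sg_id:
                return True
    return False
```

For example, a rule that allows TCP 8998 from 10.0.0.0/16 passes the check for a Studio subnet of 10.0.1.0/24, but a rule that allows only port 22 doesn't.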

2.    If you set up your private subnet in VPC only mode without a NAT gateway, then create the following AWS PrivateLink interface endpoints for EMR and AWS Security Token Service (AWS STS), respectively:

  • **com.amazonaws.<region>.elasticmapreduce**
  • **com.amazonaws.<region>.sts**

These endpoints must be created under the VPC that's used with the EMR cluster and SageMaker Studio.
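To compare the endpoints that the VPC already has against the two that this step requires, a small helper can build the expected service names. This is a minimal sketch, assuming the `com.amazonaws.<region>.<service>` naming pattern from the step above. The function names are hypothetical, and the list of existing service names is something you would collect yourself, for example from the VPC console.

```python
def required_endpoint_services(region):
    """Return the interface endpoint service names needed for EMR and AWS STS."""
    return [
        f"com.amazonaws.{region}.elasticmapreduce",
        f"com.amazonaws.{region}.sts",
    ]


def missing_endpoint_services(region, existing_service_names):
    """Return the required service names that aren't in the existing list."""
    existing = set(existing_service_names)
    return [name for name in required_endpoint_services(region) if name not in existing]
```

An empty result from `missing_endpoint_services` means that both endpoints are present. It doesn't confirm that the endpoints are attached to the correct subnets or security groups.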

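AWS STS is a global service with a global endpoint (https://sts.amazonaws.com), and some clients send requests to this endpoint by default instead of to a Regional endpoint. Therefore, you might get the ConnectTimeoutError when you try to connect to a cross-account Amazon EMR cluster from Studio in a Region other than us-east-1: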

ConnectTimeoutError: Connect timeout on endpoint URL: "https://sts.amazonaws.com/"

To resolve this error, set the environment variable AWS_STS_REGIONAL_ENDPOINTS to regional within the Jupyter notebook before running the connect command:

%env AWS_STS_REGIONAL_ENDPOINTS=regional
%load_ext sagemaker_studio_analytics_extension.magics
%sm_analytics emr connect --cluster-id example-cluster-id --auth-type None --assumable-role-arn arn:aws:iam::example-cross-account:role/example-role-name

For more information on Regional endpoints, see Managing AWS STS in an AWS Region and AWS STS Regionalized endpoints.
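The same environment variable can also be set from Python code in the notebook, which is what the `%env` magic does under the hood. The following is a minimal sketch with a hypothetical helper name. It assumes a Region in the standard AWS partition, where the Regional endpoint follows the `sts.<region>.amazonaws.com` pattern.

```python
import os


def use_regional_sts(region):
    """Opt in to Regional STS endpoints and return the URL that should then be used."""
    # Must be set before any AWS SDK client or session is created in this process.
    os.environ["AWS_STS_REGIONAL_ENDPOINTS"] = "regional"
    # Assumption: standard AWS partition naming for the Regional endpoint.
    return f"https://sts.{region}.amazonaws.com"
```

If the timeout message still names https://sts.amazonaws.com/ after this setting is applied, then the variable was probably set after the client was created. In that case, restart the kernel and set the variable first.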

3.    Check whether the connection works: open your Studio notebook, select the Sparkmagic kernel, and then run one of the following commands in a cell.

For connections within the same account:

%local
!sm-sparkmagic connect --cluster-id <cluster-id>

For cross-account connections:

%local

# If needed, use STS Regional endpoint
%env AWS_STS_REGIONAL_ENDPOINTS=regional 

!sm-sparkmagic connect --cluster-id <cluster-id> --role-arn arn:aws:iam::<cross-account>:role/<role-name>

-or-

Run the following command from the notebook terminal against the private IP address of the EMR master node:

curl <EMR-Master-Private-IP>:8998/sessions -v
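If curl isn't available in the notebook environment, a plain TCP check gives similar information. The following is a minimal sketch that uses only the Python standard library. The function name `check_tcp` and its return labels are hypothetical.

```python
import socket


def check_tcp(host, port=8998, timeout=5.0):
    """Try a TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No response at all: typically security groups, network ACLs, or routing.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port.
        return "refused"
    except OSError:
        # Other network errors, such as no route to host or a DNS failure.
        return "unreachable"
```

A result of "timeout" points to the network checks in step 1. A result of "refused" suggests that the network path works but that Livy isn't listening on that port.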

Then, perform the following checks to make sure that the Livy configuration is accurate:

Use SSH to connect to the master node of the EMR cluster, and then get the process ID (PID) of the Livy service:

ps -ef | grep livy

Check the port that the Livy service is running on:

sudo netstat -anp | grep <PID>

Be sure that the service is running on the default port (8998) for Livy.
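If the netstat output is long, the listening ports for the Livy PID can be extracted programmatically. The following is a minimal sketch, assuming the usual Linux `netstat -anp` column layout (protocol, queues, local address, foreign address, state, PID/program). The function name is hypothetical.

```python
def listening_ports_for_pid(netstat_output, pid):
    """Return the set of TCP ports that the given PID is listening on."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        if fields[6].split("/")[0] != str(pid):
            continue
        # The local address ends with ":<port>" for both IPv4 and IPv6 sockets.
        ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports
```

If the returned set doesn't contain 8998, then Livy is listening on a different port, and the security group rules and connection commands must use that port instead.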


Related information

Create and manage Amazon EMR Clusters from SageMaker Studio to run interactive Spark and ML workloads – Part 1

Create and manage Amazon EMR Clusters from SageMaker Studio to run interactive Spark and ML workloads – Part 2

Build Amazon SageMaker notebooks backed by Spark in Amazon EMR

AWS OFFICIAL, updated a year ago