How can I access an Amazon EMR cluster through an application if the cluster is in a private subnet?

I want to use an application, such as Apache Livy, to access and submit work to an Amazon EMR cluster that's in a private subnet.

Short description

Create an Application Load Balancer with the internet-facing scheme. Set the target of the Application Load Balancer to the private IP address of the master node. Then, you can do the following:

  1. Connect to the EMR cluster that's in a private subnet.
  2. Submit jobs to the application on the cluster, such as Livy, using its REST API.

Resolution

Note: It's not a best practice to use this procedure if you launch the cluster with Kerberos, or if you enable SSL for Livy. 

  1. Open the Amazon Elastic Compute Cloud (Amazon EC2) console.
  2. In the navigation pane, under Load balancing, choose Load Balancers.
  3. Choose Create Load Balancer.
  4. On the Select load balancer type page, under Application Load Balancer, choose Create.
  5. On the Step 1: Configure Load Balancer page, do the following:
    For Scheme, choose internet-facing.
    For Listeners, use the default options (HTTP and port 80).
    For VPC, choose the VPC that the EMR cluster is in.
    For Availability Zones, choose two subnets. Be sure that one of them is the subnet that the EMR cluster is in (private subnet).
  6. Choose Next: Configure Security Settings.
  7. If you created a secure listener in the previous step, then complete the Configure Security Settings page. Otherwise, choose Next: Configure Security Groups.
  8. Select the security group or groups for the Application Load Balancer. Remember, this is an internet-facing Application Load Balancer. It's a best practice to use a security group that limits incoming requests from a specific IP address or IP address range.
  9. Choose Next: Configure Routing.
  10. On the Step 4: Configure Routing page:
    For Target group, select the target group of your choice.
    For Name, enter the target name.
    For Target type, choose IP.
    For Protocol, choose HTTP.
    For Port, enter the port of the client's web UI. For example, for Livy, enter 8998. For more information, see View web interfaces hosted on Amazon EMR clusters.
    In the Health checks section, for Protocol, choose HTTP.
    For Path, enter /sessions.
  11. Choose Next: Register Targets.
  12. On the Step 5: Register Targets page, for IP, enter the private IP address of the master node. You can find the private IP address of the master node on the Hardware tab of the cluster details page.
  13. Choose Add to list to add the IP address to the To be registered list.
  14. Choose Next: Review, and then choose Create.
  15. When the State changes to active, choose the Listeners tab.
  16. In the Rules column, choose the target group link.
  17. On the Targets tab, confirm that the Registered targets and Availability Zones are healthy.
  18. On the Description tab, choose the link next to Load balancer (the link that's the name of the load balancer). If the Livy web UI appears, then the configuration is working. This means that the requests can reach Livy on the EMR cluster in the private subnet.
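
The console steps above can also be scripted. The following is a minimal sketch, not an official procedure. It assumes boto3-style clients for Amazon EMR and Elastic Load Balancing v2 are passed in, that the cluster uses instance groups (not instance fleets), and that the helper names, the "-tg" suffix, and every argument value are hypothetical placeholders.

```python
def master_private_ip(emr, cluster_id):
    """Look up the private IP address of the master node (step 12).

    Assumes an instance-group cluster; instance fleets would need
    InstanceFleetType instead of InstanceGroupTypes.
    """
    response = emr.list_instances(
        ClusterId=cluster_id,
        InstanceGroupTypes=["MASTER"],
        InstanceStates=["RUNNING"],
    )
    return response["Instances"][0]["PrivateIpAddress"]


def create_livy_alb(elbv2, name, vpc_id, subnet_ids, security_group_ids,
                    target_ip, target_port=8998):
    """Create the load balancer, target group, target, and listener."""
    # Step 5: internet-facing Application Load Balancer in the chosen subnets.
    load_balancer = elbv2.create_load_balancer(
        Name=name,
        Subnets=subnet_ids,
        SecurityGroups=security_group_ids,
        Scheme="internet-facing",
        Type="application",
    )["LoadBalancers"][0]

    # Step 10: IP target group on the Livy port, health check on /sessions.
    # The "-tg" suffix is an arbitrary naming choice for this sketch.
    target_group = elbv2.create_target_group(
        Name=name + "-tg",
        Protocol="HTTP",
        Port=target_port,
        VpcId=vpc_id,
        TargetType="ip",
        HealthCheckProtocol="HTTP",
        HealthCheckPath="/sessions",
    )["TargetGroups"][0]

    # Steps 12-13: register the master node's private IP address.
    elbv2.register_targets(
        TargetGroupArn=target_group["TargetGroupArn"],
        Targets=[{"Id": target_ip, "Port": target_port}],
    )

    # Default listener from step 5: HTTP on port 80, forwarding to the group.
    elbv2.create_listener(
        LoadBalancerArn=load_balancer["LoadBalancerArn"],
        Protocol="HTTP",
        Port=80,
        DefaultActions=[{
            "Type": "forward",
            "TargetGroupArn": target_group["TargetGroupArn"],
        }],
    )
    return load_balancer["DNSName"], target_group["TargetGroupArn"]


def target_health(elbv2, target_group_arn):
    """Return the health state of each registered target (step 17)."""
    response = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    return [d["TargetHealth"]["State"]
            for d in response["TargetHealthDescriptions"]]
```

With boto3 installed and credentials configured, the clients would typically come from boto3.client("emr") and boto3.client("elbv2"). Passing them in as arguments keeps the sketch independent of how credentials and Regions are set up.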

You can now send requests to the application through the load balancer. For example, the following command creates an interactive PySpark session on the Livy server. Replace livyALB-2103017743.us-east-1.elb.amazonaws.com with the DNS name of your Application Load Balancer. You can find the DNS name on the Description tab for the Application Load Balancer.

curl -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" livyALB-2103017743.us-east-1.elb.amazonaws.com/sessions
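
The same request can be sent from Python. The following is a minimal sketch that uses only the standard library. The helper names, the timeout, and the polling interval are hypothetical choices for illustration. It assumes the listener is plain HTTP on port 80, as configured above.

```python
import json
import time
import urllib.request


def livy_call(base_url, path, payload=None, opener=urllib.request.urlopen):
    """Send a JSON request to Livy: POST when a payload is given, else GET."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if payload is None else "POST",
    )
    with opener(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def create_session(base_url, kind="pyspark", **kwargs):
    """Equivalent of the curl command: POST {"kind": ...} to /sessions."""
    return livy_call(base_url, "/sessions", {"kind": kind}, **kwargs)


def wait_until_idle(base_url, session_id, attempts=60, delay=5.0, **kwargs):
    """Poll GET /sessions/{id} until the session is ready for statements."""
    for _ in range(attempts):
        state = livy_call(base_url, f"/sessions/{session_id}", **kwargs)["state"]
        if state == "idle":
            return state
        if state in ("error", "dead", "killed", "shutting_down"):
            raise RuntimeError(f"Livy session {session_id} is {state!r}")
        time.sleep(delay)
    raise TimeoutError(f"Livy session {session_id} did not become idle")


if __name__ == "__main__":
    # Replace with the DNS name of your Application Load Balancer.
    alb = "http://livyALB-2103017743.us-east-1.elb.amazonaws.com"
    session = create_session(alb)
    wait_until_idle(alb, session["id"])
```

Polling matters because a new Livy session starts in a transitional state, and statements posted to /sessions/{id}/statements are meant for a session that has reached idle.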

Related information

Accessing the Spark web UIs

Apache Livy

AWS OFFICIAL, updated 2 months ago