How do I access the Spark user interface in Amazon EMR?

I want to view Apache Spark web user interfaces that my Amazon EMR clusters host.

Resolution

The Spark History Server is a web user interface where you can view the status of running and completed Spark jobs on your Amazon EMR cluster.

To access the Spark user interface for a cluster in a public or private subnet, use either persistent application user interfaces or on-cluster application user interfaces.

Persistent application user interfaces

In your Amazon EMR cluster, the apppusher daemon periodically sends Spark event logs to Amazon EMR production buckets. The persistent Spark user interface uses the event logs to show Spark applications.

This feature works only when the event log directory for the application is in the Hadoop Distributed File System (HDFS). By default, Amazon EMR stores event logs in the /var/log/spark/apps directory of HDFS. If you change the default directory to a different file system, such as Amazon Simple Storage Service (Amazon S3), then this feature doesn't work. For more information, see Considerations and limitations.
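
As a rough illustration of the HDFS requirement, the following Python sketch reads the text of a spark-defaults.conf file and reports whether the configured event log directory is on HDFS. It assumes that the directory is set through the Spark property spark.eventLog.dir, and that a path without a scheme resolves to HDFS (the usual default file system on a cluster). The function names are hypothetical and aren't part of Amazon EMR or Spark.

```python
import re
from urllib.parse import urlparse

# Default location described above, written as an HDFS URI (assumption).
DEFAULT_EVENT_LOG_DIR = "hdfs:///var/log/spark/apps"


def event_log_dir(spark_defaults_text: str) -> str:
    """Return the spark.eventLog.dir value, or the default if it isn't set."""
    for raw_line in spark_defaults_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split "key value" or "key=value" into at most two parts.
        parts = re.split(r"[=\s]+", line, maxsplit=1)
        if len(parts) == 2 and parts[0] == "spark.eventLog.dir":
            return parts[1].strip()
    return DEFAULT_EVENT_LOG_DIR


def supports_persistent_ui(directory: str) -> bool:
    """Return True if the directory is on HDFS (or has no scheme at all)."""
    # Assumption: a scheme-less path resolves to HDFS on the cluster.
    return urlparse(directory).scheme in ("", "hdfs")


if __name__ == "__main__":
    sample = "spark.eventLog.enabled true\nspark.eventLog.dir s3://amzn-s3-demo-bucket/logs\n"
    directory = event_log_dir(sample)
    print(directory, supports_persistent_ui(directory))
```

In this sketch, a directory such as s3://amzn-s3-demo-bucket/logs is reported as unsupported, which matches the limitation described above.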

You can access the application history and relevant log files for active and terminated clusters. The logs are available for 30 days after the application ends. For more information, see View persistent application user interfaces in Amazon EMR.

On-cluster application user interfaces

The primary node hosts the on-cluster user interfaces, and you need an SSH connection to access the web server.

To access the on-cluster user interface, complete the following steps:

  1. Use SSH to connect to the primary node.
  2. Configure SSH tunneling with dynamic port forwarding.
  3. Configure your internet browser to use an add-on such as FoxyProxy for Firefox or SwitchyOmega for Chrome to manage your SOCKS proxy settings.
    Note: This method automatically filters URLs based on text patterns. Also, this method limits the proxy settings to domains that match the form of the primary node's DNS name.
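
The first two steps can be sketched in Python as a helper that builds the ssh command for dynamic port forwarding, plus the URL to open through the SOCKS proxy. This is a minimal sketch under several assumptions: the hadoop user name, the local port 8157, and the Spark History Server port 18080 are common choices rather than values stated in this article, and the host name and key file in the example are placeholders. The function names are hypothetical.

```python
import shlex


def dynamic_forward_command(primary_dns, key_file, local_port=8157, user="hadoop"):
    """Build an ssh command that opens a local SOCKS proxy to the primary node.

    -N runs no remote command, and -D enables dynamic port forwarding.
    The user name and port defaults are assumptions; adjust them as needed.
    """
    return ["ssh", "-i", key_file, "-N", "-D", str(local_port), f"{user}@{primary_dns}"]


def history_server_url(primary_dns, port=18080):
    """Return the URL to open in the proxied browser (port is an assumption)."""
    return f"http://{primary_dns}:{port}/"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    dns = "ec2-198-51-100-1.compute-1.amazonaws.com"
    print(shlex.join(dynamic_forward_command(dns, "mykeypair.pem")))
    print(history_server_url(dns))
```

The helper only builds the command. To open the tunnel from Python, you could pass the returned list to subprocess.run and leave the process running while you browse.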

An on-cluster user interface in a private subnet isn't directly accessible unless you connect from a local network through a VPN connection or AWS Direct Connect. You must also configure routing so that traffic can pass between the AWS and local networks.

Alternatively, you can use a bastion host (jump server) in a public subnet to reach the private subnet, and then set up SSH tunneling with dynamic port forwarding through it. For more information, see Securely access web interfaces on Amazon EMR launched in a private subnet.
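
One possible way to combine the bastion host and the tunnel is the OpenSSH -J (ProxyJump) option, sketched below in Python. This isn't necessarily the method that the linked article uses. The sketch assumes OpenSSH 7.3 or later, that the keys for both hosts are already loaded in ssh-agent, and that the user names (ec2-user, hadoop) and local port 8157 fit your environment. The function name and host names are hypothetical.

```python
import shlex


def tunnel_via_bastion(bastion_host, primary_dns, local_port=8157,
                       bastion_user="ec2-user", cluster_user="hadoop"):
    """Build an ssh command that opens a SOCKS proxy to the primary node
    by jumping through a bastion host in a public subnet.

    Assumes that ssh-agent holds the keys for both hosts, so no -i option is passed.
    """
    return [
        "ssh",
        "-N",                                   # run no remote command
        "-D", str(local_port),                  # dynamic (SOCKS) port forwarding
        "-J", f"{bastion_user}@{bastion_host}", # jump through the bastion host
        f"{cluster_user}@{primary_dns}",
    ]


if __name__ == "__main__":
    # Placeholder host names for illustration only.
    cmd = tunnel_via_bastion("bastion.example.com", "ip-10-0-1-25.ec2.internal")
    print(shlex.join(cmd))
```

With the tunnel open, the browser proxy settings from the earlier steps apply unchanged, because the SOCKS proxy still listens on the local port.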

AWS OFFICIAL - Updated 8 months ago