EMR primary can't RPC to core on port 8020 (Hue / Hadoop)


I am trying to use Hue, hosted on the EMR primary node, as a web interface to issue Hive QL. The file browser works fine -- I can explore S3 files with no problem (which presumably doesn't require the core nodes). But any attempt to use Hive QL to create tables (which presumably does require the core nodes) results in a remote procedure call error: "java.net.NoRouteToHostException No Route to Host from ip-xxx-xx-xx-xxx.us-west-1.compute.internal/172.31.29.217 to ip-yyy-yy-yy-yyy.us-west-1.compute.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host". According to the EMR service port listing (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-service-ports.html), port 8020 is used by the NameNode for RPC and is started automatically -- users are not supposed to open this port themselves, as that would violate security. So what can I do to fix this error?

asked a year ago · 386 views
2 Answers
Accepted Answer

Are you sure these IPs / FQDNs belong to the same EMR cluster and have a route to each other?
ip-xxx-xx-xx-xxx.us-west-1.compute.internal/172.31.29.217 ip-yyy-yy-yy-yyy.us-west-1.compute.internal:8020

Since this is happening on a CREATE TABLE statement, I suspect you might be creating a Hive table in a Hive database that points to a different cluster's HDFS. Were you using a common Hive metastore for this cluster and a previously terminated EMR cluster? You can verify the LOCATION of your database or table by running DESCRIBE DATABASE EXTENDED your_database_name, or by going to the metastore DB and checking the relevant schema.
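As a sketch, that check might look like the following (database and table names are placeholders):

```sql
-- Ask the metastore where it thinks this database lives.
DESCRIBE DATABASE EXTENDED your_database_name;
-- The "location" field will show something like
--   hdfs://ip-yyy-yy-yy-yyy.us-west-1.compute.internal:8020/user/hive/warehouse/your_database_name.db
-- If that hostname is not your current cluster's NameNode, the metastore
-- is pointing at a terminated cluster's HDFS.

-- The same check for an individual table:
DESCRIBE FORMATTED your_table_name;
```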

AWS
answered a year ago

You are completely correct. The problem is that we took the default for AWS Glue catalog storage -- which is in the EMR cluster's HDFS, and that disappears when the cluster is terminated. We were in fact using a different cluster, but Glue was trying to reach the location where its detailed catalog data used to be, which no longer existed.
Besides the fact that the IP address wasn't in our current cluster, the other clue was that the error message mentioned the core node trying to find out whether the data was encrypted -- metadata that only exists in the catalog.

For others who may be new to Glue, the following information may be useful.

When you create a Hive table without specifying a LOCATION, the table data is stored in the location specified by the hive.metastore.warehouse.dir property. By default, this is a location in HDFS. If another cluster needs to access the table, it fails unless it has adequate permissions to the cluster that created the table. Furthermore, because HDFS storage is transient, if the cluster terminates, the table data is lost, and the table must be recreated.
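To avoid the transient-HDFS problem for a single table, you can give it an explicit S3 LOCATION so the data survives cluster termination. A minimal sketch (the bucket name, prefix, and columns are placeholders):

```sql
-- Store table data in S3 instead of the cluster's transient HDFS.
CREATE EXTERNAL TABLE my_table (
  id   INT,
  name STRING
)
LOCATION 's3://my-durable-bucket/warehouse/my_table/';
```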

You can update the hive.metastore.warehouse.dir property in any of the following ways:

  1. In the Hive shell, with a set command: set hive.metastore.warehouse.dir=s3://cr_test_bucket/prefix1/;

  2. In the EMR console; please refer to [2] for sample configurations. [2] Configure applications - https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

  3. In the /etc/hive/conf/hive-site.xml file on the primary node; Hive services must be restarted for the update to take effect.
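For option 2, a configuration JSON passed to the EMR console or CLI could look like the following sketch (the bucket and prefix are placeholders):

```json
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.metastore.warehouse.dir": "s3://cr_test_bucket/prefix1/"
    }
  }
]
```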

answered a year ago
