Tableau Integration with a Kerberized EMR Cluster

Content level: Advanced

This article describes the high-level procedure for integrating the Tableau application with a Kerberized EMR cluster.

This tutorial provides an overview of the steps needed to connect Tableau to Hive on a Kerberized EMR cluster, using a self-signed certificate for in-transit encryption. When configuring a production environment, make sure to test and adapt these configurations to your own requirements and scenario. The environment details and an overall summary of the steps follow.

Environment details and security configuration:

EMR version - 6.9.0
Applications -  Hive 3.1.3, Hue 4.10.0
Network - private subnet
hive.metastore.client.factory.class - com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
In-transit encryption - Self-signed certificate (for testing purposes only)
Kerberos - Cluster dedicated KDC

High-level steps:

1. Create an EMR cluster with a cluster-dedicated KDC and in-transit encryption.
2. Launch a Windows EC2 machine in the same VPC as the EMR cluster, and install and configure Tableau.
3. Copy the krb5.conf file from the EMR cluster to the client machine. This file can be found on any node in the EMR cluster.
4. Install a Kerberos ticket manager on the Tableau machine and configure Kerberos authentication.
5. Obtain a Kerberos ticket with the ticket manager tool.
6. Allow inbound traffic from the Windows machine in the EMR master security group.
7. Connect to the Hive Thrift server from the Tableau application.

1. Create an EMR cluster with a cluster-dedicated KDC and in-transit encryption.

Cluster creation command:

aws emr create-cluster \
  --name "kerberos_hbase" \
  --log-uri "s3n://<s3bucket>/logs/" \
  --release-label "emr-6.9.0" \
  --service-role "arn:aws:iam::xxxxxxxxxxx:role/EMR_DefaultRole" \
  --security-configuration "kerberos_hbase" \
  --kerberos-attributes '{"Realm":"EC2.INTERNAL","KdcAdminPassword":"","ADDomainJoinUser":""}' \
  --ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","EmrManagedMasterSecurityGroup":"sg-0ef9143547cdxxxxx","EmrManagedSlaveSecurityGroup":"sg-08a1747a3d2xxxxxx","KeyName":"testemr","AdditionalMasterSecurityGroups":[],"AdditionalSlaveSecurityGroups":[],"SubnetId":"subnet-0e8f8b627843xxxxx"}' \
  --applications Name=HBase Name=Hive Name=Hue Name=Phoenix \
  --configurations '[{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}},{"Classification":"hbase","Properties":{"hbase.emr.storageMode":"s3"}},{"Classification":"hbase-site","Properties":{"hbase.rootdir":"s3://<s3bucket>/kerberos_hbase"}}]' \
  --instance-groups '[{"InstanceCount":1,"InstanceGroupType":"CORE","Name":"Core","InstanceType":"m5.xlarge","EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32},"VolumesPerInstance":2}]}},{"InstanceCount":1,"InstanceGroupType":"MASTER","Name":"Primary","InstanceType":"m5.xlarge","EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32},"VolumesPerInstance":2}]}}]' \
  --scale-down-behavior "TERMINATE_AT_TASK_COMPLETION" \
  --ebs-root-volume-size "25" \
  --os-release-label "2.0.20230320.0" \
  --region "us-east-1"
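
The command above references a security configuration named kerberos_hbase, which supplies the cluster-dedicated KDC and in-transit encryption settings. A minimal sketch of creating a comparable configuration is shown below; the certificate bundle location and ticket lifetime are illustrative assumptions, not values from the original setup:

#Illustrative only: security configuration with a cluster-dedicated KDC and
#in-transit encryption backed by a certificate ZIP in S3 (assumed path)
aws emr create-security-configuration \
  --name "kerberos_hbase" \
  --security-configuration '{
    "EncryptionConfiguration": {
      "EnableInTransitEncryption": true,
      "EnableAtRestEncryption": false,
      "InTransitEncryptionConfiguration": {
        "TLSCertificateConfiguration": {
          "CertificateProviderType": "PEM",
          "S3Object": "s3://<s3bucket>/certs/my-certs.zip"
        }
      }
    },
    "AuthenticationConfiguration": {
      "KerberosConfiguration": {
        "Provider": "ClusterDedicatedKdc",
        "ClusterDedicatedKdcConfiguration": {
          "TicketLifetimeInHours": 24
        }
      }
    }
  }'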

1.1. Log in to the master node as the root user and create an application user using kadmin.local. Also, create the home directory in HDFS for the created user.

[root@ip-172-31-91-212 ~]# useradd tableau
[root@ip-172-31-91-212 ~]# kadmin.local 
Authenticating as principal root/admin@EC2.INTERNAL with password.
kadmin.local:  addprinc tableau
WARNING: no policy specified for tableau@EC2.INTERNAL; defaulting to no policy
Enter password for principal "tableau@EC2.INTERNAL": 
Re-enter password for principal "tableau@EC2.INTERNAL": 
Principal "tableau@EC2.INTERNAL" created.
kadmin.local: 
[hadoop@ip-172-31-91-212 ~]$ hdfs dfs -mkdir /user/tableau
[hadoop@ip-172-31-91-212 ~]$ hdfs dfs -chown tableau:tableau /user/tableau  
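
If you prefer to script this step, the principal can also be created non-interactively; the password below is a placeholder:

kadmin.local -q "addprinc -pw <password> tableau"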

1.2. Log in as the user and obtain a Kerberos ticket

[tableau@ip-172-31-91-212 ~]$ id
uid=1001(tableau) gid=1002(tableau) groups=1002(tableau)
[tableau@ip-172-31-91-212 ~]$ kinit
Password for tableau@EC2.INTERNAL: 
[tableau@ip-172-31-91-212 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: tableau@EC2.INTERNAL
Valid starting       Expires              Service principal

1.3. Create the keystore file, export the certificate from keystore.jks, import it into the client's truststore, and then list the keystore/truststore files to verify the entries were created as shown below. Note that these steps apply to a self-signed certificate; real-world configurations involve additional steps depending on the requirement and scenario.

[hadoop@ip-172-31-91-212 ~]$ sudo keytool -genkey -alias hive -keyalg RSA -keystore keystore.jks -storepass changeit -keypass changeit -dname \
"CN=*.ec2.internal,OU=Premium Support,OU=AWS,L=Virginia,S=Herndon,C=US"
keytool error: java.lang.Exception: Key pair not generated, alias <hive> already exists
[hadoop@ip-172-31-91-212 ~]$ 
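#Note: the "alias <hive> already exists" error above appears because the key had
#been generated in an earlier run; if regenerating, the existing entry can be
#removed first, e.g. sudo keytool -delete -alias hive -keystore keystore.jks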
[hadoop@ip-172-31-91-212 ~]$ sudo keytool -list -keystore keystore.jks
Enter keystore password:  
Keystore type: jks
Keystore provider: SUN

Your keystore contains 1 entry

hive, 18-Apr-2023, PrivateKeyEntry, 
Certificate fingerprint (SHA-256): 8E:97:49:06:10:09:0A:08:51:FB:98:B5:CB:16:50:4A:FA:CE:A2:0A:9D:CD:06:41:51:B7:E7:51:E5:1D:37:00
Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using 
"keytool -importkeystore -srckeystore keystore.jks -destkeystore keystore.jks -deststoretype pkcs12".
[hadoop@ip-172-31-91-212 ~]$ 
#export the keystore file
[hadoop@ip-172-31-91-212 ~]$ sudo keytool -export -alias hive -file hive.crt -keystore keystore.jks
Enter keystore password:  
Certificate stored in file <hive.crt>
#Import into client's truststore
[hadoop@ip-172-31-91-212 ~]$ sudo keytool -import -trustcacerts -alias hive -file hive.crt -keystore truststore.jks
Enter keystore password:  
Re-enter new password: 
Owner: CN=*.ec2.internal, OU=Premium Support, OU=AWS, L=Virginia, ST=Herndon, C=US
Issuer: CN=*.ec2.internal, OU=Premium Support, OU=AWS, L=Virginia, ST=Herndon, C=US
Serial number: 15b0b393
Valid from: Tue Apr 18 21:43:33 UTC 2023 until: Mon Jul 17 21:43:33 UTC 2023
Certificate fingerprints:
   SHA1: 27:99:1A:9D:70:1A:0C:7B:44:BD:51:6D:B1:76:5C:93:38:9A:FC:27
   SHA256: 8E:97:49:06:10:09:0A:08:51:FB:98:B5:CB:16:50:4A:FA:CE:A2:0A:9D:CD:06:41:51:B7:E7:51:E5:1D:37:00
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3

Extensions: 
#1: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 35 37 DD F4 74 A5 75 16   9F 1F 21 0D 10 2C 86 27  57..t.u...!..,.'
0010: DF 4B D7 64                                        .K.d
]
]

Trust this certificate? [no]:  yes
Certificate was added to keystore
[hadoop@ip-172-31-91-212 ~]$ 
#Verify the imported truststore file
[hadoop@ip-172-31-91-212 ~]$ sudo keytool -list -keystore truststore.jks
Enter keystore password:  
Keystore type: jks
Keystore provider: SUN

Your keystore contains 1 entry

hive, 18-Apr-2023, trustedCertEntry, 
Certificate fingerprint (SHA-256): 8E:97:49:06:10:09:0A:08:51:FB:98:B5:CB:16:50:4A:FA:CE:A2:0A:9D:CD:06:41:51:B7:E7:51:E5:1D:37:00
[hadoop@ip-172-31-91-212 ~]$ 
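
The exported certificate (hive.crt) will also be needed later on the Windows client so that the self-signed certificate can be trusted there. A sketch of copying it off the cluster, assuming the testemr key pair from the cluster creation command and a client in the same VPC:

scp -i testemr.pem hadoop@ip-172-31-91-212.ec2.internal:/home/hadoop/hive.crt .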

Add the parameters below to the hive-site.xml and core-site.xml files respectively (on EMR these are under /etc/hive/conf/ and /etc/hadoop/conf/), and then create the keystore password entry for HiveServer2.

#hive-site

<property>
  <name>hive.server2.use.SSL</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.keystore.path</name>
  <value>/home/hadoop/keystore.jks</value>
</property>
<property>
  <name>hive.server2.keystore.password</name>
  <value>xxxxxxxx</value>
</property>

#core-site
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://file/home/hadoop/hive.jceks</value>
</property>

Create the HiveServer2 keystore password entry in the credential provider referenced above. For example:

[hadoop@ip-172-31-91-212 ~]$ sudo hadoop credential create hive.server2.keystore.password -provider jceks://file/home/hadoop/hive.jceks
WARNING: You have accepted the use of the default provider password
by not configuring a password in one of the two following locations:
    * In the environment variable HADOOP_CREDSTORE_PASSWORD
    * In a file referred to by the configuration entry
      hadoop.security.credstore.java-keystore-provider.password-file.
Please review the documentation regarding provider passwords in
the keystore passwords section of the Credential Provider API
Continuing with the default provider password.

Enter alias password: 
Enter alias password again: 
hive.server2.keystore.password has been successfully created.
Provider jceks://file/home/hadoop/hive.jceks was updated.

Restart the HiveServer2 daemon (a sketch follows) and test the connection using Beeline. If the connection fails, for example with a connection refused or permission denied error, check the HiveServer2 log and make the necessary changes; in this setup, the credential store files needed their permissions relaxed, as shown below.
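
On EMR 6.x, HiveServer2 runs as a systemd service, so the restart can look roughly like this:

sudo systemctl stop hive-server2
sudo systemctl start hive-server2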

[hadoop@ip-172-31-91-212 ~]$ nc -zv ip-172-31-91-212.ec2.internal 10000
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused.

#Relax permissions on the credential store files if HiveServer2 logs a permission denied error
sudo chmod 775 /home/hadoop/.hive.jceks.crc
sudo chmod 775 /home/hadoop/hive.jceks

1.4. Test the connection using beeline

[tableau@ip-172-31-91-212 ~]$ beeline --verbose=true -u "jdbc:hive2://ip-172-31-91-212.ec2.internal:10000/default;ssl=true;sslTrustStore=/home/hadoop/truststore.jks;trustStorePassword=changeit;principal=hive/ip-172-31-91-212.ec2.internal@EC2.INTERNAL"
!connect jdbc:hive2://ip-172-31-91-212.ec2.internal:10000/default;ssl=true;sslTrustStore=/home/hadoop/truststore.jks;trustStorePassword=changeit;principal=hive/ip-172-31-91-212.ec2.internal@EC2.INTERNAL '' [passwd stripped]
Connecting to jdbc:hive2://ip-172-31-91-212.ec2.internal:10000/default;ssl=true;sslTrustStore=/home/hadoop/truststore.jks;trustStorePassword=changeit;principal=hive/ip-172-31-91-212.ec2.internal@EC2.INTERNAL
Connected to: Apache Hive (version 3.1.3-amzn-2)
Driver: Hive JDBC (version 3.1.3-amzn-2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3-amzn-2 by Apache Hive
0: jdbc:hive2://ip-172-31-91-212.ec2.internal> 

2. Launch a Windows EC2 machine in the same VPC and subnet where the EMR cluster was created, and install the Tableau application.

Download, install, and configure the Tableau application by following Tableau's installation documentation.

3. Copy the krb5.conf file from the EMR cluster to the client machine. This file can be found on any node in the EMR cluster.

3.1. Copy the krb5.conf configuration file from the EMR master node. The file is located at /etc/krb5.conf.

3.2. Rename the configuration file from krb5.conf to krb5.ini.

3.3. Copy the krb5.ini file to the C:\ProgramData\MIT\Kerberos5 directory on the Tableau Windows machine, overwriting the empty sample file.
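
For reference, a cluster-dedicated KDC typically produces a krb5.conf along the following lines. This is an illustrative sketch based on this walkthrough's master node and realm; always use the exact file copied from your own cluster:

[libdefaults]
    default_realm = EC2.INTERNAL
    dns_lookup_realm = false
    dns_lookup_kdc = false

[realms]
    EC2.INTERNAL = {
        kdc = ip-172-31-91-212.ec2.internal:88
        admin_server = ip-172-31-91-212.ec2.internal:749
        default_domain = ec2.internal
    }

[domain_realm]
    .ec2.internal = EC2.INTERNAL
    ec2.internal = EC2.INTERNAL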

4. Install the Kerberos ticket manager on the client machine.

4.1. Download and install the MIT ticket manager tool from this link - https://web.mit.edu/kerberos/dist/#krb5-1.20

4.2. Create a directory where you want to save the Kerberos credential cache file. For example, create a directory named C:\temp. 

4.3. Add a new system variable in the Windows environment variables:

Variable name - KRB5CCNAME
Variable value - C:\temp\krb5cache
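
Alternatively, the variable can be set from an elevated command prompt; setx with /M requires administrator rights:

setx KRB5CCNAME "C:\temp\krb5cache" /M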

5. Obtain the Kerberos ticket with the ticket manager tool.

Open the MIT Kerberos Ticket Manager, click Get Ticket, and provide the principal's credentials. Alternatively, you can obtain the ticket from the KDC using the kinit command, for example with a keytab (kinit -k -t).

[Image: MIT Kerberos Ticket Manager]
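
From the Windows command line, the same can be done with the kinit utility that ships with MIT Kerberos. The keytab path below is an assumption for illustration:

:: Prompt for the principal's password
kinit tableau@EC2.INTERNAL

:: Or authenticate non-interactively with a keytab exported from the cluster
kinit -k -t C:\temp\tableau.keytab tableau@EC2.INTERNAL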

6. Allow inbound traffic from the Windows machine in the EMR master security group.

Allow inbound traffic from the Windows machine to the EMR KDC and HiveServer2; otherwise, restrictive security group rules can block the connection.

For the cluster-dedicated KDC, ports 88 (Kerberos) and 749 (kadmin) should be allowed. Ports 10000 (HiveServer2 Thrift), 10001 (HTTP transport), and 10002 (HiveServer2 web UI) are also required for the HiveServer2 connection. A sketch of opening these ports with the AWS CLI follows.
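
The sketch below opens the required ports on the EMR master security group (sg-0ef9143547cdxxxxx from the cluster creation command above) to the Windows machine's security group; sg-0windowsxxxxxxxx is a placeholder for your client's group. Depending on client configuration, Kerberos may also use UDP on port 88:

for port in 88 749 10000 10001 10002; do
  aws ec2 authorize-security-group-ingress \
    --group-id sg-0ef9143547cdxxxxx \
    --protocol tcp \
    --port "$port" \
    --source-group sg-0windowsxxxxxxxx
done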

7. Connect to the Hive Thrift server from the Tableau application.

After the above settings are established, open the Tableau connection tab and provide the necessary connection details to create a connection to HiveServer2.

[Image: Tableau sign-in]
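
For reference, the connection values for this test setup would be along the following lines. Field names vary by Tableau connector and driver version, so treat these as assumptions to adapt rather than exact UI labels:

Server - ip-172-31-91-212.ec2.internal
Port - 10000
Authentication - Kerberos
Realm - EC2.INTERNAL
Host FQDN - ip-172-31-91-212.ec2.internal
Service name - hive
Require SSL - enabled (with the self-signed certificate trusted on the client)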

[Image: database load in Tableau]

After the server connection is established using the above connection details, you can connect to the database and schema to run Hive queries.
