I'm having trouble with Apache Hive queries in Amazon EMR. I want to collect logs so that I can troubleshoot these issues.
Amazon EMR supports several methods for working with Hive: the Hive shell; Hue, JDBC, or ODBC clients that connect through HiveServer2; and Amazon EMR steps. Troubleshooting steps differ depending on which method you use.
Hive logs are stored in the following directories on the cluster's master node. For more information, see View log files on the master node.
Query logs are written to different locations under /mnt/var/log/hive/ on the Amazon EMR master node, depending on where you submitted the Hive query. Logs in this location are also pushed to the Amazon S3 LogUri that you configured when you created the Amazon EMR cluster.
For example, if you run queries from the Hive shell as hadoop (the default user), query errors are logged in the following directory:
[hadoop@ip-172-xx-xx-x ~]$ cd /mnt/var/log/hive/user/hadoop
[hadoop@ip-172-xx-xx-x hadoop]$ tail -20 hive.log
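A plain tail can miss an error that is buried under later INFO lines. The following is a minimal sketch, assuming the default log layout shown above and that error lines contain the text ERROR or Exception. The function name hive_user_errors is hypothetical and is not part of Amazon EMR or Hive.

```shell
# Print the last 20 error-like lines from a user's Hive shell log.
# Assumptions: the log is at <log_root>/<user>/hive.log and error lines
# contain "ERROR" or "Exception". The function name is illustrative only.
hive_user_errors() {
  local user="${1:-hadoop}"
  local log_root="${2:-/mnt/var/log/hive/user}"
  local log_file="${log_root}/${user}/hive.log"

  if [ ! -r "$log_file" ]; then
    echo "Cannot read ${log_file}" >&2
    return 1
  fi

  # -n prefixes each match with its line number in hive.log
  grep -n -E 'ERROR|Exception' "$log_file" | tail -n 20
}
```

Run hive_user_errors with no arguments to check the hadoop user's log, or pass a different user name as the first argument.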
Hue, JDBC, or ODBC
HiveServer2 allows clients such as Beeline and JDBC or ODBC applications (SQL Workbench/J, for example) to run queries against Hive.
For more information on clients supported by HiveServer2, see HiveServer2 clients on the Confluence website.
Check for errors in the hive-server2 logs under the following conditions:
- You need to troubleshoot a failed query submitted by one of these clients.
- You have trouble connecting to Hive from clients using JDBC or ODBC drivers.
[hadoop@ip-172-xx-xx-x ~]$ cd /mnt/var/log/hive/
[hadoop@ip-172-xx-xx-x hive]$ ls -ltr
-rw-r--r-- 1 hive hive 42 May 25 19:29 hive-server2.out
drwxrwxrwt 4 root root 30 May 25 19:29 user
-rw-r--r-- 1 hive hive 49075 May 25 19:29 hive-server2.log
[hadoop@ip-172-xx-xx-x hive]$ tail -20 hive-server2.log
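Java stack traces span several lines, so it can help to print each error together with the lines that follow it. The sketch below assumes that error entries in hive-server2.log contain the text ERROR. The function name hs2_error_context and the default of 15 trailing lines are illustrative choices.

```shell
# Show each ERROR line in the HiveServer2 log together with the lines
# that follow it, so a stack trace stays attached to its error.
# Assumptions: error entries contain "ERROR"; the function name and the
# default context size are hypothetical.
hs2_error_context() {
  local log_file="${1:-/mnt/var/log/hive/hive-server2.log}"
  local after="${2:-15}"

  if [ ! -r "$log_file" ]; then
    echo "Cannot read ${log_file}" >&2
    return 1
  fi

  # -A prints the given number of lines after each match.
  # grep exits with a non-zero status when no error line is found.
  grep -A "$after" 'ERROR' "$log_file"
}
```

For example, hs2_error_context /mnt/var/log/hive/hive-server2.log 30 prints up to 30 lines after each error.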
Note: By default, all Hive queries on Amazon EMR use the Tez engine, and a query might launch a YARN application. To troubleshoot a failed YARN application, check the YARN container logs. For more information, see the YARN application history section in this article.
Amazon EMR steps
Check the step logs, which are located in /var/log/hadoop/steps/. For example:
[hadoop@ip-172-xx-xx-x ~]$ cd /var/log/hadoop/steps/s-3C4CZ9G05FEAX
[hadoop@ip-172-xx-xx-x s-3C4CZ9G05FEAX]$ ls -ltr
-rw-rw-r-- 1 hadoop hadoop 0 May 25 21:09 syslog
-rw-rw-r-- 1 hadoop hadoop 1304 May 25 21:09 stdout
-rw-rw-r-- 1 hadoop hadoop 213 May 25 21:09 stderr
-rw-rw-r-- 1 hadoop hadoop 2589 May 25 21:09 controller
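Some step log files are often empty, like syslog in the listing above, so a quick pass over only the non-empty files can save time. This is a rough sketch, assuming the four file names shown in the listing. The function name step_log_tail is hypothetical.

```shell
# Print the last 20 lines of every non-empty log file for one step.
# Assumptions: step logs are at <root>/<step-id>/ and use the file names
# controller, stderr, stdout, and syslog. The function name is illustrative.
step_log_tail() {
  local step_id="$1"
  local root="${2:-/var/log/hadoop/steps}"
  local dir="${root}/${step_id}"
  local name

  if [ -z "$step_id" ] || [ ! -d "$dir" ]; then
    echo "Step log directory not found: ${dir}" >&2
    return 1
  fi

  for name in controller stderr stdout syslog; do
    # -s is true only for files that exist and are not empty
    if [ -s "${dir}/${name}" ]; then
      echo "== ${name} =="
      tail -n 20 "${dir}/${name}"
    fi
  done
}
```

For example, step_log_tail s-3C4CZ9G05FEAX prints the tail of each non-empty log file for that step.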
YARN application history
The easiest way to view and monitor YARN application details is in the Amazon EMR console. Open the cluster's detail page, and then choose the Application history tab. For more information, see View application history.
To see if errors occurred in a Tez or MapReduce application that runs in the background when you run a Hive query, check the YARN application logs on Amazon Simple Storage Service (Amazon S3). For more information, see View log files archived to Amazon S3. For example:
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/node/i-045d100a1fcd13ef2/
$ aws s3 ls s3://aws-logs-223377617334-us-west-2/elasticmapreduce/j-3MCDUQO2MWNJ5/containers/application_1527279117205_0001/container_1527279117205_0001_01_000001/
2020-10-25 15:46:04 842 stdout.gz
2020-10-25 15:46:04 4089 syslog.gz
Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.
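When you know only a container ID, you can derive the application ID and the S3 prefix from it. The helpers below are a sketch under two assumptions: archived logs follow the LogUri/cluster-id/containers/application-id/ layout shown in the example, and container IDs follow the standard YARN naming pattern. All three function names are hypothetical.

```shell
# Derive the YARN application ID from a container ID, for example
# container_1527279117205_0001_01_000001 -> application_1527279117205_0001.
# An optional epoch field (container_e17_...) is skipped.
app_id_from_container() {
  echo "$1" | sed -E 's/^container_(e[0-9]+_)?([0-9]+)_([0-9]+)_.*/application_\2_\3/'
}

# Build the S3 prefix that holds an application's container logs.
# Assumption: logs are archived as <LogUri>/<cluster-id>/containers/<app-id>/.
container_log_prefix() {
  local log_uri="${1%/}"   # drop one trailing slash, if present
  local cluster_id="$2"
  local app_id="$3"
  echo "${log_uri}/${cluster_id}/containers/${app_id}/"
}

# Stream one gzipped log object to the terminal without saving it locally.
# Requires the AWS CLI and read access to the log bucket.
show_container_log() {
  aws s3 cp "$1" - | gunzip -c
}
```

For example, pass the output of container_log_prefix to aws s3 ls --recursive to list every container log for the application, and then pass the full S3 URI of a syslog.gz object to show_container_log to read it.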
Related information
How do I resolve "OutOfMemoryError" Hive Java heap space exceptions on Amazon EMR that occur when Hive outputs the query results?
Hive cluster errors