How do I use Amazon Athena to troubleshoot when my Amazon EMR Spark jobs fail?


My Spark job on Amazon EMR has failed. I want to query the Spark logs with Amazon Athena to troubleshoot the failure.

Resolution

Applications that run on Amazon EMR produce log files. You can create a basic table over these log files in Athena and then query that table to identify events and trends for your applications and clusters.

To create a table from the Amazon EMR log files in your Amazon Simple Storage Service (Amazon S3) log location, run the following query:

CREATE EXTERNAL TABLE `myemrlogs`(
  `data` string COMMENT 'from deserializer')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://aws-logs-111122223333-us-west-2/elasticmapreduce/j-1ABCDEEXAMPLE/containers/application_1111222233334_5555/';

Replace the following in the query:

  • myemrlogs with the name of your table
  • 111122223333 with your AWS account ID
  • us-west-2 with the AWS Region of your log bucket
  • j-1ABCDEEXAMPLE with your cluster ID
  • application_1111222233334_5555 with your application ID

Note: The S3 bucket mentioned in the example is the default bucket used by Amazon EMR. To verify your log bucket path, open the Amazon EMR console, choose your cluster, and then check the Log URI field in the Summary tab.
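If you create tables for several clusters or applications, you can generate the statement instead of editing it by hand. The following Python sketch assumes the default Amazon EMR log bucket layout shown in the example. The function names emr_container_log_location and create_table_statement are hypothetical helpers, not part of any AWS SDK. If your Log URI points to a different bucket, pass your own location to create_table_statement.

```python
import re


def emr_container_log_location(account_id, region, cluster_id, application_id):
    """Build the S3 prefix for one application's container logs.

    Assumption: the cluster writes logs to the default Amazon EMR bucket,
    laid out as in the example query.
    """
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit string")
    return (
        f"s3://aws-logs-{account_id}-{region}/elasticmapreduce/"
        f"{cluster_id}/containers/{application_id}/"
    )


def create_table_statement(table_name, location):
    """Render the CREATE EXTERNAL TABLE statement for a log location."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError("table_name must be a simple identifier")
    if "'" in location:
        raise ValueError("location must not contain a single quote")
    return (
        f"CREATE EXTERNAL TABLE `{table_name}`(\n"
        "  `data` string COMMENT 'from deserializer')\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY '|'\n"
        "LINES TERMINATED BY '\\n'\n"
        "STORED AS INPUTFORMAT\n"
        "  'org.apache.hadoop.mapred.TextInputFormat'\n"
        "OUTPUTFORMAT\n"
        "  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        "LOCATION\n"
        f"  '{location}'"
    )


if __name__ == "__main__":
    # Placeholder values taken from the example query.
    location = emr_container_log_location(
        "111122223333", "us-west-2",
        "j-1ABCDEEXAMPLE", "application_1111222233334_5555",
    )
    print(create_table_statement("myemrlogs", location))
```

Paste the printed statement into the Athena query editor to create the table.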

To check for occurrences of FAIL, ERROR, WARN, EXCEPTION, FATAL, or CAUSE in your logs table, run a query similar to the following:

SELECT *,"$PATH" FROM myemrlogs WHERE regexp_like(data, 'FAIL|ERROR|WARN|EXCEPTION|FATAL|CAUSE') limit 100;

Note: Replace myemrlogs with the name of the table that you created from your Amazon EMR log files.
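The keyword list in this query is adjustable. For example, you might drop WARN when the logs are noisy. The following Python sketch builds the same query from a list of keywords. The helper name keyword_query is hypothetical, and the sketch accepts only letters, digits, and spaces in keywords so that the generated regular expression and SQL string stay well formed.

```python
import re


def keyword_query(table_name, keywords, limit=100):
    """Build a regexp_like query that matches any of the given keywords."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError("table_name must be a simple identifier")
    if not keywords:
        raise ValueError("at least one keyword is required")
    for word in keywords:
        # Reject quotes and regex metacharacters instead of trying to escape them.
        if not re.fullmatch(r"[A-Za-z0-9 ]+", word):
            raise ValueError(f"unsupported keyword: {word!r}")
    pattern = "|".join(keywords)
    return (
        f'SELECT *,"$PATH" FROM {table_name} '
        f"WHERE regexp_like(data, '{pattern}') limit {int(limit)};"
    )


if __name__ == "__main__":
    # A narrower search than the default list: errors and their causes only.
    print(keyword_query("myemrlogs", ["ERROR", "EXCEPTION", "FATAL", "CAUSE"]))
```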

You can query the Amazon EMR logs in different ways to find out where the Spark application failed. Use the following example queries to troubleshoot whether the application failed at the job, stage, task, or executor level.

To get the exit code of the application, run a query similar to the following:

SELECT *,"$PATH" FROM myemrlogs WHERE regexp_like(data, 'exitCode');

To check which host the Spark executor is running on, run a query similar to the following:

SELECT *,"$PATH" FROM myemrlogs WHERE regexp_like(data, 'executor ID');

To track how tasks are mapped to stages, run a query similar to the following:

SELECT *,"$PATH" FROM myemrlogs WHERE regexp_like(data, 'TID');

To check the heap memory details of containers, run a query similar to the following:

SELECT *,"$PATH" FROM myemrlogs WHERE regexp_like(data, 'space');

To track the progress of each job or stage on the Directed Acyclic Graph (DAG) scheduler, run a query similar to the following:

SELECT *,"$PATH" FROM myemrlogs WHERE regexp_like(data, 'DAGScheduler');
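The five preceding queries differ only in the search string. If you run them regularly, you can keep the search strings in one place and generate the queries for any table. The following Python sketch does that. The names SEARCH_PATTERNS and troubleshooting_queries are hypothetical, and the search strings are the ones used in the preceding queries.

```python
import re

# Troubleshooting goal -> search string from the example queries.
SEARCH_PATTERNS = {
    "exit code of the application": "exitCode",
    "host of each Spark executor": "executor ID",
    "task-to-stage mapping": "TID",
    "container heap memory details": "space",
    "job and stage progress on the DAG scheduler": "DAGScheduler",
}

QUERY_TEMPLATE = """SELECT *,"$PATH" FROM {table} WHERE regexp_like(data, '{pattern}');"""


def troubleshooting_queries(table_name):
    """Return one Athena query per troubleshooting goal for the given table."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError("table_name must be a simple identifier")
    return {
        goal: QUERY_TEMPLATE.format(table=table_name, pattern=pattern)
        for goal, pattern in SEARCH_PATTERNS.items()
    }


if __name__ == "__main__":
    for goal, query in troubleshooting_queries("myemrlogs").items():
        print(f"-- {goal}")
        print(query)
```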

You can also create a partitioned table based on Amazon EMR logs and then use Athena to query these logs. For more information, see Create and query a partitioned table based on Amazon EMR logs.

Related information

How can I troubleshoot stage failures in Spark jobs on Amazon EMR?

AWS OFFICIAL
Updated 2 months ago