EMR Cluster log analysis for "Terminated with Internal errors"

2 minute read
Content level: Advanced

This article describes a systematic way to evaluate the logs of an Amazon EMR cluster that fails during provisioning with the error "Terminated with Internal errors", with the aim of identifying and resolving the underlying cause.

The flowchart below outlines which logs to analyze, depending on whether the Amazon Elastic Compute Cloud (Amazon EC2) instances were provisioned during EMR cluster creation. If no instances were created (in most cases because of a validation error), several causes are possible, and some of them are listed below. If instances were created, examine the log files in the order shown to identify the root cause.
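The two branches described above can be sketched as a small checklist helper. This is only an illustration: the function and constant names are hypothetical, the log order is taken from the list in this article, and treating "permission issues", "security configuration", and "custom AMI" as the three issue families for the no-instances branch is my reading of that list, not something the article states outright.

```python
from __future__ import annotations

# Log categories, in the order this article lists them for the case
# where EC2 instances were created.
LOGS_IN_ORDER = [
    "EC2 system logs",
    "setup-devices logs",
    "bootstrap-actions logs",
    "provision-node logs",
    "application logs",
    "instance-state logs",
]

# Issue families from the article's list of possible issues.
# Assumption: these are the checks for the "no instances created" branch.
ISSUE_FAMILIES = [
    "permission issues",
    "security configuration",
    "custom AMI",
]


def plan_investigation(instances_created: bool) -> list[str]:
    """Return the ordered checklist for one branch of the flowchart."""
    if instances_created:
        # Instances exist, so walk the logs in the listed order.
        return list(LOGS_IN_ORDER)
    # No instances: most likely a validation error, so review the issue families.
    return list(ISSUE_FAMILIES)


if __name__ == "__main__":
    for item in plan_investigation(instances_created=True):
        print(item)
```

Returning a fresh list from each call keeps the two constants from being modified by accident when a caller edits the checklist.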

To learn more about how to analyze each log file, refer to the articles linked under the "EMR JobFlow logs" section.


[Image: log analysis flowchart]


| Some possible issues | EMR JobFlow logs | CloudTrail logs |
| --- | --- | --- |
| Permission issues | Below articles to analyse each log | Specific API calls to check |
| Service role not authorized | EC2 system logs | CloudTrail for EMR |
| KMS policy | setup-devices logs | |
| EC2/VPC access denied | bootstrap-actions logs | |
| Security configuration | provision-node logs | |
| KMS key missing/not accessible | application logs | |
| Incomplete configuration | instance-state logs | |
| In-transit cert failure | | |
| Custom AMI | | |
| Private DNS issue | | |
| AMI not suitable | | |
| Permission issues inside AMI | | |
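For the CloudTrail column, one way to look for the relevant API calls is to scan a downloaded CloudTrail log file for calls that returned an error. The sketch below is a minimal example under stated assumptions: it relies only on the standard CloudTrail record fields (`Records`, `eventTime`, `eventSource`, `eventName`, `errorCode`, `errorMessage`), the choice of the EMR, EC2, and KMS event sources is my assumption based on the issues listed above, and the sample records are made up for illustration.

```python
import json

# Assumption: EMR, EC2, and KMS are the services worth scanning for the
# permission and KMS issues listed in this article.
EVENT_SOURCES = {
    "elasticmapreduce.amazonaws.com",
    "ec2.amazonaws.com",
    "kms.amazonaws.com",
}


def failed_calls(log_text, sources=EVENT_SOURCES):
    """Return failed API calls from one CloudTrail log file, oldest first."""
    records = json.loads(log_text).get("Records", [])
    failures = [
        {
            "time": rec.get("eventTime"),
            "source": rec["eventSource"],
            "call": rec.get("eventName"),
            "error": rec["errorCode"],
            "message": rec.get("errorMessage", ""),
        }
        for rec in records
        # A record with an errorCode field describes a call that failed.
        if rec.get("eventSource") in sources and rec.get("errorCode")
    ]
    return sorted(failures, key=lambda f: f["time"] or "")


# Made-up sample records, for illustration only.
SAMPLE_LOG = json.dumps({"Records": [
    {"eventTime": "2024-01-01T10:00:05Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "RunInstances", "errorCode": "Client.UnauthorizedOperation",
     "errorMessage": "You are not authorized to perform this operation."},
    {"eventTime": "2024-01-01T10:00:00Z",
     "eventSource": "elasticmapreduce.amazonaws.com", "eventName": "RunJobFlow"},
    {"eventTime": "2024-01-01T10:00:02Z", "eventSource": "kms.amazonaws.com",
     "eventName": "GenerateDataKey", "errorCode": "AccessDenied",
     "errorMessage": "Sample access denied message."},
    {"eventTime": "2024-01-01T10:00:03Z", "eventSource": "s3.amazonaws.com",
     "eventName": "GetObject", "errorCode": "AccessDenied"},
]})

if __name__ == "__main__":
    for failure in failed_calls(SAMPLE_LOG):
        print(failure["time"], failure["source"], failure["call"], failure["error"])
```

Sorting by event time makes it easier to line up a failed call with the moment the cluster moved to the terminated state.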

This structured approach is intended to streamline the analysis of EMR cluster log files so that you can identify and resolve errors encountered during cluster provisioning or operation. If the issue persists or you need more specific assistance, engage AWS Support for further guidance.

AWS
SUPPORT ENGINEER
published 2 months ago, 658 views
1 Comment

Very informative guide for troubleshooting cluster provisioning issues. Indeed, appreciate your efforts.

Thanks

Mark
replied 2 months ago