metadata service is unstable: connection timeout, Failed to connect to service endpoint etc

0

start from recently, our long running job are hitting metadata issue frequently. The exceptions various, but the all point to EC2 metadata service. It's either failed to connection the endpoint, or timeout to connect to the service, or complaining that I need to specify the region while building the client. The job is running on EMR 6.0.0 in Tokyo, with correct Role set, and the job has been running fine for months, just started from recent, it became unstable.

So my question is: how can we monitor the healthy the metadata service? request QPS, success rate, etc.

A few callstacks

com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.amazon.ws.emr.hadoop.fs.guice.UserGroupMappingAWSSessionCredentialsProvider@4a27ee0d: null, com.amazon.ws.emr.hadoop.fs.HadoopConfigurationAWSCredentialsProvider@76659c17: null, com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.auth.InstanceProfileCredentialsProvider@5c05c23d: Failed to connect to service endpoint: ]
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:136)
com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.
	at com.amazonaws.client.builder.AwsClientBuilder.setRegion(AwsClientBuilder.java:462)
	at com.amazonaws.client.builder.AwsClientBuilder.configureMutableProperties(AwsClientBuilder.java:424)
	at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
com.amazonaws.SdkClientException: Unable to execute HTTP request: mybucket.s3.ap-northeast-1.amazonaws.com
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1189) ~[aws-java-sdk-bundle-1.11.711.jar:?]

Caused by: java.net.UnknownHostException: mybucket.s3.ap-northeast-1.amazonaws.com
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_242]
    at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_242]
    at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_242]
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Failed to connect to service endpoint: 

Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
Hubery
질문됨 2년 전3278회 조회
1개 답변
0

As of today, there is no specific metric available to monitor EC2 IMDS metrics like request QPS, success rate etc.

Kindly review the application logs pertaining to each exceptions, as the outlined exceptions could also occur due to various other issues in the environment like network (Failed to connect to service endpoint), DNS ( Unable to execute HTTP request) etc for example.

AWS
답변함 2년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠