
How do I troubleshoot connectivity and resource access issues with my Amazon EMR Serverless jobs?


I want to troubleshoot connectivity and resource access issues with my Amazon EMR Serverless jobs.

Short description

Amazon EMR Serverless jobs read data from a source, process the data, and write the results to a data store. To do this, workers that run the job must connect to other AWS services and data sources that are hosted on AWS or on-premises. If your application isn't correctly configured, then you might experience network errors such as the following:

"java.net.SocketTimeoutException: Connect timed out."

For more information on networking options for Amazon EMR Serverless jobs, see Configuring VPC access and Other considerations.

Resolution

To troubleshoot connectivity and resource access issues with your Amazon EMR Serverless jobs, complete the following steps:

To confirm whether a job failed because of a networking error, check the status details of the job or the driver logs. Check for the following error message:

"java.net.SocketTimeoutException: Connect timed out."

If the preceding error appears in the job's status details and you can't fetch the driver logs, then the job didn't start. This happens when the workers can't fetch the job script from Amazon Simple Storage Service (Amazon S3). In this case, you get the following error when you try to fetch the logs:

"Failed to open logs for <job_name> (<job_run_id>). ${apiError}"

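This check can be expressed as a small helper that looks at the job's status details and at whether the driver logs could be opened. It is a minimal sketch with these assumptions:

- The client is a boto3 "emr-serverless" client, or any object with a compatible get_job_run method.
- You already know whether the driver logs could be opened.
- The name diagnose_job_run and its messages are hypothetical.
- Matching on the stateDetails text is a heuristic, not an official classification.

```python
# Minimal sketch; diagnose_job_run is a hypothetical helper, not an AWS API.
TIMEOUT_MARKER = "java.net.SocketTimeoutException: Connect timed out"


def diagnose_job_run(emr_client, application_id, job_run_id, driver_logs_available):
    """Classify a failed job run by its status details and driver log availability.

    emr_client is assumed to be boto3.client("emr-serverless") or any object
    with a compatible get_job_run(applicationId=..., jobRunId=...) method.
    """
    job_run = emr_client.get_job_run(
        applicationId=application_id, jobRunId=job_run_id
    )["jobRun"]
    details = job_run.get("stateDetails", "")

    if TIMEOUT_MARKER not in details:
        return "no connect timeout in status details: check the driver logs"
    if not driver_logs_available:
        # Workers likely couldn't fetch the job script from Amazon S3.
        return "job did not start: check S3 access from the application's network"
    # The job started, so the driver logs should name the unreachable service.
    return "job started: find the unreachable service in the driver logs"
```

For example, pass boto3.client("emr-serverless") with your application ID and job run ID.
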
To troubleshoot the preceding errors, take the following actions:

  • Make sure that the Amazon S3 bucket is accessible from the network that your application runs on.
  • Check that your Amazon VPC endpoint policy, AWS Identity and Access Management (IAM) policy, service policy, and bucket policy don't deny access to your jobs.
  • If your job runs in an Amazon VPC subnet that has no outbound internet connectivity, then create an Amazon S3 gateway VPC endpoint. Gateway endpoints attach to route tables, not subnets, so associate the endpoint with the route tables of the subnets that your application uses.
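
You can script the gateway endpoint step with the Amazon EC2 API through boto3. The sketch first finds the route tables that the application's subnets use, because gateway endpoints are attached to route tables. Subnets without an explicit association use the VPC's main route table, so the sketch falls back to that table.

It is a minimal sketch with these assumptions:

- The helper names are hypothetical.
- The client is an EC2 client for the application's Region.
- It doesn't check whether an S3 gateway endpoint already exists on the selected route tables.

```python
# Minimal sketch; route_tables_for_subnets and create_s3_gateway_endpoint are hypothetical helpers.


def route_tables_for_subnets(ec2_client, vpc_id, subnet_ids):
    """Return the IDs of the route tables that the given subnets use."""
    wanted = set(subnet_ids)
    explicit = ec2_client.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": list(wanted)}]
    )["RouteTables"]

    table_ids, associated = set(), set()
    for table in explicit:
        table_ids.add(table["RouteTableId"])
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") in wanted:
                associated.add(assoc["SubnetId"])

    if associated != wanted:
        # At least one subnet has no explicit association, so it uses the main route table.
        main = ec2_client.describe_route_tables(
            Filters=[
                {"Name": "vpc-id", "Values": [vpc_id]},
                {"Name": "association.main", "Values": ["true"]},
            ]
        )["RouteTables"]
        table_ids.update(t["RouteTableId"] for t in main)
    return sorted(table_ids)


def create_s3_gateway_endpoint(ec2_client, region, vpc_id, subnet_ids):
    """Create an Amazon S3 gateway endpoint on the subnets' route tables."""
    route_table_ids = route_tables_for_subnets(ec2_client, vpc_id, subnet_ids)
    response = ec2_client.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.s3",
        RouteTableIds=route_table_ids,
    )
    return response["VpcEndpoint"]["VpcEndpointId"]
```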

If these errors appear in the driver logs, then review the error message to identify the AWS services or data sources that the job can't reach. Then, confirm whether each service or data source is reachable with your current network configuration. If it isn't reachable, then update your network configuration.
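
One way to confirm reachability from the workers themselves is to run a small probe inside a job. For example, use the probe as the entry point script of a Spark job so that it runs with the application's network configuration.

The sketch has these limits:

- check_tcp, report, and the example targets are hypothetical and aren't part of Amazon EMR Serverless.
- It only tests TCP connection setup, not authentication or authorization.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection; return (reachable, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except socket.timeout:
        # Same symptom as "Connect timed out": traffic is likely dropped by a
        # security group, a network ACL, or a missing route.
        return False, "timeout"
    except OSError as exc:
        # For example ConnectionRefusedError or socket.gaierror (DNS failure).
        return False, type(exc).__name__


def report(targets, timeout=5.0):
    """Print one line per (host, port) target; in a job this goes to the driver's stdout log."""
    for host, port in targets:
        reachable, reason = check_tcp(host, port, timeout)
        status = "reachable" if reachable else f"unreachable ({reason})"
        print(f"{host}:{port} {status}")


# Hypothetical usage in a job entry point; replace the targets with the endpoints your job needs:
# report([("s3.us-east-1.amazonaws.com", 443), ("db.example.internal", 5432)])
```

A timeout usually points to filtering or routing. A refused connection means that the network path works but nothing listens on that port.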

For Amazon VPC applications, take the following actions:

  • Make sure that the attached security groups allow outbound traffic to the required resources, or allow outbound traffic on all ports.
  • Make sure that the network access control lists (network ACLs) of your application's subnets don't deny traffic to or from the required resources. Network ACLs are stateless, so they must allow both the outbound requests and the return traffic.
  • Make sure that your resource's security group allows inbound traffic from your application's subnets or security group.
  • If your application requires internet connectivity, then make sure that you use private subnets with outbound internet connectivity from a public NAT gateway.
  • If your application doesn't have outbound internet connectivity, then use Amazon VPC endpoints for the AWS services your job needs access to.
  • To determine whether a destination resource is reachable from your application's subnet, use the Reachability Analyzer. Use a worker elastic network interface as the source and a resource elastic network interface or IP address as the destination.
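
You can partly automate the first check in this list by evaluating a security group's outbound rules against a destination. It is a minimal sketch with these assumptions and limits:

- You pass one SecurityGroups item from the EC2 DescribeSecurityGroups response.
- egress_allows is a hypothetical helper.
- It evaluates only IPv4 CIDR rules and ignores rules that reference other security groups or prefix lists.
- It doesn't consider network ACLs or routes, which Reachability Analyzer does evaluate.

```python
import ipaddress


def egress_allows(security_group, dest_ip, port, protocol="tcp"):
    """Return True if an IPv4 CIDR egress rule allows traffic to dest_ip:port.

    security_group is one item from the DescribeSecurityGroups response.
    Rules that reference security groups or prefix lists are ignored.
    """
    ip = ipaddress.ip_address(dest_ip)
    for rule in security_group.get("IpPermissionsEgress", []):
        rule_protocol = rule.get("IpProtocol")
        if rule_protocol == "-1":
            port_ok = True  # All protocols and all ports.
        elif rule_protocol == protocol:
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        if not port_ok:
            continue
        for ip_range in rule.get("IpRanges", []):
            if ip in ipaddress.ip_network(ip_range["CidrIp"]):
                return True
    return False
```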

Additional troubleshooting

If an application that previously ran successfully is now stuck in the STARTING state, then check your security groups. Make sure that none of the security groups in your application's network configuration has been deleted.
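
To find deleted security groups, compare the security group IDs in the application's network configuration with the groups that still exist. This is a minimal sketch, assuming boto3 "emr-serverless" and "ec2" clients or compatible objects. missing_security_groups is a hypothetical helper.

```python
def missing_security_groups(emr_client, ec2_client, application_id):
    """Return configured security group IDs that no longer exist."""
    app = emr_client.get_application(applicationId=application_id)["application"]
    configured = app.get("networkConfiguration", {}).get("securityGroupIds", [])
    if not configured:
        return []  # No VPC network configuration, or no security groups configured.
    # A group-id filter returns only the groups that exist, instead of failing
    # the whole call on a deleted ID.
    found = ec2_client.describe_security_groups(
        Filters=[{"Name": "group-id", "Values": configured}]
    )["SecurityGroups"]
    existing = {sg["GroupId"] for sg in found}
    return [sg_id for sg_id in configured if sg_id not in existing]
```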

If your jobs use only one subnet even though you configured more than one, then your application might have pre-initialized capacity. Pre-initialized workers start in a single subnet, and the application continues to use that subnet until the application is stopped. All jobs that you submit to the application use the subnet that has the pre-initialized workers.
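
To check whether an application has pre-initialized capacity, read the initialCapacity and networkConfiguration fields that GetApplication returns. This is a minimal sketch, assuming a boto3 "emr-serverless" client or a compatible object. The helper name preinitialized_capacity is hypothetical.

```python
def preinitialized_capacity(emr_client, application_id):
    """Return (has_preinitialized_workers, worker_counts, subnet_ids)."""
    app = emr_client.get_application(applicationId=application_id)["application"]
    # initialCapacity maps a worker type (for example DRIVER or EXECUTOR) to its config.
    worker_counts = {
        worker_type: config.get("workerCount", 0)
        for worker_type, config in app.get("initialCapacity", {}).items()
    }
    subnet_ids = app.get("networkConfiguration", {}).get("subnetIds", [])
    has_workers = any(count > 0 for count in worker_counts.values())
    return has_workers, worker_counts, subnet_ids
```

If the first value is True and more than one subnet is listed, then single-subnet placement of your jobs is expected until the application is stopped.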

If your subnets run out of available IP addresses because Amazon EMR Serverless uses all of them, then scale your application accordingly. Each worker uses one IP address in your subnet, so make sure that your subnets have enough available IP addresses to start your application. For more information, see Best practices for subnet planning.
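
The subnet sizing check is simple arithmetic:

- AWS reserves five IP addresses in every subnet, so a /24 subnet has 256 - 5 = 251 usable addresses.
- Each worker uses one address. To estimate the peak number of workers, divide the application's maximum vCPU capacity by the vCPUs per worker.

The sketch assumes that all workers have the same size and that other resources in the subnet might also consume addresses. The function names are hypothetical. For current numbers, check the AvailableIpAddressCount field of the EC2 DescribeSubnets response, which reports the free addresses in a subnet.

```python
import ipaddress

# AWS reserves the first four and the last IP address in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr):
    """Usable addresses in a subnet with the given IPv4 CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def peak_workers(max_vcpu, vcpu_per_worker):
    """Estimated worker count at maximum capacity, assuming uniform worker size."""
    return max_vcpu // vcpu_per_worker


def has_room(available_ips, workers_needed, other_consumers=0):
    """True if the free addresses cover the workers plus other address consumers."""
    return available_ips - other_consumers >= workers_needed


# Example: a /24 subnet and an application capped at 400 vCPU with 4-vCPU workers.
print(usable_ips("10.0.1.0/24"))                                   # 251
print(peak_workers(400, 4))                                        # 100
print(has_room(usable_ips("10.0.1.0/24"), peak_workers(400, 4)))   # True
```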

AWS OFFICIAL
Updated 2 months ago