Broken connection with EC2 node during execution

We run test automation using the Jenkins EC2 plugin.

Region: eu-west-1

During test execution on AWS EC2 nodes, the connection to the node sometimes breaks (about 3 out of 10 runs).

It can happen at any time during execution and, as far as we can see, is not related to the code being executed.

We are using three executors (jobs) per EC2 node.

In the logs we see the following error:

Cannot contact EC2 (aqa-jenkins-ec2-selenoid) - aqa-selenoid-no-hosts-1.3 (i-0f57b3144a3279da4): java.lang.InterruptedException
Could not connect to EC2 (aqa-jenkins-ec2-selenoid) - aqa-selenoid-no-hosts-1.3 (i-0f57b3144a3279da4) to send interrupt signal to process

The question is: what is the root cause of this issue, and how can it be fixed?

asked 2 months ago, 116 views
2 Answers

The issue you're experiencing with broken connections to your EC2 node during test execution could have several potential root causes. Based on the error messages you've provided, it appears that there's an intermittent connectivity problem between your Jenkins server and the EC2 instance running your tests.

Here are some possible root causes and solutions:

  1. Network instability: Intermittent network issues between your Jenkins server and the EC2 instance could cause connection drops. This might be due to AWS network congestion, routing problems, or issues with your VPC configuration.

Solution: Review your network setup, ensure your security groups and network ACLs are correctly configured, and consider using AWS Direct Connect or a VPN for more stable connectivity if you're connecting from an on-premises network.

  2. Instance resource constraints: If the EC2 instance is running out of resources (CPU, memory, or disk), it might become unresponsive, leading to connection issues.

Solution: Monitor your instance's resource utilization and consider upgrading to a larger instance type if necessary. Also, ensure that your tests aren't consuming excessive resources.

  3. EC2 Instance Connect issues: If you're using EC2 Instance Connect for SSH access, there might be problems with the service or its configuration.

Solution: Verify that EC2 Instance Connect is properly set up and that your IAM permissions are correct. You can use the AWSSupport-TroubleshootEC2InstanceConnect runbook to diagnose connection issues related to EC2 Instance Connect.

  4. Security group or firewall restrictions: Overly restrictive security group rules or instance-level firewall settings could be intermittently blocking connections.

Solution: Review and adjust your security group rules to ensure they allow necessary inbound and outbound traffic. Also, check the instance's firewall settings if applicable.

  5. Instance state issues: The EC2 instance might be entering an unhealthy state or restarting during test execution.

Solution: Check the instance's status checks and system logs to ensure it's stable. Consider using Amazon CloudWatch to monitor instance health and set up alarms for early detection of issues.
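
One way to tell a network or security-group problem apart from an unresponsive instance is to probe the agent from the Jenkins server while a test job is running. The sketch below uses only the Python standard library and assumes the agent is reached over SSH on port 22; the function names and the host address in the usage note are hypothetical and are not part of the Jenkins EC2 plugin.

```python
import socket
import time
from typing import List, Tuple


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, float]:
    """Try one TCP connection; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def probe_repeatedly(host: str, port: int, attempts: int = 5,
                     interval: float = 1.0,
                     timeout: float = 3.0) -> List[Tuple[bool, float]]:
    """Probe several times, pausing `interval` seconds between attempts."""
    results = []
    for i in range(attempts):
        results.append(probe_tcp(host, port, timeout))
        if i < attempts - 1:
            time.sleep(interval)
    return results


def failure_rate(results: List[Tuple[bool, float]]) -> float:
    """Fraction of probes that failed (0.0 when there are no results)."""
    if not results:
        return 0.0
    return sum(1 for ok, _ in results if not ok) / len(results)
```

For example, `failure_rate(probe_repeatedly("10.0.0.12", 22, attempts=60))` with a placeholder private IP would give a rough failure ratio over about a minute. Failed probes while the instance still passes its status checks would point toward the network path rather than the instance itself.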

To further diagnose and fix the issue:

  1. Implement more detailed logging on both the Jenkins server and the EC2 instance to capture more information about the connection failures.

  2. Use AWS Systems Manager to run diagnostics on the EC2 instance during test execution to identify any system-level issues.

  3. Consider implementing a retry mechanism in your test framework to handle temporary connection issues.

  4. If the problem persists, you may want to engage AWS support for a more in-depth investigation of the network and instance behavior.
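
The retry suggestion in step 3 could look roughly like the following sketch: a helper that retries an operation with exponential backoff and a little jitter. The helper name, the default delays, and the choice of exceptions to retry on are assumptions for illustration. A retry inside the test code only covers transient failures of calls the tests make themselves; it would not by itself restore a lost connection between Jenkins and the agent.

```python
import random
import time


def retry(operation, attempts=3, base_delay=1.0, max_delay=30.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(); on a listed exception, wait and try again.

    The delay doubles after each failure (capped at max_delay) and gets up
    to 10% random jitter. Once attempts run out, the last exception is
    re-raised so the caller still sees the failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

A call such as `retry(lambda: open_browser_session(), attempts=4)` would wrap a flaky step, where `open_browser_session` stands for whatever function in your tests talks to a remote service.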

By systematically addressing these potential causes, you should be able to identify and resolve the root cause of your connection issues.
Sources
Troubleshoot issues connecting to your Amazon EC2 Linux instance - Amazon Elastic Compute Cloud
Resolve EC2 Instance Connect issues in EC2 Linux | AWS re:Post
Troubleshoot website connectivity on EC2 Windows instance | AWS re:Post

answered 2 months ago
EXPERT
reviewed 2 months ago

The connection issues you're encountering during test execution on your AWS EC2 nodes with the Jenkins EC2 plugin can stem from several potential root causes. Some likely ones are the following:

Potential Causes:

  1. Network Latency or Fluctuations:

    • Temporary network issues between Jenkins and the EC2 instances can cause interruptions. This is especially common in distributed systems and could be exacerbated by high load.
  2. EC2 Instance Resource Limitations:

    • If your EC2 instances are under heavy load (CPU, memory, I/O), they might not be responsive enough to communicate with Jenkins reliably.
  3. Jenkins Configuration:

    • Misconfigured settings in Jenkins or the EC2 plugin could lead to connection issues. For instance, timeout settings might be too low for your execution environment.
  4. Too Many Executors per Node:

    • Running three executors per EC2 node can lead to resource contention. If the tests are resource-intensive, this could overwhelm the node, leading to failures in communication.
  5. Timeout Settings:

    • If the timeouts for EC2 communication are set too low, transient issues can lead to frequent failures.
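
To check whether resource contention from three executors is a plausible cause, you could sample load and available memory on the EC2 node while the tests run. The sketch below is one possible approach for a Linux node, using only the Python standard library. The function names and the thresholds (load above 1.0 per CPU, less than 10% memory available) are assumptions chosen for illustration, not recommended limits.

```python
import os


def load_per_cpu(load_1min=None, cpus=None):
    """Return the 1-minute load average divided by the CPU count."""
    if load_1min is None:
        load_1min = os.getloadavg()[0]  # available on Unix-like systems only
    if cpus is None:
        cpus = os.cpu_count() or 1
    return load_1min / cpus


def mem_available_fraction(meminfo_text):
    """Parse /proc/meminfo-style text; return MemAvailable / MemTotal."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])  # values are in kB
    return values["MemAvailable"] / values["MemTotal"]


def is_saturated(load_ratio, mem_fraction, max_load=1.0, min_mem=0.10):
    """True if load per CPU is high or available memory is low."""
    return load_ratio > max_load or mem_fraction < min_mem
```

On the node, `mem_available_fraction(open("/proc/meminfo").read())` together with `load_per_cpu()` gives one sample. Logging these every few seconds and comparing the timestamps with the disconnects would show whether the node was saturated when the connection dropped.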

Recommendations to Fix the Issue:

  1. Increase EC2 Instance Size:

    • Consider upgrading your EC2 instances to larger types with more CPU and memory to handle the load better.
  2. Reduce the Number of Executors:

    • Start by reducing the number of executors per node to see if that alleviates the issue. You might experiment with two or even one executor per node.
  3. Adjust Timeout Settings:

    • Look into the Jenkins EC2 plugin configuration and adjust any relevant timeout settings (e.g., connection and read timeouts).
  4. Implement Retries:

    • If possible, implement retry logic in your Jenkins jobs to handle intermittent failures gracefully.
  5. Monitor and Optimize Resource Usage:

    • Use monitoring tools to observe the performance of your EC2 instances during test runs. This can help identify bottlenecks or resource saturation.
  6. Network Configuration:

    • Ensure that your VPC and security groups are correctly configured to allow communication between Jenkins and the EC2 nodes without intermittent issues.
  7. Logs Analysis:

    • Dive deeper into the Jenkins logs and the Amazon CloudWatch Logs for your EC2 instances to see whether there are any specific patterns or additional error messages that give you more context.
  8. Test Isolation:

    • If certain tests are particularly resource-heavy, consider isolating them to run on dedicated nodes to reduce the load on shared resources.
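
For the log analysis in item 7, a small script can count how often each instance shows up in "Cannot contact" messages, which helps show whether the failures cluster on particular nodes. This is a sketch that assumes the log lines look like the error quoted in the question; the regular expression and function name are illustrative, and other Jenkins versions may format the message differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   Cannot contact <node name> (i-0123abcd...): java.lang.InterruptedException
DISCONNECT = re.compile(
    r"Cannot contact (?P<node>.+?) \((?P<instance>i-[0-9a-f]+)\): (?P<cause>\S+)"
)


def count_disconnects(lines):
    """Count 'Cannot contact' events per (instance id, cause)."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            counts[(match["instance"], match["cause"])] += 1
    return counts
```

Running `count_disconnects(open("jenkins.log"))` on a saved log file (the file name is a placeholder) returns a `Counter` whose `most_common()` lists the instances with the most disconnects first.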

Some useful documentation:

  1. Troubleshoot issues connecting to your Amazon EC2 Linux instance
  2. Troubleshoot issues with Amazon EC2 Windows instances
  3. Jenkins EC2 Plugin
AWS
answered 2 months ago
EXPERT
reviewed 2 months ago
