Hello.
Is it possible to connect via serial console to the EC2 that is experiencing the issue?
If so, you can use the serial console to check the OS network settings.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connect-to-serial-console.html
Also, can you access instance metadata from the EC2 instance that the AMI was created from?
I've commonly seen that error once or twice at the start of userdata logs while the OS is booting up, but it usually resolves once the network stack is fully running. IMDS should be accessible regardless of security group/NACL/Route Table settings, so those shouldn't be an issue.
- Is the AMI + UserData used inside and outside the ASG identical (or for simplicity, is the same launch template used to launch the non-ASG test instance)?
- Are there any errors seen outside the instance?
- Is there anything else running on startup which might be affecting the local network stack of the OS? IPTables being configured or something similar?
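To tell a transient boot-time IMDS failure (the kind that clears once the network stack is up) from a persistent one, a retry loop along these lines could be run from user data. This is only a sketch: the function names, attempt count, and delay are illustrative assumptions, and it assumes IMDSv2 on the standard link-local address.

```python
import time
import urllib.request

IMDS_BASE = "http://169.254.169.254"  # standard IMDS link-local address


def fetch_instance_id(timeout=2.0):
    """Request an IMDSv2 session token, then read the instance ID."""
    token_req = urllib.request.Request(
        IMDS_BASE + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    id_req = urllib.request.Request(
        IMDS_BASE + "/latest/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(id_req, timeout=timeout) as resp:
        return resp.read().decode()


def wait_for_imds(fetch=fetch_instance_id, attempts=10, delay=3.0, sleep=time.sleep):
    """Call fetch until it succeeds; return (value, number_of_failed_attempts).

    attempts and delay are arbitrary illustrative defaults.
    """
    failures = 0
    for attempt in range(attempts):
        try:
            return fetch(), failures
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            failures += 1
            print(f"IMDS attempt {attempt + 1} failed: {exc}")
            if attempt + 1 < attempts:
                sleep(delay)
    return None, failures
```

Calling `wait_for_imds()` early in user data and logging the result would show whether IMDS recovers after one or two failures or never becomes reachable on the affected ASG instances.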
Hello Shahad, Thank you for your response.
- Is the AMI + UserData used inside and outside the ASG identical (or for simplicity, is the same launch template used to launch the non-ASG test instance)?
The template used to start the non-ASG test instance is identical to the ASG template.
- Are there any errors seen outside the instance?
If by errors outside the instance you mean errors shown in the system log, then yes, the error appears there.
- Is there anything else running on startup which might be affecting the local network stack of the OS? IPTables being configured or something similar?
I have not performed any separate work on the local network.
If you have a Premium Support plan, it will probably be simpler to troubleshoot if someone can see the actual instances being launched.
There shouldn't be anything special about instances launched inside an ASG. AutoScaling is just calling RunInstances or CreateFleet to EC2, the same as you would from the EC2 console.
From that very last line in your screenshot, can you check on one of the non-working ASG instances whether IMDS has been disabled? https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_InstanceMetadataOptionsRequest.html If it's been disabled, you should be able to see it in the results of this command: https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instances.html Maybe it's getting disabled post-launch by some sort of automation?
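As a rough illustration of that check, the sketch below reads the `MetadataOptions` block of a describe-instances response and lists instances whose `HttpEndpoint` is `disabled`. The helper names and the sample response (including the instance IDs) are hypothetical; only the `Reservations` / `Instances` / `MetadataOptions` field names come from the EC2 API.

```python
def imds_status(response):
    """Map instance ID -> MetadataOptions summary from a describe-instances response."""
    status = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            opts = instance.get("MetadataOptions", {})
            status[instance["InstanceId"]] = {
                "HttpEndpoint": opts.get("HttpEndpoint"),  # "enabled" or "disabled"
                "HttpTokens": opts.get("HttpTokens"),      # "optional" or "required"
                "State": opts.get("State"),                # "pending" or "applied"
            }
    return status


def disabled_instances(response):
    """Return sorted IDs of instances whose IMDS endpoint is disabled."""
    return sorted(
        instance_id
        for instance_id, opts in imds_status(response).items()
        if opts["HttpEndpoint"] == "disabled"
    )


# Hypothetical sample response, trimmed to the relevant fields.
sample = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0aaaaaaaaaaaaaaaa",
                    "MetadataOptions": {
                        "State": "applied",
                        "HttpTokens": "required",
                        "HttpEndpoint": "enabled",
                    },
                },
                {
                    "InstanceId": "i-0bbbbbbbbbbbbbbbb",
                    "MetadataOptions": {
                        "State": "applied",
                        "HttpTokens": "required",
                        "HttpEndpoint": "disabled",
                    },
                },
            ]
        }
    ]
}

print(disabled_instances(sample))  # ['i-0bbbbbbbbbbbbbbbb']
```

The response dictionary could come from the JSON output of `aws ec2 describe-instances --instance-ids <instance-id>` or from the equivalent `describe_instances` call in an SDK. Comparing the result for a working standalone instance and a failing ASG instance would show whether something is changing the metadata options after launch.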
After accessing the problematic Amazon Linux 2 instance and checking the logs, I found an error stating 'NET: dhclient: Locked /run/dhclient/resolv.lock'. Are you aware of what this might be?
That looks like a message stating that the DHCP config file is being locked so something can edit it. It shouldn't necessarily be a problem. Is there a lifecycle hook, or anything else on the ASG, which might be triggering additional deployments to these instances that don't happen on the standalone instance? Maybe there's something like CodeDeploy trying to deploy additional software/updates at the same time as the userdata is running, which is causing conflicts/race conditions?
Regarding what Shahad_C mentioned: our Auto Scaling group has lifecycle hooks configured, with a time of 600s for instance launch and 30s for instance termination. We do not use CI/CD tools for deployment (e.g., CodePipeline, Jenkins, etc.), and we do not see conflicts during deployment.
Hello RIku Kobayashi. I also tried the EC2 serial console, but I could not connect through it either.
How about detaching one EC2 with a problem from AutoScaling, attaching the root volume to another EC2, mounting it, and checking if it can be started? https://docs.aws.amazon.com/autoscaling/ec2/userguide/detach-instance-asg.html
You can detach the root volume by following the steps in the document below. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-detaching-volume.html https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-attaching-volume.html
Unfortunately, detaching the root volume is pointless as the problem does not occur continuously but arises only occasionally.