Docker installation makes EC2 instance unresponsive


Steps to reproduce:

  1. Launch an EC2 instance with:
  • region - Frankfurt
  • AMI - ami-05c26ae4789875080 (Canonical, Ubuntu, 20.04 LTS, amd64 focal), ami-0ac05733838eabc06 (Canonical, Ubuntu, 18.04 LTS, amd64 bionic), or ami-04cf43aca3e6f3de3 (CentOS Linux 7 x86_64 HVM EBS ENA 1901_01)
  • AZ - eu-central-1a
  • type - m5 or c5 (the type doesn't seem to matter)
  2. Follow the official Docker Engine installation guide: https://docs.docker.com/engine/install/ubuntu/
  3. On Ubuntu, the installation hangs after this step:
Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket
  4. The EC2 instance becomes unresponsive and SSH sessions time out. On CentOS, the installation completes, but the behavior is the same once Docker is started.
    Rebooting doesn't help.
    The same happens when installing via snap (on Ubuntu).

If the AZ is changed to eu-central-1b, the issue does not reproduce and the installation succeeds.

Does anyone know what the root cause could be?

Edited by: bberenice on May 15, 2020 12:10 AM

Edited by: bberenice on May 16, 2020 4:35 AM

asked 4 years ago · 1000 views
2 Answers

Dear bberenice,

Just as a test, I ran the same type of instance and AMI in the same Availability Zone.

 aws ec2 run-instances \
    --image-id ami-05c26ae4789875080 \
    --count 1 \
    --instance-type m5a.large \
    --key-name awshakantestkey01  \
    --subnet-id subnet-3a266c51 \
    --security-group-ids sg-0cb88d91d7b4ff0d4 \
    --placement AvailabilityZone=eu-central-1a

Instance running...

ssh -i "awshakantestkey01.pem" ubuntu@ec2-xx-xxx-xxx-xx.eu-central-1.compute.amazonaws.com

    Welcome to Ubuntu 20.04 LTS (GNU/Linux 5.4.0-1009-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sat May 16 16:42:49 UTC 2020

  System load:  0.08              Processes:             121
  Usage of /:   16.2% of 7.69GB   Users logged in:       0
  Memory usage: 2%                IPv4 address for ens5: 172.31.100.35
  Swap usage:   0%

0 updates can be installed immediately.
0 of these updates are security updates.


The list of available updates is more than a week old.
To check for new updates run: sudo apt update


The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

Docker Installed

ubuntu@ip-172-31-100-35:~$ sudo snap install docker
docker 18.09.9 from Canonical✓ installed


ubuntu@ip-172-31-100-35:~$ sudo docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.09.9
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: N/A
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 5.4.0-1009-aws
Operating System: Ubuntu Core 16
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.53GiB
Name: ip-172-31-100-35
ID: POEJ:RK3I:6YKL:4MYP:CMDN:62OA:CIMB:FVHT:766W:4HRS:FT4G:GU7R
Docker Root Dir: /var/snap/docker/common/var-lib-docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

It's working fine in eu-central-1a.

answered 4 years ago

awshakan, thank you for looking into my question.

The root cause was a network setup specific to our project. We had a VPC peering connection:
VPC1, Requester CIDRs=172.17.0.0/16 -> VPC2, Accepter CIDRs=172.20.0.0/16
The EC2 instances where the issue reproduced were located in VPC2, and we were trying to reach them from VPC1.
By default, Docker uses 172.17.0.0/16, which is the same range as VPC1. When the service started, it updated the iptables rules. Here is a piece of the log, captured with strace, from the dockerd start:

...
DEBU[2020-05-18T07:05:24.901044552Z] /sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2] 
DEBU[2020-05-18T07:05:24.902118925Z] /sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2] 
DEBU[2020-05-18T07:05:24.903236955Z] /sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP] 
DEBU[2020-05-18T07:05:24.904343697Z] /sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP] 

After that, access to the EC2 instance from VPC1 was lost.
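A quick way to spot this kind of clash before installing Docker is to compare the peered CIDRs against Docker's default range. The following is only a minimal sketch using Python's standard `ipaddress` module; the dictionary labels and the `find_conflicts` helper are hypothetical names, and the CIDRs are the ones quoted above. A likely explanation for the lost access (an assumption, not confirmed in this thread) is that once the instance owns 172.17.0.0/16 locally, replies to VPC1 clients no longer leave through the peering connection.

```python
import ipaddress

# Docker's default range, as stated in the answer above.
DOCKER_DEFAULT = ipaddress.ip_network("172.17.0.0/16")

# CIDRs of the two peered VPCs from the answer above (labels are illustrative).
VPC_CIDRS = {
    "VPC1 (requester)": ipaddress.ip_network("172.17.0.0/16"),
    "VPC2 (accepter)": ipaddress.ip_network("172.20.0.0/16"),
}


def find_conflicts(docker_net, networks):
    """Return the labels of networks whose address range overlaps docker_net."""
    return [name for name, net in networks.items() if net.overlaps(docker_net)]


conflicts = find_conflicts(DOCKER_DEFAULT, VPC_CIDRS)
print(conflicts)
```

Any label printed here marks a network that hosts running Docker with default settings will probably be unable to talk to.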

Resolution in this case: use a custom CIDR for Docker.
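For anyone hitting the same problem, here is a hedged sketch of what a custom CIDR could look like. It builds the contents of a Docker `daemon.json` using the `bip` and `default-address-pools` daemon options and refuses ranges that overlap the VPC CIDRs mentioned above. The 10.200.x and 10.201.x ranges and the `build_daemon_config` helper are hypothetical, chosen only for illustration; they are not the values we actually used.

```python
import ipaddress
import json

# Hypothetical replacement ranges; pick ones that are unused in your network.
BRIDGE_IP = "10.200.0.1/24"   # address and prefix for the default bridge ("bip")
POOL_BASE = "10.201.0.0/16"   # range that user-defined networks are carved from
POOL_SIZE = 24                # prefix length of each carved-out network

# Ranges that must stay reachable: the two peered VPC CIDRs from this thread.
RESERVED = [ipaddress.ip_network(c) for c in ("172.17.0.0/16", "172.20.0.0/16")]


def build_daemon_config(bip, pool_base, pool_size, reserved):
    """Build a daemon.json dict, rejecting ranges that overlap reserved ones."""
    bridge_net = ipaddress.ip_interface(bip).network
    pool_net = ipaddress.ip_network(pool_base)
    for net in reserved:
        if net.overlaps(bridge_net) or net.overlaps(pool_net):
            raise ValueError(f"Docker range overlaps reserved network {net}")
    return {
        "bip": bip,
        "default-address-pools": [{"base": pool_base, "size": pool_size}],
    }


config = build_daemon_config(BRIDGE_IP, POOL_BASE, POOL_SIZE, RESERVED)
print(json.dumps(config, indent=2))
```

On a package-installed engine the printed JSON would normally go into /etc/docker/daemon.json, followed by a restart of the Docker service; the snap package may keep its configuration in a different location.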

answered 4 years ago
