Docker installation makes EC2 instance unresponsive

Steps to reproduce:

  1. Launch an EC2 instance with:
  • region - Frankfurt
  • AMI - ami-05c26ae4789875080 (Canonical, Ubuntu, 20.04 LTS, amd64 focal), ami-0ac05733838eabc06 (Canonical, Ubuntu, 18.04 LTS, amd64 bionic), or ami-04cf43aca3e6f3de3 (CentOS Linux 7 x86_64 HVM EBS ENA 1901_01)
  • AZ - eu-central-1a
  • type - m5, c5 (the type doesn't seem to matter)
  2. Follow the official Docker Engine installation guide: https://docs.docker.com/engine/install/ubuntu/
  3. On Ubuntu, the installation hangs after this step:
Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket
  4. The EC2 instance becomes unresponsive and the SSH session drops with a timeout. On CentOS the installation completes, but after starting Docker the behavior is the same.
    Rebooting doesn't help.
    The same happens when installing via snap (on Ubuntu).

If you change the AZ to eu-central-1b, the issue does not reproduce and the installation succeeds.

Does anyone know what the root cause could be?

Edited by: bberenice on May 15, 2020 12:10 AM

Edited by: bberenice on May 16, 2020 4:35 AM

asked 4 years ago, 971 views
2 Answers

Dear bberenice,

As a test, I launched the same type of instance and AMI in the same Availability Zone.

 aws ec2 run-instances \
    --image-id ami-05c26ae4789875080 \
    --count 1 \
    --instance-type m5a.large \
    --key-name awshakantestkey01 \
    --subnet-id subnet-3a266c51 \
    --security-group-ids sg-0cb88d91d7b4ff0d4 \
    --placement AvailabilityZone=eu-central-1a

Instance running...

ssh -i "awshakantestkey01.pem" ubuntu@ec2-xx-xxx-xxx-xx.eu-central-1.compute.amazonaws.com

    Welcome to Ubuntu 20.04 LTS (GNU/Linux 5.4.0-1009-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sat May 16 16:42:49 UTC 2020

  System load:  0.08              Processes:             121
  Usage of /:   16.2% of 7.69GB   Users logged in:       0
  Memory usage: 2%                IPv4 address for ens5: 172.31.100.35
  Swap usage:   0%

0 updates can be installed immediately.
0 of these updates are security updates.


The list of available updates is more than a week old.
To check for new updates run: sudo apt update


The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

Docker Installed

ubuntu@ip-172-31-100-35:~$ sudo snap install docker
docker 18.09.9 from Canonical✓ installed


ubuntu@ip-172-31-100-35:~$ sudo docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.09.9
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: N/A
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 5.4.0-1009-aws
Operating System: Ubuntu Core 16
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.53GiB
Name: ip-172-31-100-35
ID: POEJ:RK3I:6YKL:4MYP:CMDN:62OA:CIMB:FVHT:766W:4HRS:FT4G:GU7R
Docker Root Dir: /var/snap/docker/common/var-lib-docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

It's working fine in eu-central-1a.

answered 4 years ago

awshakan, thanks for your attention to my question.

The root cause was a network setup specific to our project. We had a VPC peering connection:
VPC1, Requester CIDRs=172.17.0.0/16 -> VPC2, Accepter CIDRs=172.20.0.0/16
The EC2 instances where the issue reproduced were located in VPC2, while we were trying to reach them from VPC1.
By default, Docker uses 172.17.0.0/16 for its bridge network, which is the same range as the VPC1 CIDR. When the service started, it updated the iptables rules. Here is an excerpt of the log, captured with strace, from the dockerd start:

...
DEBU[2020-05-18T07:05:24.901044552Z] /sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2] 
DEBU[2020-05-18T07:05:24.902118925Z] /sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2] 
DEBU[2020-05-18T07:05:24.903236955Z] /sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP] 
DEBU[2020-05-18T07:05:24.904343697Z] /sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP] 

After that, access to the EC2 instance from VPC1 was lost.
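
The overlap described above can be checked before installing Docker. The following is only a minimal sketch using Python's standard `ipaddress` module: the two VPC CIDRs come from the peering setup in this post, `DOCKER_DEFAULT_BRIDGE` is Docker's usual default for the docker0 bridge, and the helper name `conflicting_networks` is made up for illustration.

```python
import ipaddress

# CIDRs from the peering setup described in this post.
VPC1_REQUESTER = ipaddress.ip_network("172.17.0.0/16")
VPC2_ACCEPTER = ipaddress.ip_network("172.20.0.0/16")

# Docker's usual default range for the docker0 bridge.
DOCKER_DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def conflicting_networks(docker_net, peer_nets):
    """Return the names of the peer networks whose range overlaps docker_net."""
    return [name for name, net in peer_nets.items() if docker_net.overlaps(net)]


peers = {"VPC1": VPC1_REQUESTER, "VPC2": VPC2_ACCEPTER}
conflicts = conflicting_networks(DOCKER_DEFAULT_BRIDGE, peers)
print(conflicts)  # prints ['VPC1']
```

A non-empty result means that traffic to or from that network can be affected on the host once the Docker bridge comes up, which matches what we saw from VPC1.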

Resolution in this case: use a custom CIDR for Docker.
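
One possible way to set a custom CIDR is through the `bip` and `default-address-pools` keys of the Docker daemon configuration file. The sketch below generates such a configuration and refuses ranges that overlap the peered VPCs. The range `10.200.0.0/16` and the helper `build_daemon_config` are hypothetical examples, not the values we actually used; choose a range that is free in your own network.

```python
import ipaddress
import json

# Hypothetical replacement range (an assumption, not taken from the post).
CUSTOM_DOCKER_RANGE = ipaddress.ip_network("10.200.0.0/16")

# CIDRs of the peered VPCs from this post.
VPC_CIDRS = [
    ipaddress.ip_network("172.17.0.0/16"),
    ipaddress.ip_network("172.20.0.0/16"),
]


def build_daemon_config(docker_range, reserved):
    """Build a daemon.json-style dict, rejecting ranges that overlap reserved ones."""
    for net in reserved:
        if docker_range.overlaps(net):
            raise ValueError(f"{docker_range} overlaps {net}")
    # Use the first /24 of the range for the docker0 bridge.
    bridge_subnet = next(docker_range.subnets(new_prefix=24))
    bridge_ip = bridge_subnet.network_address + 1
    return {
        # Address and mask assigned to the docker0 bridge.
        "bip": f"{bridge_ip}/{bridge_subnet.prefixlen}",
        # Pool from which other Docker networks get /24 subnets.
        "default-address-pools": [{"base": str(docker_range), "size": 24}],
    }


config = build_daemon_config(CUSTOM_DOCKER_RANGE, VPC_CIDRS)
print(json.dumps(config, indent=2))
```

The printed JSON would typically be saved as /etc/docker/daemon.json and the Docker service restarted afterwards. That path is an assumption for package-based installs; a snap install may keep its configuration elsewhere.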

answered 4 years ago