Random deployment failures in Elastic Beanstalk Docker running on 64bit Amazon Linux 2/3.5.4 with 100% EC2 CPU usage?

We have a couple of Beanstalk applications with multiple environments. The applications are based on Docker running on 64bit Amazon Linux 2/3.5.4 and run Django inside Docker. Most of the environments use a t2.micro EC2 instance, which is perfect for our scenario.

Lately we have been seeing multiple random Elastic Beanstalk deployment failures (even without a config/code change), where the deployment runs for over 10 minutes and then fails with the log below:

Error Log from beanstalk

Upon investigation, we found that every deployment failure coincides with an unusual rise in CPU usage, usually up to 100%, which begins after the deployment is initiated. Our baseline CPU usage is generally about 30%, and this happens even on a dry deployment with no config or code change. The problem resolves on its own after 30–40 minutes. If we try to redeploy before that, the same issue occurs, even after an EC2/service restart.
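To put numbers on the CPU spike from inside the instance, something like the following could be run over SSH while a deployment is in progress. This is only a sketch: it assumes a Linux host that exposes `/proc/stat`, and the function names `read_cpu_line` and `cpu_busy_percent` are made up for illustration.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu ...' line from /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline().strip()


def cpu_busy_percent(before, after):
    """Busy CPU percentage between two aggregate 'cpu' lines from /proc/stat."""

    def totals(line):
        # Fields after the 'cpu' label: user nice system idle iowait irq softirq steal
        fields = [int(value) for value in line.split()[1:9]]
        idle = fields[3] + fields[4]  # idle + iowait
        return sum(fields), idle

    total_before, idle_before = totals(before)
    total_after, idle_after = totals(after)
    delta_total = total_after - total_before
    if delta_total <= 0:
        return 0.0
    delta_idle = idle_after - idle_before
    return 100.0 * (delta_total - delta_idle) / delta_total
```

Taking `read_cpu_line()` twice a few seconds apart and passing both samples to `cpu_busy_percent` gives the busy percentage for that window, which can be compared against the roughly 30% baseline.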

The EC2 syslog shows the following errors from CloudFormation (possibly during deployment).

[eb-cfn-init]: 2023-02-20 10:07:51,762 [INFO] -----------------------Build complete-----------------------
[eb-cfn-init]: [2023-02-20T10:09:24.623Z] Tailing /var/log/cfn-hup.log
[eb-cfn-init]: ******************* cfn-hup taillog *******************
[eb-cfn-init]:   File "/usr/lib64/python3.7/http/client.py", line 1036, in _send_output
[eb-cfn-init]:     self.send(msg)
[eb-cfn-init]:   File "/usr/lib64/python3.7/http/client.py", line 976, in send
[eb-cfn-init]:     self.connect()
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/packages/urllib3/connection.py", line 200, in connect
[eb-cfn-init]:     conn = self._new_conn()
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/packages/urllib3/connection.py", line 182, in _new_conn
[eb-cfn-init]:     self, "Failed to establish a new connection: %s" % e
[eb-cfn-init]: cfnbootstrap.packages.requests.packages.urllib3.exceptions.NewConnectionError: <cfnbootstrap.packages.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b60e24990>: Failed to establish a new connection: [Errno 101] Network is unreachable
[eb-cfn-init]: During handling of the above exception, another exception occurred:
[eb-cfn-init]: Traceback (most recent call last):
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/adapters.py", line 499, in send
[eb-cfn-init]:     timeout=timeout,
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/packages/urllib3/connectionpool.py", line 756, in urlopen
[eb-cfn-init]:     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/packages/urllib3/util/retry.py", line 573, in increment
[eb-cfn-init]:     raise MaxRetryError(_pool, url, error or ResponseError(cause))
[eb-cfn-init]: cfnbootstrap.packages.requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /latest/api/token (Caused by NewConnectionError('<cfnbootstrap.packages.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b60e24990>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
[eb-cfn-init]: During handling of the above exception, another exception occurred:
[eb-cfn-init]: Traceback (most recent call last):
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 193, in _retry
[eb-cfn-init]:     return f(*args, **kwargs)
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 329, in _fetch_instance_id
[eb-cfn-init]:     token = _get_session_token()
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/util.py", line 293, in _get_session_token
[eb-cfn-init]:     timeout=REQUEST_TIMEOUT)
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/api.py", line 130, in put
[eb-cfn-init]:     return request("put", url, data=data, **kwargs)
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/api.py", line 59, in request
[eb-cfn-init]:     return session.request(method=method, url=url, **kwargs)
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/sessions.py", line 586, in request
[eb-cfn-init]:     resp = self.send(prep, **send_kwargs)
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/sessions.py", line 700, in send
[eb-cfn-init]:     r = adapter.send(request, **kwargs)
[eb-cfn-init]:   File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/adapters.py", line 565, in send
[eb-cfn-init]:     raise ConnectionError(e, request=request)
[eb-cfn-init]: cfnbootstrap.packages.requests.exceptions.ConnectionError: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /latest/api/token (Caused by NewConnectionError('<cfnbootstrap.packages.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b60e24990>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
[eb-cfn-init]: 2023-02-20 10:07:15,409 [DEBUG] Sleeping for 2.047560 seconds before retrying

My take on the issue.

At some point during the deployment (internally, CloudFormation is doing the work), the egress network (public internet) becomes unavailable, mostly during the IMDS (instance metadata service) polling phase. CloudFormation then keeps retrying to fetch the metadata, which may be what causes the spike in CPU usage. As a result, the EC2 instance becomes unresponsive at 100% CPU usage and the deployment fails. The main culprit appears to be network availability, which is odd because the problem resolves on its own and the networking is configured correctly.
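One way to test this theory is to probe the same endpoint that fails in the traceback (`PUT http://169.254.169.254/latest/api/token`) from the instance while a deployment is running. The sketch below is not what cfn-hup does internally; it is a minimal standalone probe using only the Python standard library. The names `probe_imds_token` and `watch`, the 60-second token TTL, and the polling interval are all assumptions chosen for illustration.

```python
import time
import urllib.request

IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"


def probe_imds_token(opener=urllib.request.urlopen, timeout=2.0):
    """Try once to fetch an IMDSv2 session token; return (ok, seconds, detail)."""
    request = urllib.request.Request(
        IMDS_TOKEN_URL,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    started = time.monotonic()
    try:
        with opener(request, timeout=timeout) as response:
            response.read()
        return True, time.monotonic() - started, "token received"
    except OSError as exc:  # URLError, timeouts and socket errors are OSError subclasses
        return False, time.monotonic() - started, str(exc)


def watch(interval=5.0, attempts=12):
    """Print one probe result per interval so gaps in IMDS reachability show up."""
    for _ in range(attempts):
        ok, seconds, detail = probe_imds_token()
        print(f"{time.strftime('%H:%M:%S')} ok={ok} {seconds:.3f}s {detail}")
        time.sleep(interval)
```

Running `watch()` on the instance during a failing deployment would show whether `[Errno 101] Network is unreachable` also appears outside of cfn-hup, and whether it lines up in time with the CPU spike.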

No Answers
