How do I troubleshoot a Lightsail container service that takes too long to deploy?


Hi

I'm trying to deploy a container service with 2 images. The first is a python-slim-based app container running a Django application (about 800 MB). The second is a base Redis container (about 138 MB).

It runs fine in my local dev environment. The logs are below. I've tried setting my Docker log level to debug, as suggested in another thread, but I don't get any more detailed logs. I've also tried increasing the container service capacity to Medium, and I still have the same problem.

[13/Oct/2023:08:42:10] 0 static files copied, 207 unmodified.
[13/Oct/2023:08:42:11] upload logs omitted.
[13/Oct/2023:08:42:11] Upload complete.
[13/Oct/2023:08:42:11] Importing data from fixtures.
[13/Oct/2023:08:42:12] No changes detected.
[13/Oct/2023:08:42:13] Operations to perform:
[13/Oct/2023:08:42:13] Apply all migrations: accounts, admin, auth, contenttypes, courses, schools, sessions, subscription.
[13/Oct/2023:08:42:13] Running migrations:
[13/Oct/2023:08:42:13] No migrations to apply.
[13/Oct/2023:08:42:15] Installed 1 object(s) from 1 fixture(s).
[13/Oct/2023:08:42:16] Installed 42 object(s) from 1 fixture(s).
[13/Oct/2023:08:42:17] Installed 3 object(s) from 1 fixture(s).
[13/Oct/2023:08:43:13] Installed 50254 object(s) from 1 fixture(s).
[13/Oct/2023:08:43:13] Data imported successfully!
[13/Oct/2023:08:43:13] Starting Gunicorn server...
[13/Oct/2023:08:43:13] [2023-10-13 08:43:13 +0000] [31] [INFO] Starting gunicorn 21.2.0.
[13/Oct/2023:08:43:13] [2023-10-13 08:43:13 +0000] [31] [INFO] Listening at: http://0.0.0.0:80 (31).
[13/Oct/2023:08:43:13] [2023-10-13 08:43:13 +0000] [31] [INFO] Using worker: sync.
[13/Oct/2023:08:43:13] [2023-10-13 08:43:13 +0000] [32] [INFO] Booting worker with pid: 32.
[13/Oct/2023:08:46:16] [deployment:4] Took too long.
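
To see where the deployment time goes, it can help to measure the gaps between consecutive log lines. Below is a minimal sketch. It assumes the logs are copied from the Lightsail console in the [dd/Mon/yyyy:HH:MM:SS] message format shown above. The function name find_slow_steps and the 10-second threshold are illustrative choices, not part of any AWS tooling.

```python
import re
from datetime import datetime

# Lightsail prefixes each log line with a timestamp such as [13/Oct/2023:08:42:10]
_TS = re.compile(r"^\s*\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2})\]\s*(.*)$")


def find_slow_steps(lines, threshold_s=10):
    """Return (gap_seconds, previous_message, next_message) for every gap >= threshold_s."""
    slow = []
    prev_time, prev_msg = None, None
    for line in lines:
        m = _TS.match(line)
        if not m:
            continue  # skip lines without a Lightsail timestamp
        ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
        msg = m.group(2)
        if prev_time is not None:
            gap = (ts - prev_time).total_seconds()
            if gap >= threshold_s:
                slow.append((int(gap), prev_msg, msg))
        prev_time, prev_msg = ts, msg
    return slow


# A few lines taken from the log excerpt above
SAMPLE_LOG = """\
[13/Oct/2023:08:42:16] Installed 42 object(s) from 1 fixture(s).
[13/Oct/2023:08:42:17] Installed 3 object(s) from 1 fixture(s).
[13/Oct/2023:08:43:13] Installed 50254 object(s) from 1 fixture(s).
[13/Oct/2023:08:43:13] Data imported successfully!
[13/Oct/2023:08:43:13] [2023-10-13 08:43:13 +0000] [32] [INFO] Booting worker with pid: 32.
[13/Oct/2023:08:46:16] [deployment:4] Took too long.
"""

for gap, before, after in find_slow_steps(SAMPLE_LOG.splitlines()):
    print(f"{gap:>4}s  {before} -> {after}")
```

On this excerpt it reports two gaps:

- 56 seconds spent loading the 50,254-object fixture.
- 183 seconds between the worker booting and Lightsail giving up, which is presumably time spent waiting for the deployment to become healthy.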

  • Can you check what this command returns? For example, is publicEndpoint.containerPort the same as the port your Gunicorn server is listening on? In your case it should be 80.

    aws lightsail get-container-service-deployments --service-name your-service-name --query 'deployments[?version == `4`].publicEndpoint'
    
  • It also seems that your server spends at least a minute between [13/Oct/2023:08:42:10] 0 static files copied... and [13/Oct/2023:08:43:13] Data imported successfully!

    So if your compute nodes need to be restarted because of a malfunction or a new deployment, you'd spend a similar amount of time again. And if you run with scale > 1, does that mean this data import will be done on each compute node?

    If the image takes too long to start, another idea is to move these slow import steps into the docker build ... step. The resulting image will then start faster.
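
The port check suggested in the first comment can also be scripted. Below is a minimal Python sketch.

- It assumes boto3 is installed and AWS credentials are configured.
- check_public_endpoint and fetch_deployments are hypothetical helper names.
- get_container_service_deployments is the boto3 Lightsail client method behind the CLI command above.
- The helper only compares fields in the response (publicEndpoint, containers, ports). It does not run the health check itself.

```python
def check_public_endpoint(deployments_response, version, gunicorn_port=80):
    """Return a list of problems; an empty list means the endpoint config looks consistent."""
    deployment = next(
        (d for d in deployments_response.get("deployments", []) if d.get("version") == version),
        None,
    )
    if deployment is None:
        return [f"deployment version {version} not found"]

    problems = []
    endpoint = deployment.get("publicEndpoint") or {}
    name = endpoint.get("containerName")
    port = endpoint.get("containerPort")

    # The public endpoint must target the port Gunicorn binds to (-b :80)
    if port != gunicorn_port:
        problems.append(f"publicEndpoint.containerPort is {port}, Gunicorn binds {gunicorn_port}")

    # The endpoint's container must exist and open that port (ports keys are strings)
    container = deployment.get("containers", {}).get(name)
    if container is None:
        problems.append(f"public endpoint container {name!r} is not in the deployment")
    elif str(gunicorn_port) not in container.get("ports", {}):
        problems.append(f"container {name!r} does not open port {gunicorn_port}")
    return problems


def fetch_deployments(service_name):
    """Fetch deployments with boto3 (needs AWS credentials and network access)."""
    import boto3  # imported lazily so check_public_endpoint works without boto3

    client = boto3.client("lightsail")
    return client.get_container_service_deployments(serviceName=service_name)
```

For example, check_public_endpoint(fetch_deployments("your-service-name"), 4) should return an empty list when the endpoint points at the container and port that Gunicorn uses.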

asked 7 months ago · 192 views
1 Answer

Have you figured out the issue? This post has some useful troubleshooting steps you can refer to:

https://repost.aws/questions/QUTuGsgLj_QrmydWGNC1Ex9A/understanding-failed-deployments

AWS
answered 7 months ago
  • Thanks. I'm afraid I haven't. I've looked through the post you provided but I don't know how to adapt it to my particular scenario.

    I'm running a Django app using Gunicorn. I run the following at the end of my Dockerfile:

    echo "Starting Gunicorn server..."
    which gunicorn || echo "Gunicorn not found!"
    gunicorn -b :80 --timeout 240 --chdir /app --log-level=debug --access-logfile '-' --error-logfile '-' mathsroot.wsgi:application

    The debug-level logs on Lightsail look like this. Everything seems to go fine up until it starts the worker, and then something prevents it from continuing. On this particular attempt, there are only 22 seconds between the last Gunicorn log line and Lightsail stating that it took too long. I've set the timeoutInterval of the container to the maximum of 60 seconds allowed by Lightsail.

  • [18/Oct/2023:08:53:18] Starting Gunicorn server...
    [18/Oct/2023:08:53:18] /venv/bin/gunicorn.
    [18/Oct/2023:08:53:18] [2023-10-18 08:53:18 +0000] [446] [DEBUG] Current configuration:
    [18/Oct/2023:08:53:18] config: ./gunicorn.conf.py.
    [18/Oct/2023:08:53:18] wsgi_app: None.
    [18/Oct/2023:08:53:18] bind: [':80'].
    [18/Oct/2023:08:53:18] backlog: 2048.
    [18/Oct/2023:08:53:18] workers: 1.
    [18/Oct/2023:08:53:18] worker_class: sync.
    [18/Oct/2023:08:53:18] threads: 1.
    [18/Oct/2023:08:53:18] worker_connections: 1000.
    [18/Oct/2023:08:53:18] max_requests: 0.
    [18/Oct/2023:08:53:18] max_requests_jitter: 0.
    [18/Oct/2023:08:53:18] timeout: 240.
    [18/Oct/2023:08:53:18] graceful_timeout: 30.
    [18/Oct/2023:08:53:18] keepalive: 2.
    [18/Oct/2023:08:53:18] limit_request_line: 4094.
    [18/Oct/2023:08:53:18] limit_request_fields: 100.
    [18/Oct/2023:08:53:18] limit_request_field_size: 8190.
    [18/Oct/2023:08:53:18] reload: False.
    [18/Oct/2023:08:53:18] reload_engine: auto.
    [18/Oct/2023:08:53:18] reload_extra_files: [].
    [18/Oct/2023:08:53:18] spew: False.
    [18/Oct/2023:08:53:18] check_config: False.
    [18/Oct/2023:08:53:18] print_config: False.
    [18/Oct/2023:08:53:18] preload_app: False.
    [18/Oct/2023:08:53:18] sendfile: None.
    [18/Oct/2023:08:53:18] reuse_port: False.
    [18/Oct/2023:08:53:18] chdir: /app.
    [18/Oct/2023:08:53:18] daemon: False.
    [18/Oct/2023:08:53:18] raw_env: [].
    [18/Oct/2023:08:53:18] pidfile: None.
    [18/Oct/2023:08:53:18] worker_tmp_dir: None.
    [18/Oct/2023:08:53:18] user: 0.
    [18/Oct/2023:08:53:18] group: 0.
    [18/Oct/2023:08:53:18] umask: 0.
    [18/Oct/2023:08:53:18] initgroups: False.
    [18/Oct/2023:08:53:18] tmp_upload_dir: None.
    [18/Oct/2023:08:53:18] secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}.
    [18/Oct/2023:08:53:18] forwarded_allow_ips: ['127.0.0.1'].
    [18/Oct/2023:08:53:18] accesslog: -
    [18/Oct/2023:08:53:18] disable_redirect_access_to_syslog: False.
    [18/Oct/2023:08:53:18] access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s".
    [18/Oct/2023:08:53:18] errorlog: -
    [18/Oct/2023:08:53:18] loglevel: debug.
    [18/Oct/2023:08:53:18] capture_output: False.
    [18/Oct/2023:08:53:18] logger_class: gunicorn.glogging.Logger.
    [18/Oct/2023:08:53:18] logconfig: None.
    [18/Oct/2023:08:53:18] logconfig_dict: {}.
    [18/Oct/2023:08:53:18] logconfig_json: None.
    [18/Oct/2023:08:53:18] syslog_addr: udp://localhost:514.
    [18/Oct/2023:08:53:18] syslog: False.
    [18/Oct/2023:08:53:18] syslog_prefix: None.
    [18/Oct/2023:08:53:18] syslog_facility: user.
    [18/Oct/2023:08:53:18] enable_stdio_inheritance: False.
    [18/Oct/2023:08:53:18] statsd_host: None.
    [18/Oct/2023:08:53:18] dogstatsd_tags:
    [18/Oct/2023:08:53:18] statsd_prefix:
    [18/Oct/2023:08:53:18] proc_name: None.
    [18/Oct/2023:08:53:18] default_proc_name: mathsroot.wsgi:application.
    [18/Oct/2023:08:53:18] pythonpath: None.
    [18/Oct/2023:08:53:18] paste: None.
    [18/Oct/2023:08:53:18] on_starting: <function OnStarting.on_starting at 0x7f5d2f273740>.
    [18/Oct/2023:08:53:18] on_reload: <function OnReload.on_reload at 0x7f5d2f273880>.
    [18/Oct/2023:08:53:18] when_ready: <function WhenReady.when_ready at 0x7f5d2f2739c0>.
    [18/Oct/2023:08:53:18] pre_fork: <function Prefork.pre_fork at 0x7f5d2f273b00>.
    [18/Oct/2023:08:53:18] post_fork: <function Postfork.post_fork at 0x7f5d2f273c40>.
    [18/Oct/2023:08:53:18] post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f5d2f273d80>.
    [18/Oct/2023:08:53:18] worker_int: <function WorkerInt.worker_int at 0x7f5d2f273ec0>.
    [18/Oct/2023:08:53:18] worker_abort: <function WorkerAbort.worker_abort at 0x7f5d2ead0040>.
    [18/Oct/2023:08:53:18] pre_exec: <function PreExec.pre_exec at 0x7f5d2ead0180>.
    [18/Oct/2023:08:53:18] pre_request: <function PreRequest.pre_request at 0x7f5d2ead02c0>.
    [18/Oct/2023:08:53:18] post_request: <function PostRequest.post_request at 0x7f5d2ead0360>.
    [18/Oct/2023:08:53:18] child_exit: <function ChildExit.child_exit at 0x7f5d2ead04a0>.
    [18/Oct/2023:08:53:18] worker_exit: <function WorkerExit.worker_exit at 0x7f5d2ead05e0>.
    [18/Oct/2023:08:53:18] nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f5d2ead0720>.
    [18/Oct/2023:08:53:18] on_exit: <function OnExit.on_exit at 0x7f5d2ead0860>.
    [18/Oct/2023:08:53:18] ssl_context: <function NewSSLContext.ssl_context at 0x7f5d2ead0a40>.
    [18/Oct/2023:08:53:18] proxy_protocol: False.
    [18/Oct/2023:08:53:18] proxy_allow_ips: ['127.0.0.1'].
    [18/Oct/2023:08:53:18] keyfile: None.
    [18/Oct/2023:08:53:18] certfile: None.
    [18/Oct/2023:08:53:18] ssl_version: 2.
    [18/Oct/2023:08:53:18] cert_reqs: 0.
    [18/Oct/2023:08:53:18] ca_certs: None.
    [18/Oct/2023:08:53:18] suppress_ragged_eofs: True.
    [18/Oct/2023:08:53:18] do_handshake_on_connect: False.
    [18/Oct/2023:08:53:18] ciphers: None.
    [18/Oct/2023:08:53:18] raw_paste_global_conf: [].
    [18/Oct/2023:08:53:18] strip_header_spaces: False.
    [18/Oct/2023:08:53:18] [2023-10-18 08:53:18 +0000] [446] [INFO] Starting gunicorn 21.2.0.
    [18/Oct/2023:08:53:18] [2023-10-18 08:53:18 +0000] [446] [DEBUG] Arbiter booted.
    [18/Oct/2023:08:53:18] [2023-10-18 08:53:18 +0000] [446] [INFO] Listening at: http://0.0.0.0:80 (446).
    [18/Oct/2023:08:53:18] [2023-10-18 08:53:18 +0000] [446] [INFO] Using worker: sync.
    [18/Oct/2023:08:53:18] [2023-10-18 08:53:18 +0000] [447] [INFO] Booting worker with pid: 447.
    [18/Oct/2023:08:53:18] [2023-10-18 08:53:18 +0000] [446] [DEBUG] 1 workers.
    [18/Oct/2023:08:53:40] [deployment:1] Took too long.
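
The debug output above stops at "Booting worker". It does not show whether Django ever finished loading inside the worker, because preload_app is False and the app is imported only after the fork.

One way to find out is sketched below. Move the command-line flags into a gunicorn.conf.py, the file Gunicorn already looks for according to "config: ./gunicorn.conf.py" in the log. Then add Gunicorn's when_ready and post_worker_init server hooks so each stage logs its elapsed time. The settings mirror the command in the Dockerfile. The log messages and the _started timer are illustrative additions.

Because accesslog is "-", any health-check request that Gunicorn actually serves should also appear in the log. If neither the post_worker_init message nor any access line shows up, that would suggest the worker hangs while importing the app, rather than a port problem.

```python
# gunicorn.conf.py: a hypothetical equivalent of the command-line flags above,
# plus hooks that log when each startup stage has completed.
import time

wsgi_app = "mathsroot.wsgi:application"
bind = ":80"
chdir = "/app"
timeout = 240
loglevel = "debug"
accesslog = "-"  # "-" = stdout, so every request Gunicorn serves (including health checks) is logged
errorlog = "-"

# Recorded when the config file is loaded by the master process
_started = time.monotonic()


def when_ready(server):
    # Server hook: called just after the master process has started
    server.log.info("Master ready after %.1fs", time.monotonic() - _started)


def post_worker_init(worker):
    # Server hook: called after the worker has loaded the WSGI application.
    # If this line never appears, the worker is stuck importing Django.
    worker.log.info("Worker %s loaded the app after %.1fs", worker.pid, time.monotonic() - _started)
```

With this file in place, the container could start Gunicorn with gunicorn -c gunicorn.conf.py, run from the directory containing the file.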
