Making sense of the cold-start log of an AWS Lambda function


My AWS Lambda function measures the time (in milliseconds) it takes to copy two ML models from /var/task to /tmp and to initialize them during the cold start. Here is the relevant code (i.e. the code that executes outside of any handler function):

import json
import subprocess
import time

def ms_now():
    """Current wall-clock time in milliseconds."""
    return time.time_ns() // 1_000_000

class Timer:
    """Measures elapsed time in milliseconds."""
    def __init__(self):
        self.start = ms_now()

    def stop(self):
        return ms_now() - self.start

def shell_command(command):
    print("Executing: ", command, flush=True)
    result = subprocess.run(command, shell=True, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)

    if result.returncode == 0:
        print("Success", flush=True)
    else:
        print("Error:", result.stderr, flush=True)

def copy():
    # Copy the bundled model files from the read-only package to the writable /tmp
    shell_command("cp -r /var/task/torch /tmp/")
    shell_command("cp -r /var/task/huggingface /tmp/")

# Copying the models
timer = Timer()
copy()
print("Time to copy:", timer.stop(), flush=True)

# Initializing the models
timer = Timer()
from sentence_transformers import SentenceTransformer
from punctuators.models import PunctCapSegModelONNX
model_name = "pcs_en"
model_sentences = PunctCapSegModelONNX.from_pretrained(model_name)
model_embeddings = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print("Time to initialize:", timer.stop(), flush=True)

The first three invocations after updating the Lambda function timed out (the timeout is set to 30 seconds). The CloudWatch logs follow. For convenience, I manually added a separator line where I believe one invocation ends and the next one begins.

2023-09-11T16:57:36.315+03:00	Executing: cp -r /var/task/torch /tmp/

2023-09-11T16:57:46.061+03:00	Executing: cp -r /var/task/torch /tmp/

2023-09-11T16:58:16.020+03:00	START RequestId: ac4c4611-7c49-4eb0-94f2-238845e0500a Version: $LATEST

2023-09-11T16:58:16.022+03:00	2023-09-11T13:58:16.022Z ac4c4611-7c49-4eb0-94f2-238845e0500a Task timed out after 30.03 seconds

2023-09-11T16:58:16.022+03:00	END RequestId: ac4c4611-7c49-4eb0-94f2-238845e0500a

2023-09-11T16:58:16.022+03:00	REPORT RequestId: ac4c4611-7c49-4eb0-94f2-238845e0500a Duration: 30032.19 ms Billed Duration: 30000 ms Memory Size: 3000 MB Max Memory Used: 616 MB

---------------------------------------------------------------------------------------------------------------------

2023-09-11T16:58:16.621+03:00	Executing: cp -r /var/task/torch /tmp/

2023-09-11T16:58:26.956+03:00	Executing: cp -r /var/task/torch /tmp/

2023-09-11T16:58:56.915+03:00	START RequestId: 81ce18e4-9ff8-41a4-8f30-a540343c225f Version: $LATEST

2023-09-11T16:58:56.918+03:00	2023-09-11T13:58:56.918Z 81ce18e4-9ff8-41a4-8f30-a540343c225f Task timed out after 30.03 seconds

2023-09-11T16:58:56.918+03:00	END RequestId: 81ce18e4-9ff8-41a4-8f30-a540343c225f

2023-09-11T16:58:56.918+03:00	REPORT RequestId: 81ce18e4-9ff8-41a4-8f30-a540343c225f Duration: 30033.08 ms Billed Duration: 30000 ms Memory Size: 3000 MB Max Memory Used: 760 MB

---------------------------------------------------------------------------------------------------------------------

2023-09-11T16:58:57.230+03:00	Executing: cp -r /var/task/torch /tmp/

2023-09-11T16:59:48.954+03:00	Executing: cp -r /var/task/torch /tmp/

2023-09-11T17:00:12.242+03:00	Success

2023-09-11T17:00:12.242+03:00	Executing: cp -r /var/task/huggingface /tmp/

2023-09-11T17:00:15.354+03:00	Success

2023-09-11T17:00:15.354+03:00	Time to copy: 26400

2023-09-11T17:00:18.915+03:00	START RequestId: aead35f0-9b9e-4859-bce3-e56a6fbd4e87 Version: $LATEST

2023-09-11T17:00:18.924+03:00	2023-09-11T14:00:18.924Z aead35f0-9b9e-4859-bce3-e56a6fbd4e87 Task timed out after 30.04 seconds

2023-09-11T17:00:18.924+03:00	END RequestId: aead35f0-9b9e-4859-bce3-e56a6fbd4e87

2023-09-11T17:00:18.924+03:00	REPORT RequestId: aead35f0-9b9e-4859-bce3-e56a6fbd4e87 Duration: 30039.63 ms Billed Duration: 30000 ms Memory Size: 3000 MB Max Memory Used: 1629 MB

---------------------------------------------------------------------------------------------------------------------

2023-09-11T17:00:19.750+03:00	Executing: cp -r /var/task/torch /tmp/

2023-09-11T17:00:23.783+03:00	Success

2023-09-11T17:00:23.783+03:00	Executing: cp -r /var/task/huggingface /tmp/

2023-09-11T17:00:24.076+03:00	Success

2023-09-11T17:00:24.076+03:00	Time to copy: 4326

2023-09-11T17:00:28.374+03:00	The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.

2023-09-11T17:00:28.377+03:00	0it [00:00, ?it/s] 0it [00:00, ?it/s]

2023-09-11T17:00:59.270+03:00	Executing: cp -r /var/task/torch /tmp/

2023-09-11T17:01:02.233+03:00	Success

2023-09-11T17:01:02.233+03:00	Executing: cp -r /var/task/huggingface /tmp/

2023-09-11T17:01:02.470+03:00	Success

2023-09-11T17:01:02.470+03:00	Time to copy: 3201

2023-09-11T17:01:07.459+03:00	Time to initialize: 4989

This log surprises me in many ways:

  1. During the very first invocation, the command to copy the first model (i.e. cp -r /var/task/torch /tmp/) is executed twice.
  2. The first model is only about 90 MB. Nonetheless, copying it during the first invocation took 10 seconds and 20 seconds respectively, which is terribly slow for that amount of data.
  3. If copying the model takes that long, how did the fourth attempt succeed, with both models (the second one is about 220 MB) copied in about 3 seconds?
  4. During the last invocation, both copy operations are repeated.

What is going on in this log? Also, I would expect the function to succeed on the first invocation and take under 10 seconds, since all it does is copy ~300 MB of models (which, as the log shows, can take as little as 3 seconds) and initialize them. How do I make this happen?

P.S. The question at StackOverflow: https://stackoverflow.com/q/77082545/2725810

asked 8 months ago · 244 views
1 Answer

Is your function packaged as a container image or as a ZIP file? If it is a container image, the way we load the image is probably what makes the copy take so long. To reduce cold starts we use a lazy-loading mechanism: we only load the blocks we need, when we need them. If you try to copy a lot of data, we have to load all of the relevant blocks, and I am guessing that is the issue here.

Regarding the double copy that you see: we give you 10 seconds to complete the initialization. If it does not complete within those 10 seconds, we abort it and restart the process; on the second attempt we do not limit the init phase.
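If the init phase really is restarted, the module-level copy runs again from scratch. A minimal sketch of a guard against that, assuming the restart happens in the same sandbox (so /tmp is preserved) and reusing the question's shell_command helper; note that a partially finished copy would also leave the directories in place, so a robust version would write a completion marker after the copy:

import os

def copy():
    # Hypothetical guard: skip the copy if a previous (aborted) init already
    # placed the models in /tmp. Assumes /tmp survives the init restart.
    if os.path.isdir("/tmp/torch") and os.path.isdir("/tmp/huggingface"):
        print("Models already in /tmp, skipping copy", flush=True)
        return
    shell_command("cp -r /var/task/torch /tmp/")
    shell_command("cp -r /var/task/huggingface /tmp/")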

To save on copy time I would use a single cp command. For that, put both folders under a common parent, such as /var/task/models/torch and /var/task/models/huggingface, and then use cp -r /var/task/models /tmp
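For illustration, the question's copy() would then shrink to a single call (assuming the image is repackaged with both models under /var/task/models):

def copy():
    # Both model folders now live under /var/task/models,
    # so a single cp copies everything in one pass.
    shell_command("cp -r /var/task/models /tmp/")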

AWS EXPERT
Uri
answered 8 months ago
  • Yes, it is packaged as a container image (the ZIP package is limited to 250 MB, isn't it?). The duplicate copy happens even when the initial copy takes less than 10 seconds, such as in the last invocation. Also see this question.
