Here is my AWS Lambda function:
```python
import json
import time
import pickle


def ms_now():
    return time.time_ns() // 1_000_000


class Timer:
    def __init__(self):
        self.start = ms_now()

    def stop(self):
        return ms_now() - self.start


# The import is placed after the timer starts so that loading the
# punctuators package is included in the measured init time.
timer = Timer()
from punctuators.models import PunctCapSegModelONNX

model_name = "pcs_en"
model_sentences = PunctCapSegModelONNX.from_pretrained(model_name)
with open('model_embeddings.pkl', 'rb') as file:
    model_embeddings = pickle.load(file)

cold_start = True
init_time = timer.stop()
print("Time to initialize:", init_time, flush=True)


def segment_text(texts):
    sentences = model_sentences.infer(texts)
    # Pair each sentence with its token count.
    sentences = [
        [(s, len(model_embeddings.tokenizer.encode(s))) for s in el]
        for el in sentences]
    return sentences


def get_embeddings(texts):
    return model_embeddings.encode(texts)


def compute(body):
    command = body['command']
    if command == 'ping':
        return 'success'
    texts = body['texts']
    if command == 'embeddings':
        result = get_embeddings(texts)
        return [el.tolist() for el in result]
    if command == 'sentences':
        return segment_text(texts)
    # Fail loudly on unknown commands (a bare assert is stripped under -O).
    raise ValueError(f"Unknown command: {command}")


def lambda_handler(event, context):
    global cold_start
    global init_time
    stats = {'cold_start': cold_start, 'init_time': init_time}
    cold_start = False
    init_time = 0
    stats['started'] = ms_now()
    result = compute(event['body'])
    stats['finished'] = ms_now()
    return {
        'statusCode': 200,
        'headers': {
            'Content-Type': 'application/json'
        },
        'body': {'result': result, 'stats': stats}
    }
```
This Lambda function, along with the packages and the models (so that those don't need to be downloaded), is deployed as a Docker image.
In addition to the timestamps of when the function started and finished (not including the cold-start initialization), the response reports whether the invocation was a cold start and how long initialization took. I have another function that invokes this one 15 times in parallel.
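The parallel driver isn't shown in the question; a minimal sketch of the pattern, using a thread pool and a hypothetical `invoke_once` stub in place of the real boto3 `invoke` call:

```python
from concurrent.futures import ThreadPoolExecutor

def invoke_once(payload):
    # Hypothetical stand-in for boto3's lambda_client.invoke(...); assumed to
    # return the handler's response dict with the 'stats' timestamps filled in.
    return {'statusCode': 200,
            'body': {'result': 'success',
                     'stats': {'cold_start': False, 'init_time': 0,
                               'started': 0, 'finished': 300}}}

payloads = [{'body': {'command': 'ping'}}] * 15

# Fire all 15 invocations in parallel and measure each one from the
# timestamps the handler itself reports (excluding network overhead).
with ThreadPoolExecutor(max_workers=15) as pool:
    responses = list(pool.map(invoke_once, payloads))

durations = [r['body']['stats']['finished'] - r['body']['stats']['started']
             for r in responses]
print("per-invocation ms:", durations)
```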
The anomaly occurs with the first of these parallel invocations. Usually it takes ~300 ms (computed as the difference between the timestamps in the response), but sometimes it takes 900 ms or longer, with the same input.
This does not happen due to a cold start, since I have `init_time == 0` in the response (when a cold start occurs, `init_time > 6000`). It happens both with `command == 'embeddings'` and with `command == 'sentences'`.
What could explain these spikes? With a warm start, what can cause a Lambda function to take much longer than usual?
P.S. The question at SO
Disabling automatic garbage collection with `gc.disable()` helped! But can you come up with an explanation for why this almost always happened on the first invocation?
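A plausible mechanism, assuming CPython's generational collector: initialization leaves a large number of long-lived objects (the loaded models) in the heap, and the first full collection that the allocation thresholds trigger after init, which tends to land in the first invocation, has to traverse all of them. A minimal sketch of an alternative to `gc.disable()`: collect once at the end of init, then `gc.freeze()` (Python 3.7+) to move everything that survived into the permanent generation, which the collector never scans. The `model_data` list below is a stand-in for the loaded models:

```python
import gc

# Stand-in for the objects created by model loading at module level.
model_data = [[i] for i in range(100_000)]

gc.collect()  # clean up init-time garbage once, before serving traffic
gc.freeze()   # exclude all surviving init objects from future collections

print("frozen objects:", gc.get_freeze_count(), flush=True)
```

With this, later collections still run, but they no longer pay for walking the model's object graph on every full pass.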