Why is calling the OpenAI API from Lambda much slower than calling it from an EC2 t2.micro?

0

Using

import openai
openai.ChatCompletion.create(**kwargs)

in my app.

Tested that everything works great on my local machine and EC2.

After deploying the app on Lambda, the above line takes 5x more time than on my local machine or EC2.

Any suggestions for solving this latency issue?

asked a year ago · 1,269 views
4 Answers
1

AWS Lambda and EC2 have different networking characteristics, which can affect the speed of requests to external APIs like OpenAI. Here are a few possible reasons for the slower response times you're experiencing:

  • Cold Starts: AWS Lambda functions can experience what's known as "cold starts". A cold start occurs when an inactive function is invoked and AWS must set up a new execution environment before it can respond. If your Lambda function is not invoked frequently, this adds latency.
  • Network Configuration: If your Lambda function is attached to a VPC, its outbound internet traffic must go through a NAT gateway or NAT instance, and this could be slowing down your requests. (Functions that are not attached to a VPC have direct internet access.)
  • Lambda Configuration: The amount of memory allocated to your Lambda function also proportionally affects CPU power, network bandwidth, and disk I/O capability. If your function is not allocated enough resources, it could be slower.

Here are some things you can do to improve the situation:

  • VPC Configuration: Review your VPC configuration. If your function doesn't need to access any resources inside your VPC, run it outside the VPC for faster internet access.
  • Adjust Resources: Try adjusting the memory allocation for your Lambda function. Remember that AWS Lambda allocates CPU power linearly in proportion to the amount of memory configured.
  • Optimize Code: Ensure that your code is optimized for the serverless environment. Use connection pooling and keep-alive connections with the OpenAI API so you don't pay the overhead of establishing a new connection for each request; see the sketch after this list.
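
As an illustration of that last point, here is a minimal sketch that calls the OpenAI REST endpoint directly with a requests.Session created once, outside the handler, so warm invocations reuse the same keep-alive connection. The model name, environment variable, and event shape are assumptions for illustration:

    import os

    import requests

    # Created once per execution environment (during INIT); keep-alive
    # connections are then reused across warm invocations.
    session = requests.Session()

    API_URL = "https://api.openai.com/v1/chat/completions"
    HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

    def handler(event, context):
        payload = {
            "model": "gpt-3.5-turbo",  # assumed model for illustration
            "messages": [{"role": "user", "content": event["prompt"]}],
        }
        resp = session.post(API_URL, headers=HEADERS, json=payload, timeout=60)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]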

Finally, remember to keep an eye on your CloudWatch metrics to see how these changes are affecting your Lambda function's performance.

EXPERT
answered a year ago
  • Thanks for sharing the insights.

    1. It does not seem to be an initialization or cold start issue. The line of code executes after several other lines. I only observe serious latency for this specific line, openai.ChatCompletion.create(**kwargs), which attempts to communicate with the OpenAI API to get a chatbot response.
    2. I checked the "Configuration -> VPC" and it returns "No VPC configuration. This function isn't connected to a VPC."
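
    One quick way to isolate the latency of this single call is to time it directly; the print output appears in CloudWatch Logs:

        import time

        t0 = time.perf_counter()
        resp = openai.ChatCompletion.create(**kwargs)
        print(f"ChatCompletion.create took {time.perf_counter() - t0:.2f}s")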
0

Hello,

Actually, yes, sometimes you can see some speed bumps when calling the OpenAI API from AWS Lambda.

The following points may help you troubleshoot:

  1. Your Lambda function may be experiencing "cold starts" if it isn't invoked often. In essence, AWS needs some time to set up your function's environment before it can begin executing, and this first "spin-up" can add significant latency. The more frequently your function is invoked, the less often you'll hit this. You can minimize it by keeping your Lambda "warm" with a scheduled "ping" or dummy invocation every few minutes; see the sketch after this list.

  2. The memory configuration of your Lambda function may be the cause of the performance issues you're seeing. Keep in mind that Lambda allocates CPU and network resources in proportion to the configured memory, so a function configured with little memory also gets limited CPU and network capacity. Try increasing the memory allocation to see if it helps.

  3. If your Lambda function is attached to a VPC, the additional network hop through a NAT gateway may add latency, whereas your EC2 instance may be calling the API more directly. To reduce latency, try to locate your Lambda function and the resources it needs to access in the same Region, or even the same Availability Zone.

  4. Using several or large Lambda layers can add delay during initialization. Keep your Lambda layers as small and as few as possible.

  5. A large deployment package can result in longer function initialization. Make sure your function's deployment package contains only what the code needs to run.
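
To make the warm-up idea in point 1 concrete, here is a minimal sketch; the payload shape is an assumed convention, paired with an EventBridge scheduled rule that sends a matching event every few minutes:

    def handler(event, context):
        # Return early on scheduled warm-up pings (assumed payload:
        # {"warmup": true}, sent by an EventBridge scheduled rule).
        if isinstance(event, dict) and event.get("warmup"):
            return {"warmed": True}

        # ... normal request handling continues here ...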

Hope you can get the speed you're looking for!

AWS
e_mora
answered a year ago

0

Here are some key points to help you optimize Lambda functions:

https://docs.aws.amazon.com/lambda/latest/operatorguide/computing-power.html

Memory is a principal lever for controlling the performance of a Lambda function. Developers can configure the amount of memory allocated to a Lambda function, between 128 MB and 10,240 MB. A higher memory allocation often leads to better performance, especially if the function imports libraries or Lambda layers, or interacts with data loaded from S3 or EFS. The amount of memory also determines the amount of virtual CPU available to a function. Adding more memory proportionally increases the amount of CPU, thus increasing the overall computational power available.
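
For example, here is a minimal boto3 sketch (the function name is hypothetical) that raises the memory setting; CPU and network allocation rise proportionally, and the effect shows up in CloudWatch duration metrics:

    import boto3

    lambda_client = boto3.client("lambda")

    lambda_client.update_function_configuration(
        FunctionName="my-openai-fn",  # hypothetical function name
        MemorySize=1024,              # in MB; valid range is 128-10240
    )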

https://docs.aws.amazon.com/lambda/latest/operatorguide/static-initialization.html

Static initialization happens before the handler code starts running in a function. It often includes importing libraries and dependencies, setting up configuration, and initializing connections to other services. A significant contributor of latency before function execution often comes from INIT code.
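
As a sketch of keeping INIT work out of the request path, using the same pre-1.0 openai SDK as in the question (the model name, environment variable, and event shape are illustrative assumptions):

    import os

    import openai  # heavy import happens once, during INIT

    openai.api_key = os.environ["OPENAI_API_KEY"]  # configuration during INIT

    def handler(event, context):
        # Only per-request work remains in the handler.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # assumed model for illustration
            messages=[{"role": "user", "content": event["prompt"]}],
        )
        return resp["choices"][0]["message"]["content"]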

EXPERT
answered a year ago

0

It seems like AWS people are destroying their own company. It is impossible to find end-to-end instructions or videos on how to build a use-case project for ML training and inference with SageMaker / EC2 / Lambda. Some people do it for free, e.g. https://www.youtube.com/playlist?list=PLZoTAELRMXVONh5mHrXowH6-dgyWoC_Ew (but unfortunately the quality is not good enough).

AWS's own videos, like https://www.youtube.com/playlist?list=PLhr1KZpdzukcOr_6j_zmSrvYnLUtgqsZz, do not describe full development, just some project pieces.

Here is an obvious educational scenario: SageMaker creates a model from scratch from training data (low cost) => trains on many workers for hyperparameter tuning => SageMaker saves the model to S3. Then, fully external to AWS, Python code on my computer (no authentication, just Python code from the web) sends requests to an endpoint (20 per second) -> the endpoint passes the request data to SageMaker -> SageMaker makes a prediction on EC2 -> SageMaker sends the prediction back to the Python client.

To AWS senior managers: use managers who understand at least a little about development. Anyone with technical knowledge can see how destructive it is not to provide clear user manuals for AWS products.

answered 10 months ago
