AWS Lambda function times out when invoked from AWS Java SDK V2

I have created an AWS Lambda function (written in Python) that reads a tar.gz file from one S3 bucket, decompresses and untars it, and writes the extracted files to another S3 bucket. The tar inside the gzip is larger than 1 GB, so the Lambda takes a long time to complete the task. I invoke this Lambda function from a Java client. I am using AWS SDK V2 for Java (software.amazon.awssdk.*) with the synchronous Lambda client, software.amazon.awssdk.services.lambda.LambdaClient.

The Lambda invocation itself goes through (lambdaClient.invoke(invokeRequest)), but the call then fails with a "Read timed out" error. In the background (in AWS), the Lambda completes its execution after some time.

The following is the Lambda client bean creation code.

LambdaClient lambdaClient = LambdaClient.builder()
        .credentialsProvider(awsCredentialsProvider)
        .region(Region.US_EAST_1)
        .overrideConfiguration(ClientOverrideConfiguration.builder()
                .apiCallTimeout(Duration.ofMinutes(30))
                .apiCallAttemptTimeout(Duration.ofMinutes(30))
                .build())
        .build();

The following is the Lambda invocation code.

// This is a user-defined POJO that maps to the Lambda input JSON payload
UntarLambdaPayload untarLambdaPayload = UntarLambdaPayload.builder()
        .sourceBucket(lambdaProps.getSourceBucket())
        .destinationBucket(lambdaProps.getDestinationBucket())
        .sourceKey("myTarFile.tar.gz")
        .build();

ObjectMapper mapper = new ObjectMapper();
String jsonRequest = mapper.writeValueAsString(untarLambdaPayload);
SdkBytes payload = SdkBytes.fromUtf8String(jsonRequest);

InvokeRequest invokeRequest = InvokeRequest.builder()
        .functionName(lambdaProps.getFunctionName())
        .overrideConfiguration(AwsRequestOverrideConfiguration.builder()
                .apiCallTimeout(Duration.ofMinutes(30))
                .apiCallAttemptTimeout(Duration.ofMinutes(30))
                .build())
        .payload(payload)
        .build();

InvokeResponse res = lambdaClient.invoke(invokeRequest);

And I am getting the below exception.

software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Read timed out
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:102)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:204)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:83)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:167)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:175)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
	at software.amazon.awssdk.services.lambda.DefaultLambdaClient.invoke(DefaultLambdaClient.java:2355)

The rest of my code depends on the successful completion of the Lambda function. If the invocation times out, the code cannot proceed with processing the untarred files in S3 bucket #2.

I tried overrideConfiguration with apiCallTimeout and apiCallAttemptTimeout in the InvokeRequest (as well as in the Lambda client), but it did not work. I am going to look into the LambdaClient waiter functionality, but so far I have not found any help on how to use it with Lambda.

How can I make lambdaClient.invoke(invokeRequest) wait until the Lambda running in AWS completes its execution?

asked 2 years ago, 3808 views
2 Answers
Accepted Answer

Where does the client side of your code (the app that uses the Java SDK) run? What might be happening is that you're hitting the TCP keepalive timeout. No traffic is sent in either direction on your TCP connection while the SDK waits for Lambda to reply. Network appliances that you might have along the way might be closing idle connections after some period of time. For example, if your app runs in a VPC environment and uses a NAT Gateway or VPC Endpoints for outbound connections to Lambda, idle connections will be closed after 350 seconds. In this case the behavior will be consistent with what you're observing: the Lambda side will continue to run, and the client side will eventually time out.
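To confirm on the client that a failure like the one in the question is a socket read timeout rather than some other error, one option is to walk the exception's cause chain. The sketch below assumes the SDK exception carries the underlying java.net.SocketTimeoutException as a cause; the class and method names are hypothetical, and a plain RuntimeException stands in for the SDK exception so the example runs without the SDK.

```java
import java.net.SocketTimeoutException;

public class ReadTimeoutCheck {

    /** Returns true if any throwable in the cause chain is a SocketTimeoutException. */
    public static boolean isSocketReadTimeout(Throwable error) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof SocketTimeoutException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for the SDK exception (assumption: the real one wraps the socket timeout as its cause).
        Throwable wrapped = new RuntimeException(
                "Unable to execute HTTP request: Read timed out",
                new SocketTimeoutException("Read timed out"));

        System.out.println(isSocketReadTimeout(wrapped));
        System.out.println(isSocketReadTimeout(new IllegalStateException("other failure")));
    }
}
```

In real code the argument would be the exception caught around lambdaClient.invoke(invokeRequest).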

If the above description fits your case, you can try remediating it in two steps: (1) reducing the OS-level TCP keepalive time, and (2) enabling TCP keepalive in the Java SDK. I'm going to add some details on both steps below.

NOTE: These steps use advanced techniques that change the default OS-level TCP keepalive settings and the app-level HTTP client configuration. Use the description below as a reference only, and make sure you understand what you're doing if you decide to apply it to real environments. This is a good article explaining TCP keepalive.

  1. By default, it is common for a Linux environment to be configured with a TCP keepalive time of 7200 seconds (2 hours), which is too high for your scenario. You can see the configuration by running:
cat /proc/sys/net/ipv4/tcp_keepalive_time
cat /proc/sys/net/ipv4/tcp_keepalive_intvl

You can update these settings by adding the properties below to /etc/sysctl.conf and rebooting your system. You might want to use different values. If you do have a NAT Gateway or VPC Endpoint on your path, make sure your keepalive time is below 350 seconds. Other network appliances might have different timeouts. The config below means: wait for 120 seconds before starting to send keepalive probes, and then send them at 30-second intervals.

net.ipv4.tcp_keepalive_time=120
net.ipv4.tcp_keepalive_intvl=30

After the reboot, run the same cat commands to make sure the new settings were applied.

  2. Build a custom ApacheHttpClient, enable TCP keepalive, and use it when building your Lambda client. See below for a reference.
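        import java.time.Duration;
        import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
        import software.amazon.awssdk.core.retry.RetryPolicy;
        import software.amazon.awssdk.http.SdkHttpClient;
        import software.amazon.awssdk.http.apache.ApacheHttpClient;
        import software.amazon.awssdk.services.lambda.LambdaClient;
        import software.amazon.awssdk.services.lambda.LambdaClientBuilder;
        import software.amazon.awssdk.services.lambda.model.InvokeRequest;
        import software.amazon.awssdk.services.lambda.model.InvokeResponse;

        // HTTP client: allow a response wait of up to 900 seconds and enable TCP keepalive on the socket
        ApacheHttpClient.Builder apacheHttpClientBuilder = ApacheHttpClient.builder();
        apacheHttpClientBuilder.connectionMaxIdleTime(Duration.ofSeconds(900));
        apacheHttpClientBuilder.connectionTimeToLive(Duration.ofSeconds(900));
        apacheHttpClientBuilder.socketTimeout(Duration.ofSeconds(900));
        apacheHttpClientBuilder.tcpKeepAlive(true);
        SdkHttpClient sdkHttpClient = apacheHttpClientBuilder.build();

        // SDK-level timeouts matching the socket timeout, with retries disabled
        RetryPolicy retryPolicy = RetryPolicy.builder().numRetries(0).build();
        ClientOverrideConfiguration clientOverrideConfiguration = ClientOverrideConfiguration.builder()
                .apiCallAttemptTimeout(Duration.ofSeconds(900))
                .apiCallTimeout(Duration.ofSeconds(900))
                .retryPolicy(retryPolicy)
                .build();

        // Build the Lambda client with the custom HTTP client and override configuration
        LambdaClientBuilder lambdaClientBuilder = LambdaClient.builder();

        lambdaClientBuilder.overrideConfiguration(clientOverrideConfiguration);
        lambdaClientBuilder.httpClient(sdkHttpClient);

        // Synchronous invoke; FUNCTION_NAME is the name of your Lambda function
        LambdaClient lambdaClient = lambdaClientBuilder.build();
        InvokeRequest invokeRequest = InvokeRequest.builder().functionName(FUNCTION_NAME).build();
        InvokeResponse invokeResponse = lambdaClient.invoke(invokeRequest);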

The ApacheHttpClient class comes from the apache-client artifact: https://mvnrepository.com/artifact/software.amazon.awssdk/apache-client
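The two steps above can be sanity-checked together with plain JDK code. This is a minimal sketch, not part of the SDK: it requests SO_KEEPALIVE on a throwaway socket (the socket-level option that a keepalive flag like the one above is assumed to map to) and, on Linux, compares tcp_keepalive_time against the 350-second idle timeout mentioned earlier. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class KeepaliveSanityCheck {

    // Idle timeout quoted above for NAT Gateway / VPC Endpoint paths, in seconds.
    public static final int IDLE_TIMEOUT_SECONDS = 350;

    /** True if the first keepalive probe would be sent before the idle timeout expires. */
    public static boolean firstProbeBeforeIdleTimeout(int keepaliveTimeSeconds, int idleTimeoutSeconds) {
        return keepaliveTimeSeconds < idleTimeoutSeconds;
    }

    /** Opens an unconnected socket, requests SO_KEEPALIVE, and reports whether the option is set. */
    public static boolean keepAliveEnabledOnFreshSocket() {
        try (Socket socket = new Socket()) {
            socket.setKeepAlive(true);
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_KEEPALIVE set: " + keepAliveEnabledOnFreshSocket());

        // The probe timing itself comes from the OS, not from the socket flag.
        Path timePath = Paths.get("/proc/sys/net/ipv4/tcp_keepalive_time");
        if (!Files.isReadable(timePath)) {
            System.out.println("Cannot read " + timePath + " (not a Linux host?)");
            return;
        }
        int keepaliveTime = Integer.parseInt(new String(Files.readAllBytes(timePath)).trim());
        System.out.println("tcp_keepalive_time=" + keepaliveTime + "s, below idle timeout: "
                + firstProbeBeforeIdleTimeout(keepaliveTime, IDLE_TIMEOUT_SECONDS));
    }
}
```

With the default of 7200 seconds the check reports false, and with the 120-second value from step 1 it reports true. Enabling the socket flag alone does not change when the OS sends the first probe, which is why the answer pairs the two steps.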

AWS
answered 2 years ago
EXPERT
reviewed 10 months ago
  • Thanks a lot!! Spot on!! Using the above code with ApacheHttpClient fixed the timeout issue.

    It is a POC, and currently I am running it locally. Yes, the client side of my code runs in the app that uses AWS Java SDK V2. When I deploy it to my company's Java 8/OpenShift environment, I will test it there. If it does not work as is, I will apply the keepalive properties to /etc/sysctl.conf and see if that works.

    Thanks again!!

  • @AWS-User-2361124 Glad it worked for you! If you don't mind, can you please share the TCP keepalive values on your system? In my previous tests I couldn't get this approach to work without updating /etc/sysctl.conf on a Linux-based OS, so I'm interested to see what settings you have and what OS you're using. Also, can you please keep me posted on the outcome of running the app in the OpenShift environment? Kubernetes-based environments require some additional tweaking in order to be able to update TCP keepalive. I have documented the steps for EKS, but would like to learn your OpenShift configuration.

  • I've run this on Windows so far. I don't know where to see the TCP keepalive settings in Windows. I will keep you posted when I deploy it on Linux/OpenShift.


Hi. From your question, I am unsure how long you wait before receiving the timeout error. Is this error raised after several minutes, or is it returned immediately? The maximum timeout for Lambda is 15 minutes. This maximum cannot be overridden, so knowing whether you get this error after 15 minutes helps me understand whether the error is valid.

If it is a valid error, meaning that you have exceeded the 15-minute timeout, then I suggest a different approach to this workload. First, you can try increasing the size of your Lambda runtime. More CPUs and more memory may be the key to completing the operation within the time limit. If that does not solve the timeout, consider breaking the operation into separate activities, with a separate Lambda running each one. You can tie these Lambdas together using Step Functions. While it will take some work to separate the operations and create the Step Functions job, the overall cost difference will be minor, since all of these operations are serverless.

If these suggestions still do not solve the problem, then you will need to run your workload in a persistent environment that does not time out. EC2 instances will meet this need.
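One way to answer the "how long before the error" question is to time the call on the client. The following is a rough sketch with hypothetical names; a short sleep that throws stands in for the real invoke call so the example runs on its own.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

public class ElapsedBeforeFailure {

    // Maximum Lambda timeout stated above.
    public static final Duration LAMBDA_MAX_TIMEOUT = Duration.ofMinutes(15);

    /** True if the call ended before the Lambda limit, which points at a client-side or network timeout. */
    public static boolean endedBeforeLambdaLimit(Duration elapsed) {
        return elapsed.compareTo(LAMBDA_MAX_TIMEOUT) < 0;
    }

    /** Runs the call and returns how long it took, whether it succeeded or threw. */
    public static Duration timeUntilDone(Callable<?> call) {
        long start = System.nanoTime();
        try {
            call.call();
        } catch (Exception e) {
            System.out.println("Call failed: " + e.getMessage());
        }
        return Duration.ofNanos(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in for lambdaClient.invoke(invokeRequest); fails after a short delay.
        Duration elapsed = timeUntilDone(() -> {
            Thread.sleep(50);
            throw new IllegalStateException("Read timed out (simulated)");
        });
        System.out.println("Elapsed ms: " + elapsed.toMillis());
        System.out.println("Ended before Lambda limit: " + endedBeforeLambdaLimit(elapsed));
    }
}
```

A failure well under 15 minutes suggests the client or the network path gave up first, not the function itself.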

AWS
Byron_G
answered 2 years ago
  • Thank you for the answer @Byron_G. It waits for ~120 seconds before timing out. The AWS Lambda timeout is set to 15 minutes. The above suggestion given by Anton worked.
