Where does the client side of your code (the app that uses the Java SDK) run? What might be happening is that you're hitting an idle-connection timeout. While the SDK waits for Lambda to reply, no traffic is sent in either direction on the TCP connection, and network appliances along the way may close connections that stay idle for some period of time. For example, if your app runs in a VPC and uses a NAT Gateway or VPC endpoints for outbound connections to Lambda, idle connections are closed after 350 seconds. That would be consistent with what you're observing: the Lambda function keeps running, and the client side eventually times out.
If that description fits your case, you can try to remediate it in two steps: (1) reduce the OS-level TCP keepalive time, and (2) enable TCP keepalive in the Java SDK. Details on both steps follow.
NOTE: These steps use advanced techniques: they change the default OS-level TCP keepalive settings and the application-level HTTP client configuration. Treat the description below as a reference only, and make sure you understand what you're doing before applying it to real environments. This is a good article explaining TCP keepalive.
- By default, Linux environments are commonly configured to wait 7200 seconds (2 hours) of idle time before sending the first TCP keepalive probe, which is far too long for your scenario. You can see the current configuration by running:

```shell
cat /proc/sys/net/ipv4/tcp_keepalive_time
cat /proc/sys/net/ipv4/tcp_keepalive_intvl
```
You can update these settings by adding the net.ipv4.tcp_keepalive_time and net.ipv4.tcp_keepalive_intvl properties to /etc/sysctl.conf and rebooting your system. Choose values that suit your environment: if a NAT Gateway or VPC endpoint is on your path, make sure the keepalive time is below 350 seconds; other network appliances might have different timeouts. For example, a keepalive time of 120 and an interval of 30 mean waiting 120 seconds before starting to send keepalive probes, and then sending them at 30-second intervals.
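Using the example values described above (120 seconds before the first probe, then 30-second intervals), the /etc/sysctl.conf entries would look roughly like this. Treat the numbers as examples to adapt; if you'd rather not reboot, `sudo sysctl -p` reloads /etc/sysctl.conf on most Linux distributions.

```conf
# /etc/sysctl.conf (example values)
# Start sending keepalive probes after 120 s of idle time
net.ipv4.tcp_keepalive_time = 120
# Then send probes every 30 s
net.ipv4.tcp_keepalive_intvl = 30
```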
After the reboot, run the same cat commands to make sure the new settings were applied.
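If you'd rather verify this from the JVM that makes the Lambda calls, here is a minimal sketch (Java 11+, Linux only) that reads the same two /proc files and checks the keepalive time against the 350-second NAT Gateway/VPC endpoint idle timeout. The class and method names are hypothetical, and the check assumes each keepalive probe counts as activity for the appliance, so only the idle time before the first probe has to stay under the timeout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class KeepaliveCheck {
    // Idle timeout of NAT Gateway / VPC endpoints, in seconds.
    static final long IDLE_TIMEOUT_SECONDS = 350;

    // Parses the single integer stored in a /proc/sys file, e.g. "7200\n".
    static long parseSeconds(String procContent) {
        return Long.parseLong(procContent.trim());
    }

    // True when the first keepalive probe goes out before the idle timeout expires
    // (assumes probes reset the appliance's idle timer).
    static boolean probesBeforeIdleTimeout(long keepaliveTimeSeconds) {
        return keepaliveTimeSeconds < IDLE_TIMEOUT_SECONDS;
    }

    public static void main(String[] args) throws IOException {
        Path timePath = Path.of("/proc/sys/net/ipv4/tcp_keepalive_time");
        Path intvlPath = Path.of("/proc/sys/net/ipv4/tcp_keepalive_intvl");
        if (!Files.exists(timePath) || !Files.exists(intvlPath)) {
            System.out.println("/proc keepalive settings not available (not Linux?)");
            return;
        }
        long time = parseSeconds(Files.readString(timePath));
        long intvl = parseSeconds(Files.readString(intvlPath));
        System.out.printf("tcp_keepalive_time=%d s, tcp_keepalive_intvl=%d s%n", time, intvl);
        System.out.println(probesBeforeIdleTimeout(time)
                ? "OK: the first probe is sent before the 350 s idle timeout."
                : "Too high: the connection can be dropped before the first probe.");
    }
}
```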
- Build a custom ApacheHttpClient with TCP keepalive enabled, and use it when building your Lambda client. See below for a reference.
```java
import java.time.Duration;
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.http.SdkHttpClient;
import software.amazon.awssdk.http.apache.ApacheHttpClient;
import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.InvokeRequest;
import software.amazon.awssdk.services.lambda.model.InvokeResponse;

// HTTP client with timeouts matching Lambda's 15-minute maximum and TCP keepalive on
SdkHttpClient sdkHttpClient = ApacheHttpClient.builder()
        .connectionMaxIdleTime(Duration.ofSeconds(900))
        .connectionTimeToLive(Duration.ofSeconds(900))
        .socketTimeout(Duration.ofSeconds(900))
        .tcpKeepAlive(true)
        .build();

// No retries, so a timed-out call does not invoke the function a second time
RetryPolicy retryPolicy = RetryPolicy.builder().numRetries(0).build();

ClientOverrideConfiguration clientOverrideConfiguration = ClientOverrideConfiguration.builder()
        .apiCallAttemptTimeout(Duration.ofSeconds(900))
        .apiCallTimeout(Duration.ofSeconds(900))
        .retryPolicy(retryPolicy)
        .build();

LambdaClient lambdaClient = LambdaClient.builder()
        .overrideConfiguration(clientOverrideConfiguration)
        .httpClient(sdkHttpClient)
        .build();

// FUNCTION_NAME is your function's name or ARN
InvokeRequest invokeRequest = InvokeRequest.builder().functionName(FUNCTION_NAME).build();
InvokeResponse invokeResponse = lambdaClient.invoke(invokeRequest);
```
The ApacheHttpClient class comes from the software.amazon.awssdk apache-client artifact: https://mvnrepository.com/artifact/software.amazon.awssdk/apache-client
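Why both steps are needed: as I understand the SDK documentation, tcpKeepAlive(true) is passed through to the socket's SO_KEEPALIVE option, which only switches keepalive probes on; the idle time before the first probe still comes from the OS settings in step (1). The plain-JDK sketch below shows the same option on a java.net.Socket. It is an illustration with hypothetical class and method names, not the SDK's internals.

```java
import java.io.IOException;
import java.net.Socket;

public class SoKeepAliveDemo {
    // Creates an unconnected socket with SO_KEEPALIVE enabled. The option only
    // turns probes on; the idle time before the first probe is taken from the
    // OS (net.ipv4.tcp_keepalive_time on Linux), not from this call.
    static Socket keepAliveSocket() throws IOException {
        Socket socket = new Socket();
        socket.setKeepAlive(true);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = keepAliveSocket()) {
            System.out.println("SO_KEEPALIVE enabled: " + socket.getKeepAlive());
        }
    }
}
```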
Hi, reading your question, I'm unsure how long you wait before receiving the timeout error. Is the error raised after several minutes, or is it returned immediately? The maximum timeout for Lambda is 15 minutes, and it cannot be overridden, so knowing whether you get this error after 15 minutes would help me understand whether the error is expected.

If it is expected, meaning that your function exceeds the 15-minute limit, then I suggest a different approach to this workload. First, try increasing the memory allocated to your function; Lambda allocates CPU in proportion to memory, so more memory also means more CPU, which may be enough to complete the operation within the time limit. If that does not solve the timeout, consider breaking the operation into separate activities, each handled by its own Lambda function, and tie those functions together with Step Functions. Separating the operations and creating the Step Functions workflow will take some work, but the overall cost difference should be minor, since all of these services are serverless.

If these suggestions still do not solve the problem, your workload will need to run in a persistent environment that does not time out, such as EC2 instances.
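Splitting the work only helps if each piece reliably finishes under the limit. A minimal sketch, assuming you can estimate a duration in seconds for each work item (the estimates and the planBatches helper are hypothetical, not an AWS API), that greedily groups items into batches under a budget below 900 seconds; each batch would then be one Lambda invocation, for example one step in a Step Functions workflow.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPlanner {
    // Lambda's maximum timeout: 15 minutes.
    static final int LAMBDA_MAX_SECONDS = 900;

    // Greedily groups consecutive items so each batch's summed estimate stays
    // within budgetSeconds. Throws if the budget or a single item is too large.
    static List<List<Integer>> planBatches(List<Integer> itemSeconds, int budgetSeconds) {
        if (budgetSeconds > LAMBDA_MAX_SECONDS) {
            throw new IllegalArgumentException("budget exceeds the 900 s Lambda limit");
        }
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentTotal = 0;
        for (int seconds : itemSeconds) {
            if (seconds > budgetSeconds) {
                throw new IllegalArgumentException(
                        "item of " + seconds + " s exceeds the budget; split it further");
            }
            if (currentTotal + seconds > budgetSeconds) {
                // Close the current batch and start a new one
                batches.add(current);
                current = new ArrayList<>();
                currentTotal = 0;
            }
            current.add(seconds);
            currentTotal += seconds;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // 720 s budget leaves headroom below 900 s for startup and estimation error
        System.out.println(planBatches(List.of(300, 400, 200, 500, 100), 720));
    }
}
```

With the sample estimates this yields three batches, [[300, 400], [200, 500], [100]]; a greedy pass keeps the items in order, which matters if later steps depend on earlier ones.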