How to avoid network timeout issues when invoking long-running Lambda functions from .NET 6+ applications on Linux platforms

4 minute read
Content level: Advanced

AWS Lambda functions can run for up to 15 minutes if the configured function timeout allows it. When you invoke a function synchronously through its function URL, your client might never receive the function's result, even though the function runs successfully and returns it.

AWS Lambda function invocation methods

You can invoke a function in two ways:

  • calling the Invoke API, either synchronously (waiting for the response) or asynchronously
  • using the function URL, which supports only synchronous invocation

Calling the Invoke API through the AWS SDK for .NET

When you call the Invoke API, you use the AmazonLambdaClient class of the AWS SDK for .NET and its Invoke method. The Lambda client class extends the AmazonServiceClient class and enables TCP keep-alive in its client config, which is off by default for all other services. With the default configuration, the Lambda client sends TCP keep-alive packets every 15 seconds once the connection has been idle for 5 minutes. This works on the .NET Framework target of the SDK, but not on the .NET Standard/.NET Core targets.

Invoking the function through the function URL endpoint

To work around this limitation, you can create a function URL and invoke the function through it, using the System.Net.Http.HttpClient class to make the request. By default, the SocketsHttpHandler used by HttpClient has TCP keep-alive turned off. This causes problems when a device in the network path reaches its idle timeout for the TCP connection and drops the connection between your client and the function URL endpoint. Your code might then throw an exception similar to the following:

Unhandled exception. System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 900 seconds elapsing.
 ---> System.TimeoutException: The operation was canceled.
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
 ---> System.Net.Sockets.SocketException (125): Operation canceled

In the example above, I set the HttpClient.Timeout property to 15 minutes to match the function's execution timeout. The function took 351 seconds to run and returned correctly, but the HTTP client in my .NET code timed out after 15 minutes without receiving the response. In my case, the cause was the idle timeout of the VPC NAT gateway, which drops connections that have been idle for 350 seconds and returns an RST packet to any resource behind the NAT gateway that attempts to continue the connection.
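To make the timing concrete, the following sketch replays the numbers from this article: a 351-second function run, a 350-second NAT gateway idle timeout, and keep-alive settings of 300 seconds idle time, a 15-second probe interval, and 5 retries. The helper names are hypothetical and exist only for illustration. The dead-peer estimate assumes the common behavior of idle time plus interval multiplied by retry count, which can differ between operating systems.

```csharp
using System;

// Values taken from the article.
var natIdleTimeout = TimeSpan.FromSeconds(350);    // NAT gateway idle timeout
var functionDuration = TimeSpan.FromSeconds(351);  // observed function run time
var keepAliveTime = TimeSpan.FromSeconds(300);     // idle time before the first probe
var keepAliveInterval = TimeSpan.FromSeconds(15);  // gap between unanswered probes
const int keepAliveRetries = 5;                    // unanswered probes before giving up

// Hypothetical helper: a connection is at risk when it stays silent for at
// least as long as the idle timeout of a device in the network path.
static bool IsDroppedAsIdle(TimeSpan longestSilence, TimeSpan idleTimeout)
    => longestSilence >= idleTimeout;

// Hypothetical helper: rough time until the TCP stack gives up on a peer
// that no longer answers (assumed model: idle time + interval * retries).
static TimeSpan TimeToDeclareDead(TimeSpan idleTime, TimeSpan interval, int retries)
    => idleTime + TimeSpan.FromTicks(interval.Ticks * retries);

// Without keep-alive the connection is silent for the whole function run.
Console.WriteLine(IsDroppedAsIdle(functionDuration, natIdleTimeout)); // True

// With keep-alive the longest silence is the idle time before a probe.
Console.WriteLine(IsDroppedAsIdle(keepAliveTime, natIdleTimeout));    // False

Console.WriteLine(
    TimeToDeclareDead(keepAliveTime, keepAliveInterval, keepAliveRetries).TotalSeconds); // 375
```

The point of the comparison is that the keep-alive idle time has to be shorter than the shortest idle timeout of any device on the path. A value of 300 seconds leaves a 50-second margin against the 350-second NAT gateway timeout.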

Resolution

To keep the connection from being forcibly dropped for inactivity, you need to enable TCP keep-alive on the SocketsHttpHandler used by the HttpClient. Unfortunately, the example provided in the .NET 6 documentation throws a PlatformNotSupportedException when compiled and run on Linux systems:

Unhandled exception. System.Net.Http.HttpRequestException: Sockets on this platform are invalid for use after a failed connection attempt.
---> System.PlatformNotSupportedException: Sockets on this platform are invalid for use after a failed connection attempt.
   at System.Net.Sockets.Socket.ThrowMultiConnectNotSupported()

The example therefore needs a small modification to work on Linux.

The following code compiles and runs correctly on Linux systems. It creates a SocketsHttpHandler object with TCP keep-alive enabled that sends probes every 15 seconds once the connection has been idle for 5 minutes, matching the behavior of the AWS SDK Lambda client:

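using System;
using System.Net;
using System.Net.Http;
using System.Net.Sockets;

var socketHttpHandler = new SocketsHttpHandler();

socketHttpHandler.ConnectCallback = async (ctx, ct) =>
    {
        // Resolve the host name up front and connect to the resulting addresses.
        var IPs = await Dns.GetHostAddressesAsync(ctx.DnsEndPoint.Host, ct);
        var s = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
        try
        {
            // Enable TCP keep-alive: first probe after 300 seconds of idle time,
            // then one probe every 15 seconds, giving up after 5 unanswered probes.
            s.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
            s.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveTime, 300);
            s.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveInterval, 15);
            s.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveRetryCount, 5);
            await s.ConnectAsync(IPs, ctx.DnsEndPoint.Port, ct);
            // The stream owns the socket and disposes it together with the connection.
            return new NetworkStream(s, ownsSocket: true);
        }
        catch
        {
            // Release the socket if configuring or connecting fails.
            s.Dispose();
            throw;
        }
    };

// Match the client timeout to the maximum Lambda execution time.
using var client = new HttpClient(socketHttpHandler)
{
    Timeout = TimeSpan.FromMinutes(15)
};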
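
As a quick sanity check, the keep-alive options can be read back from a socket after they are set. The sketch below applies the same four options as the ConnectCallback to a socket that is never connected, so it needs no network access. The ReadOption helper and the variable names are hypothetical, and the values that come back depend on the operating system honoring these options.

```csharp
using System;
using System.Net.Sockets;

// Apply the same keep-alive options as the ConnectCallback to a socket
// that is never connected.
using var probeSocket = new Socket(SocketType.Stream, ProtocolType.Tcp);
probeSocket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
probeSocket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveTime, 300);
probeSocket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveInterval, 15);
probeSocket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveRetryCount, 5);

// Hypothetical helper: read an integer-valued option back from the socket.
static int ReadOption(Socket socket, SocketOptionLevel level, SocketOptionName name)
    => Convert.ToInt32(socket.GetSocketOption(level, name));

int enabled = ReadOption(probeSocket, SocketOptionLevel.Socket, SocketOptionName.KeepAlive);
int idleSeconds = ReadOption(probeSocket, SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveTime);
int intervalSeconds = ReadOption(probeSocket, SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveInterval);
int retries = ReadOption(probeSocket, SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveRetryCount);

Console.WriteLine($"keep-alive enabled: {enabled != 0}");
Console.WriteLine($"idle: {idleSeconds}s, interval: {intervalSeconds}s, retries: {retries}");
```

If the printed values differ from the ones that were set, the platform has adjusted or ignored an option. In that case, compare the effective idle time against the shortest idle timeout on your network path.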
published a year ago, 2770 views
2 Comments

This will be very useful to some AWS users building with .NET, great explanation and clean code sample!

Kirk_D
replied a year ago

TCP keep-alive is not available in AmazonLambdaConfig after updating the AWSSDK.Lambda and AWSSDK.Core packages to the latest versions.

ssg
replied a year ago