Lambda functions network errors at EACH HOUR first minutes

0

Hello everyone. We are facing a huge problem with using Lambda .net functions. They are working perfectly for 58 minutes each hour: from approzimately 03 min to 59 But we are having a constant problems at the first 2-3 minutes of each hour, when something happens and the function loses network and can't connect to the database (MySQL RDS). It is getting "connection timeout", and the problem may appear at any moment during function execution. For example, function has to proceed 10 SQL queries, it is like

  • Connect to DB,
  • Proceed Query 1
  • Proceed Query 2
  • ...
  • Proceed Query 10
  • Close connection

And in the first hour minute the connection may receive timeout at any moment, it looks like all network is terminated by some external evil power. In CoudWatch we receive message like that:

2023-04-04T05:01:12.662Z fail MySql.Data.MySqlClient.MySqlException (0x80004005): Fatal error encountered during command execution. ---> MySql.Data.MySqlClient.MySqlException (0x80004005): Fatal error encountered attempting to read the resultset. ---> MySql.Data.MySqlClient.MySqlException (0x80004005): Reading from the stream has failed. ---> System.IO.IOException: Unable to read data from the transport connection: Connection timed out. ---> System.Net.Sockets.SocketException (110): Connection timed out at System.Net.Sockets.NetworkStream.Read(Span1 buffer) --- End of inner exception stack trace --- at System.Net.Sockets.NetworkStream.Read(Span1 buffer) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count) at MySql.Data.MySqlClient.TimedStream.Read(Byte[] buffer, Int32 offset, Int32 count) at MySql.Data.MySqlClient.MySqlStream.ReadFully(Stream stream, Byte[] buffer, Int32 offset, Int32 count) at MySql.Data.MySqlClient.MySqlStream.LoadPacket() at MySql.Data.MySqlClient.MySqlStream.LoadPacket() at MySql.Data.MySqlClient.MySqlStream.ReadPacket() at MySql.Data.MySqlClient.NativeDriver.GetResult(Int32& affectedRow, Int64& insertedId) at MySql.Data.MySqlClient.Driver.NextResult(Int32 statementId, Boolean force) at MySql.Data.MySqlClient.MySqlDataReader.NextResult() at MySql.Data.MySqlClient.MySqlDataReader.NextResult() at MySql.Data.MySqlClient.MySqlCommand.ExecuteReader(CommandBehavior behavior) at MySql.Data.MySqlClient.MySqlCommand.ExecuteReader(CommandBehavior behavior) at <...........> at <...........>.Function.ProcessMessageAsync(SQSMessage message, ILambdaContext context) at <...........>.Function.FunctionHandler(SQSEvent evnt, ILambdaContext context) at lambda_method1(Closure , Stream , ILambdaContext , Stream ) at Amazon.Lambda.RuntimeSupport.Bootstrap.UserCodeLoader.Invoke(Stream lambdaData, ILambdaContext lambdaContext, Stream outStream) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/UserCodeLoader.cs:line 145 at Amazon.Lambda.RuntimeSupport.HandlerWrapper.<>c__DisplayClass8_0.<GetHandlerWrapper>b__0(InvocationRequest invocation) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/HandlerWrapper.cs:line 55 at Amazon.Lambda.RuntimeSupport.LambdaBootstrap.InvokeOnceAsync(CancellationToken cancellationToken) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/LambdaBootstrap.cs:line 208

And we NEVER get this error during 03-59 minutes each hour

We tried to make connection retry until it starts working, and sometimes it takes up to 2-3 minutes to become alive - it is VERY long, and Lambda may receive it's own timeout while retrying We make 10 retry attempts, using try/catch - if the query is not successful, we are logging the exception, trying to reconnect to db and make the query again and in logs it looks like:

2023-04-19T10:00:37.849+03:00	2023-04-19T07:00:37.849Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (0):Fatal error encountered during command execution.
2023-04-19T10:01:08.879+03:00	2023-04-19T07:01:08.879Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (1):Fatal error encountered during command execution.
2023-04-19T10:01:39.990+03:00	2023-04-19T07:01:39.990Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (2):Fatal error encountered during command execution.
2023-04-19T10:02:11.032+03:00	2023-04-19T07:02:11.032Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (3):Fatal error encountered during command execution.
2023-04-19T10:02:42.121+03:00	2023-04-19T07:02:42.121Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (4):Fatal error encountered during command execution. 
  • and after the last line the connection has been successfully restarted and the query has been done successful.

The DB is ok till this situation, other connections (from other services) are ok, only Lambda faces this problem.

The fatality of this problem is that it is:

  1. unpredictable: Lambda can establish connection and it will be terminated unexpectable during execution - so part of Lambda work may be done and then it will get the error. And it happens not in all Lambda functions at one moment, and not guaranteed for one Lambda function every hour, but unexpectable happens in different Lambda functions but only in the first minutes of the hour.
  2. critical: we can't trust the data in our system, because it may be lost each hour
  3. annoying: we make a lot of handworking with data recovering after these errors
  4. bad for users: Lambda functions receive Timeout during retrying to connect - especially when they are used through API Gateway and maximum available duration is 30 s

Could anyone tell, what is that evil power that terminates the Lambda's connection, and is there a way to manage it?

  • Could you give more specs on how lambda is being invoked? Also how are you making lambda execute for an hour if its time limit is 15 minutes?

  • I would check cloud trail for what activity in AWS is occurring on the hour. I suspect something is running outside of your lambda that is impacting your network or database.

Vita
asked a year ago370 views
1 Answer
0

I'd like to echo the comment above: I think there's something external to the Lambda function that is causing the issue. It's difficult from this distance to say exactly what, but if it were me I'd do the following:

  1. Contact our support team and have them help with the troubleshooting. This is critical because they have visibility into what is happening in your account - here on re:Post we don't so we're guessing at a lot of things.
  2. See if DNS is working during the "on the hour" time period.
  3. As per the comment above - look at CloudTrail to see if there are other events happening at the same time.
  4. Check the RDS logs - is there some failover that is happening right about that time? It's odd that it happens exactly on the hour - could it be scheduled in another system that you have?
  5. Check the IP (TCP) connectivity from a separate Lambda function or EC2 host. You might need to use EC2 and test from all of the subnets that are enabled for Lambda. Maybe there is a routing issue and on the hour the Lambda scheduler decides to launch functions in a different subnet.
profile pictureAWS
EXPERT
answered a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions