Hello everyone.
We are facing a huge problem with using Lambda .net functions.
They are working perfectly for 58 minutes each hour: from approzimately 03 min to 59
But we are having a constant problems at the first 2-3 minutes of each hour, when something happens and the function loses network and can't connect to the database (MySQL RDS).
It is getting "connection timeout", and the problem may appear at any moment during function execution.
For example, function has to proceed 10 SQL queries, it is like
- Connect to DB,
- Proceed Query 1
- Proceed Query 2
- ...
- Proceed Query 10
- Close connection
And in the first hour minute the connection may receive timeout at any moment, it looks like all network is terminated by some external evil power.
In CoudWatch we receive message like that:
2023-04-04T05:01:12.662Z fail MySql.Data.MySqlClient.MySqlException (0x80004005): Fatal error encountered during command execution. ---> MySql.Data.MySqlClient.MySqlException (0x80004005): Fatal error encountered attempting to read the resultset. ---> MySql.Data.MySqlClient.MySqlException (0x80004005): Reading from the stream has failed. ---> System.IO.IOException: Unable to read data from the transport connection: Connection timed out. ---> System.Net.Sockets.SocketException (110): Connection timed out at System.Net.Sockets.NetworkStream.Read(Span1 buffer) --- End of inner exception stack trace --- at System.Net.Sockets.NetworkStream.Read(Span
1 buffer) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count) at MySql.Data.MySqlClient.TimedStream.Read(Byte[] buffer, Int32 offset, Int32 count) at MySql.Data.MySqlClient.MySqlStream.ReadFully(Stream stream, Byte[] buffer, Int32 offset, Int32 count) at MySql.Data.MySqlClient.MySqlStream.LoadPacket() at MySql.Data.MySqlClient.MySqlStream.LoadPacket() at MySql.Data.MySqlClient.MySqlStream.ReadPacket() at MySql.Data.MySqlClient.NativeDriver.GetResult(Int32& affectedRow, Int64& insertedId) at MySql.Data.MySqlClient.Driver.NextResult(Int32 statementId, Boolean force) at MySql.Data.MySqlClient.MySqlDataReader.NextResult() at MySql.Data.MySqlClient.MySqlDataReader.NextResult() at MySql.Data.MySqlClient.MySqlCommand.ExecuteReader(CommandBehavior behavior) at MySql.Data.MySqlClient.MySqlCommand.ExecuteReader(CommandBehavior behavior) at <...........> at <...........>.Function.ProcessMessageAsync(SQSMessage message, ILambdaContext context) at <...........>.Function.FunctionHandler(SQSEvent evnt, ILambdaContext context) at lambda_method1(Closure , Stream , ILambdaContext , Stream ) at Amazon.Lambda.RuntimeSupport.Bootstrap.UserCodeLoader.Invoke(Stream lambdaData, ILambdaContext lambdaContext, Stream outStream) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/UserCodeLoader.cs:line 145 at Amazon.Lambda.RuntimeSupport.HandlerWrapper.<>c__DisplayClass8_0.<GetHandlerWrapper>b__0(InvocationRequest invocation) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/HandlerWrapper.cs:line 55 at Amazon.Lambda.RuntimeSupport.LambdaBootstrap.InvokeOnceAsync(CancellationToken cancellationToken) in /src/Repo/Libraries/src/Amazon.Lambda.RuntimeSupport/Bootstrap/LambdaBootstrap.cs:line 208
And we NEVER get this error during 03-59 minutes each hour
We tried to make connection retry until it starts working, and sometimes it takes up to 2-3 minutes to become alive - it is VERY long, and Lambda may receive it's own timeout while retrying
We make 10 retry attempts, using try/catch - if the query is not successful, we are logging the exception, trying to reconnect to db and make the query again
and in logs it looks like:
2023-04-19T10:00:37.849+03:00 2023-04-19T07:00:37.849Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (0):Fatal error encountered during command execution.
2023-04-19T10:01:08.879+03:00 2023-04-19T07:01:08.879Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (1):Fatal error encountered during command execution.
2023-04-19T10:01:39.990+03:00 2023-04-19T07:01:39.990Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (2):Fatal error encountered during command execution.
2023-04-19T10:02:11.032+03:00 2023-04-19T07:02:11.032Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (3):Fatal error encountered during command execution.
2023-04-19T10:02:42.121+03:00 2023-04-19T07:02:42.121Z 19cb7bc5-222d-595b-9df4-277cb18eac8e info MysqlException Query try (4):Fatal error encountered during command execution.
- and after the last line the connection has been successfully restarted and the query has been done successful.
The DB is ok till this situation, other connections (from other services) are ok, only Lambda faces this problem.
The fatality of this problem is that it is:
- unpredictable: Lambda can establish connection and it will be terminated unexpectable during execution - so part of Lambda work may be done and then it will get the error. And it happens not in all Lambda functions at one moment, and not guaranteed for one Lambda function every hour, but unexpectable happens in different Lambda functions but only in the first minutes of the hour.
- critical: we can't trust the data in our system, because it may be lost each hour
- annoying: we make a lot of handworking with data recovering after these errors
- bad for users: Lambda functions receive Timeout during retrying to connect - especially when they are used through API Gateway and maximum available duration is 30 s
Could anyone tell, what is that evil power that terminates the Lambda's connection, and is there a way to manage it?
Could you give more specs on how lambda is being invoked? Also how are you making lambda execute for an hour if its time limit is 15 minutes?
I would check cloud trail for what activity in AWS is occurring on the hour. I suspect something is running outside of your lambda that is impacting your network or database.