I have a Python Lambda streaming data to Timestream that has been running for the last few months.
The Lambda has always run well under a second, with an average of 15-30 ms, regardless of whether the data itself is current (memory store) or historical (magnetic store).
This week the Lambda is timing out, even when the timeout is set to the maximum of 15 minutes.
The place it's getting stuck is the call to Timestream that writes the actual records.
It just sits there for anywhere between 30 ms and 15 minutes.
The data that gets stuck is always processed on a retry and appears in the DB, so it's not a data issue.
This is causing our SQS queues to build up, as the Lambda failures and timeouts cause throttling and reduced concurrency.
The line in question:
result = write_client.write_records(
    DatabaseName=DATABASE_NAME,
    TableName=table_name,
    CommonAttributes=common_attributes,
    Records=records,
)
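
To narrow down where the time goes, here is a minimal sketch of what could be tried: building the client with explicit botocore timeouts so a stuck call fails fast instead of running out the Lambda timeout, and timing the call itself. The timeout and retry values are guesses rather than what we currently run, and build_write_client and timed_write are hypothetical helper names, not part of boto3.

```python
import logging
import time

logger = logging.getLogger(__name__)


def build_write_client(connect_timeout_s=5, read_timeout_s=20, max_attempts=3):
    """Create a Timestream write client with explicit timeouts (assumed values)."""
    # Imported here so timed_write below can be exercised without AWS access.
    import boto3
    from botocore.config import Config

    config = Config(
        connect_timeout=connect_timeout_s,  # seconds to wait for the connection
        read_timeout=read_timeout_s,        # seconds to wait for a response
        retries={"max_attempts": max_attempts, "mode": "standard"},
    )
    return boto3.client("timestream-write", config=config)


def timed_write(write_client, **write_kwargs):
    """Call write_records and log the elapsed time, even if the call raises."""
    start = time.monotonic()
    try:
        return write_client.write_records(**write_kwargs)
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        logger.info("write_records took %.1f ms", elapsed_ms)
```

With this in place, the original call would become result = timed_write(write_client, DatabaseName=DATABASE_NAME, TableName=table_name, CommonAttributes=common_attributes, Records=records), and a hung request should surface as a botocore timeout exception in the logs rather than as a Lambda timeout.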