SSM agent won't get new tokens after network failure resolved

1

I have multiple machines running hybrid SSM Agent. Those machines in one network suffered a multi-day network outage. When the network issue was restored SSM Agent wouldn't 'reconnect'. I cannot start sessions to access these machines. Here are the relevant log lines from /var/log/amazon/ssm/amazon-ssm-agent.log:

2021-12-23 13:42:22 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:24 ERROR [ssm-agent-worker] [MessagingDeliveryService] error when calling AWS APIs. error details - GetMessages Error: shared credentials are already expired, they were queried at 2021-12-21T11:30:10-06:00 and expired at 2021-12-21T18:30:10Z
2021-12-23 13:42:24 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:26 ERROR [ssm-agent-worker] [MessagingDeliveryService] error when calling AWS APIs. error details - GetMessages Error: shared credentials are already expired, they were queried at 2021-12-21T11:30:10-06:00 and expired at 2021-12-21T18:30:10Z
2021-12-23 13:42:26 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:29 ERROR [ssm-agent-worker] [MessagingDeliveryService] error when calling AWS APIs. error details - GetMessages Error: shared credentials are already expired, they were queried at 2021-12-21T11:30:10-06:00 and expired at 2021-12-21T18:30:10Z
2021-12-23 13:42:29 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:31 ERROR [ssm-agent-worker] [MessagingDeliveryService] error when calling AWS APIs. error details - GetMessages Error: shared credentials are already expired, they were queried at 2021-12-21T11:30:10-06:00 and expired at 2021-12-21T18:30:10Z
2021-12-23 13:42:31 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:33 ERROR [ssm-agent-worker] [MessagingDeliveryService] MessagingDeliveryService stopped temporarily due to internal failure. We will retry automatically after 15 minutes

That seems to repeat round and around. The credentials are now a couple of days old as can be seen by the timestamps. I am assuming the "internal failure" is trying to refresh the tokens.

I restarted the agent on one machine (through systemctl restart) and it came back fine. So it's definitely some state in the running agents that is the problem. I have left the others in their failed state in case someone responds with something for me to test this further.

asked 2 years ago156 views
No Answers

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions