SSM Agent won't get new tokens after a network failure is resolved


I have multiple machines running SSM Agent in hybrid mode. The machines on one network suffered a multi-day network outage. When connectivity was restored, SSM Agent did not reconnect, and I cannot start sessions to access these machines. Here are the relevant log lines from /var/log/amazon/ssm/amazon-ssm-agent.log:

2021-12-23 13:42:22 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:24 ERROR [ssm-agent-worker] [MessagingDeliveryService] error when calling AWS APIs. error details - GetMessages Error: shared credentials are already expired, they were queried at 2021-12-21T11:30:10-06:00 and expired at 2021-12-21T18:30:10Z
2021-12-23 13:42:24 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:26 ERROR [ssm-agent-worker] [MessagingDeliveryService] error when calling AWS APIs. error details - GetMessages Error: shared credentials are already expired, they were queried at 2021-12-21T11:30:10-06:00 and expired at 2021-12-21T18:30:10Z
2021-12-23 13:42:26 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:29 ERROR [ssm-agent-worker] [MessagingDeliveryService] error when calling AWS APIs. error details - GetMessages Error: shared credentials are already expired, they were queried at 2021-12-21T11:30:10-06:00 and expired at 2021-12-21T18:30:10Z
2021-12-23 13:42:29 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:31 ERROR [ssm-agent-worker] [MessagingDeliveryService] error when calling AWS APIs. error details - GetMessages Error: shared credentials are already expired, they were queried at 2021-12-21T11:30:10-06:00 and expired at 2021-12-21T18:30:10Z
2021-12-23 13:42:31 INFO [ssm-agent-worker] [MessagingDeliveryService] increasing error count by 1
2021-12-23 13:42:33 ERROR [ssm-agent-worker] [MessagingDeliveryService] MessagingDeliveryService stopped temporarily due to internal failure. We will retry automatically after 15 minutes

This sequence repeats over and over. As the timestamps show, the credentials expired a couple of days ago. I am assuming the "internal failure" is the agent failing to refresh the tokens.
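To put a number on how stale the credentials are, here is a minimal Python sketch that parses one of the ERROR lines above and computes both the credential lifetime and how long ago the credentials expired. The function name `credential_ages` and the regular expression are illustrative, not part of SSM Agent. The log timestamp carries no time zone, so the sketch assumes it is local time at UTC-06:00, matching the offset in the "queried at" value.

```python
import re
from datetime import datetime, timedelta, timezone

# One ERROR line copied from the log excerpt above.
LINE = (
    "2021-12-23 13:42:24 ERROR [ssm-agent-worker] [MessagingDeliveryService] "
    "error when calling AWS APIs. error details - GetMessages Error: shared "
    "credentials are already expired, they were queried at "
    "2021-12-21T11:30:10-06:00 and expired at 2021-12-21T18:30:10Z"
)

# Hypothetical pattern for this one message shape; other agent versions may differ.
PATTERN = re.compile(
    r"^(\S+ \S+) ERROR .*shared credentials are already expired, "
    r"they were queried at (\S+) and expired at (\S+)"
)


def parse_iso(ts: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def credential_ages(line: str, log_tz: timezone):
    """Return (lifetime, time_since_expiry), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # Assumption: the log timestamp is local time in log_tz.
    logged = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S").replace(tzinfo=log_tz)
    queried = parse_iso(match.group(2))
    expired = parse_iso(match.group(3))
    return expired - queried, logged - expired


lifetime, stale_for = credential_ages(LINE, timezone(timedelta(hours=-6)))
print("credential lifetime:", lifetime)
print("expired for:", stale_for)
```

Under that time zone assumption, the credentials had a one-hour lifetime and had been expired for a little over two days when the error was logged.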

I restarted the agent on one machine (with systemctl restart) and it came back fine, so the problem is definitely some state held by the running agents. I have left the others in their failed state in case someone responds with something I can test further.
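Since a restart cleared the problem on one machine, a stopgap is to detect the stuck state from the log and restart the agent automatically. The Python sketch below is a workaround under stated assumptions, not a fix for the underlying refresh failure: the unit name `amazon-ssm-agent`, the helper names `looks_stuck` and `restart_if_stuck`, and the tail/threshold values are all assumptions to adjust for your install (snap-based installs use a different unit name).

```python
import subprocess
from pathlib import Path

LOG_PATH = Path("/var/log/amazon/ssm/amazon-ssm-agent.log")
EXPIRED_MARKER = "shared credentials are already expired"
SERVICE_NAME = "amazon-ssm-agent"  # assumption: systemd unit name on this host


def looks_stuck(log_lines, tail=50, threshold=3):
    """True if the last `tail` lines contain at least `threshold` expired-credential errors."""
    recent = list(log_lines)[-tail:]
    hits = sum(1 for line in recent if EXPIRED_MARKER in line)
    return hits >= threshold


def restart_if_stuck(log_path=LOG_PATH, service=SERVICE_NAME):
    """Restart the agent when the log shows the expired-credentials loop.

    Returns True if a restart was issued. Requires root privileges.
    """
    lines = log_path.read_text(errors="replace").splitlines()
    if not looks_stuck(lines):
        return False
    subprocess.run(["systemctl", "restart", service], check=True)
    return True
```

Calling `restart_if_stuck()` periodically, for example from a cron job or a systemd timer running as root, would restart only agents that are currently in the error loop. The whole log is read into memory, which is acceptable for a sketch but could be tightened for very large files.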

Asked 2 years ago, viewed 156 times
No answers
