Step functions task token expires after 6.5h

I have a Step Functions state machine with a task that waits for an EC2 instance to run for up to 8h while the instance sends heartbeats periodically. I pass the task token to the instance via SQS; a PowerShell script on the instance picks it up and runs "aws stepfunctions send-task-heartbeat --task-token XYZ" in a loop. This works fine until about 6h35m, when the operation fails with "...SendTaskHeartbeat operation: The security token included in the request is expired".
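
For illustration, the heartbeat loop described above could be sketched in Python roughly as follows. This assumes boto3 in place of the AWS CLI; the function name `send_heartbeats`, the 60-second interval, and the injectable `sleep` parameter are hypothetical and not taken from the original PowerShell script.

```python
import time


def send_heartbeats(client, task_token, interval_s=60, max_beats=None, sleep=time.sleep):
    """Send Step Functions heartbeats in a loop.

    `client` is expected to behave like boto3.client("stepfunctions").
    Runs forever when max_beats is None; any exception raised by the
    client (for example an expired-credentials error) propagates to the caller.
    Returns the number of heartbeats sent.
    """
    sent = 0
    while max_beats is None or sent < max_beats:
        # Same operation as `aws stepfunctions send-task-heartbeat --task-token ...`.
        client.send_task_heartbeat(taskToken=task_token)
        sent += 1
        sleep(interval_s)
    return sent
```

With boto3 installed this would be called as `send_heartbeats(boto3.client("stepfunctions"), token)`, where `token` is the task token read from the SQS message.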

At first I thought the task token had erroneously been given AWS's default maximum TTL of 6h (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-metadata-v2-how-it-works.html), even though the task itself is configured to time out after 8h (I also tried 10h, with the same result). But it must be something else, because the failure happens well past the 6h mark.

State Machine

  • What are you trying to achieve, and why? There may be a better way.

  • @Gary Mclean It's a custom remote desktop setup: people connect to the EC2 instance, do some work, and are then supposed to disconnect and shut down the session via a web UI. What actually happens is that some people shut down the instance from the host OS UI while connected remotely, or the instance's startup script fails or crashes after some time. So heartbeats are used to make sure everything is up and running, and if not, to shut everything down and deallocate resources.

Vitaly
asked 9 months ago, 634 views
2 Answers
Accepted Answer

Hi, yes, you're right: the maximum value is 21,600 seconds (6h).

See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-metadata-v2-how-it-works.html

The PUT request must include a header that specifies the time to live (TTL) for the token, 
in seconds, up to a maximum of six hours (21,600 seconds). The token represents a logical 
session. The TTL specifies the length of time that the token is valid and, therefore, the duration 
of the session.

So it just means that you have to renew your token (detailed on the page above) to go beyond 6h.
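
As a rough sketch of what renewing that IMDSv2 session token involves, the Python below builds (but does not send) the PUT request described in the quoted passage. The endpoint and header name come from the linked IMDSv2 page; the helper name `build_token_request` is hypothetical.

```python
import urllib.request

IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"
MAX_TTL_SECONDS = 21600  # six hours, the maximum quoted above


def build_token_request(ttl_seconds=MAX_TTL_SECONDS):
    """Build the PUT request that asks IMDSv2 for a new session token."""
    if not 0 < ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError("TTL must be positive and at most 21600 seconds")
    return urllib.request.Request(
        IMDS_TOKEN_URL,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
```

On an instance, `urllib.request.urlopen(build_token_request(), timeout=2).read().decode()` would return the token text, which is then sent in the `X-aws-ec2-metadata-token` header of later metadata requests.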

Best.

Didier

AWS EXPERT
answered 9 months ago
  • This is a different thing; the link was just an example. What expires is the task token of crd-wait-for-shutdown, and I don't see any way to get a new task token within the Step Functions framework (and pass it to the EC2 instance so it can continue sending heartbeats).

  • My assumption that it was the task token was wrong. I don't send any security tokens myself, but the AWS CLI adds them to every request under the hood, so I have to keep the CLI's credentials "alive". I ended up polling http://169.254.169.254/latest/meta-data/iam/security-credentials/<iam-role-name> every minute, extracting the token from the response, and putting it into the AWS_SESSION_TOKEN environment variable. Interestingly, the tokens are updated every hour even though each has a 6.5h lifespan, i.e. there's a lot of overlap.
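
A minimal Python sketch of the polling workaround described in the comment above, assuming the metadata response is the usual JSON document with `AccessKeyId`, `SecretAccessKey`, `Token` and `Expiration` fields. The function names and the injected `fetch` callable are hypothetical. Unlike the comment, this sketch also copies the access key and secret, on the assumption that they rotate together with the token.

```python
import json
import os
from datetime import datetime, timedelta, timezone


def credentials_to_env(doc):
    """Map an instance-role credentials document to the AWS CLI's env variables."""
    return {
        "AWS_ACCESS_KEY_ID": doc["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": doc["SecretAccessKey"],
        "AWS_SESSION_TOKEN": doc["Token"],
    }


def expires_within(doc, margin, now=None):
    """Return True if the credentials expire within `margin` (a timedelta)."""
    now = now or datetime.now(timezone.utc)
    # Expiration looks like "2024-01-01T12:00:00Z"; normalise the trailing Z.
    expiration = datetime.fromisoformat(doc["Expiration"].replace("Z", "+00:00"))
    return expiration - now <= margin


def refresh_environment(fetch, environ=os.environ):
    """Fetch the credentials JSON text via `fetch()` and export it to `environ`."""
    doc = json.loads(fetch())
    environ.update(credentials_to_env(doc))
    return doc
```

Here `fetch` would be a function that GETs the metadata URL from the comment (with an IMDSv2 token header where required) and returns the response body. Calling `refresh_environment(fetch)` once a minute reproduces the workaround; note that variables set this way are only seen by child processes started afterwards.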

Have you thought about using a GPO to remove the shutdown option from the OS for users who connect via Remote Desktop?

On the startup side, perhaps use something like EventBridge to detect when an EC2 instance starts, and then monitor the startup script via a health check. Once it has started OK, stop checking the instance?

Similarly, you could monitor for EC2 instance state changes and power the instance back up automatically if it has been powered down.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instance-state-changes.html

EXPERT
answered 9 months ago
