1. Implement Lifecycle Hooks:
Lifecycle hooks can pause the termination of an instance in the ASG, giving you time to check whether any GitHub Actions jobs are still running before termination proceeds. Configure a lifecycle hook for the autoscaling:EC2_INSTANCE_TERMINATING transition; the instance then enters the Terminating:Wait state when it is scheduled for termination.
2. Monitor GitHub Runner Jobs: You need to implement a mechanism to check if the GitHub runner on the instance is running any jobs.
If the instance is running a job, treat it as busy and postpone its termination by recording lifecycle hook heartbeats, which extend the time the instance stays in the wait state.
3. Cordon the Node:
Before allowing the instance to terminate, you can "cordon" the node by disabling new jobs from being assigned to it.
GitHub's REST API can delete a runner's registration, or change its custom labels so that jobs targeting those labels no longer match it; either approach prevents new jobs from being assigned to that runner.
4. Complete Lifecycle Hook:
Once all running jobs on the instance are finished, you can complete the lifecycle action, allowing the instance to terminate gracefully.
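Steps 2 to 4 can be sketched as a small polling loop. This is only an illustration: `is_busy`, `heartbeat` and `complete` are hypothetical callables you would supply yourself (for example, wrappers around a GitHub runner check, `record_lifecycle_action_heartbeat` and `complete_lifecycle_action`). A long-running loop like this probably fits a process on the instance better than a single Lambda invocation, because Lambda limits how long one invocation can run.

```python
import time


def drain_then_complete(is_busy, heartbeat, complete, poll_seconds=30,
                        max_polls=100, sleep=time.sleep):
    """Poll until the runner is idle, extending the hook while it is busy.

    is_busy, heartbeat and complete are caller-supplied callables
    (hypothetical names, not part of any library).
    Returns True if the runner went idle before max_polls was reached.
    """
    for _ in range(max_polls):
        if not is_busy():
            complete()  # runner is idle: let the termination proceed
            return True
        heartbeat()  # reset the hook timeout so the instance stays in Terminating:Wait
        sleep(poll_seconds)
    complete()  # gave up waiting: let the termination proceed anyway
    return False
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.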
Example Workflow:
- Create a Lambda function that is triggered by the lifecycle hook when the instance is scheduled for termination.
- Check if the instance is running any GitHub jobs by querying the GitHub API.
- If jobs are running, postpone termination and cordon the node.
- If no jobs are running, complete the lifecycle action and terminate the instance.
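For the "query the GitHub API" step, here is a minimal sketch that lists a repository's self-hosted runners and looks one up by name, using only the standard library. It assumes each runner was registered under a name you can map back to the instance (for example the EC2 instance ID, which is my assumption and not something GitHub does for you). `fetch_runners` and `is_runner_busy` are illustrative names, and only the first 100 runners are fetched.

```python
import json
import urllib.request


def fetch_runners(owner, repo, token):
    """Return the self-hosted runners registered for a repository (first page only)."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/actions/runners?per_page=100",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["runners"]


def is_runner_busy(runners, runner_name):
    """True if the named runner is executing a job; False if idle or not registered."""
    for runner in runners:
        if runner["name"] == runner_name:
            return bool(runner["busy"])
    return False
```

Typical use would be `is_runner_busy(fetch_runners("my-org", "my-repo", token), instance_id)`, where the owner, repository and token are placeholders.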
Example Code Snippet:
Here’s an example of how you might configure the Lambda function:
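
    import boto3
    import requests  # not bundled with the Lambda Python runtime; package it with the function

    # Your GitHub API token and runner information (placeholders)
    GITHUB_API_TOKEN = "your_github_token"
    GITHUB_REPO = "your_owner/your_repo"
    RUNNER_ID = "your_runner_id"

    asg_client = boto3.client('autoscaling')

    def lambda_handler(event, context):
        # Assumes the hook invokes this function through EventBridge, which puts
        # the lifecycle fields under event['detail'].
        detail = event['detail']
        hook = {
            'LifecycleHookName': detail['LifecycleHookName'],
            'AutoScalingGroupName': detail['AutoScalingGroupName'],
            'LifecycleActionToken': detail['LifecycleActionToken'],
        }

        # Check GitHub runner status
        headers = {"Authorization": f"token {GITHUB_API_TOKEN}"}
        response = requests.get(
            f"https://api.github.com/repos/{GITHUB_REPO}/actions/runners/{RUNNER_ID}",
            headers=headers,
            timeout=10,
        )
        response.raise_for_status()
        runner_data = response.json()

        if runner_data['busy']:
            # Runner is busy: extend the hook timeout. Something must run this
            # check again later, or the hook times out and the instance terminates.
            print("Runner is busy, postponing termination")
            asg_client.record_lifecycle_action_heartbeat(**hook)
        else:
            # Runner is idle, allow termination
            print("Runner is idle, proceeding with termination")
            asg_client.complete_lifecycle_action(LifecycleActionResult='CONTINUE', **hook)
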
Additional Considerations:
Scheduled Scaling: You can still use scheduled scaling to define the time frames for scaling up and down. The process above ensures that scaling down happens gracefully.
Error Handling: Implement error handling and retries for API calls and lifecycle hook completions.
https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-scaling-cooldowns.html
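For the error-handling point, one common pattern is a retry wrapper with exponential backoff and a little jitter. The helper below is a generic sketch, not something from boto3 or the GitHub API; `call_with_retries` and its parameters are names chosen for illustration.

```python
import random
import time


def call_with_retries(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, 4x, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

You could wrap the GitHub request or the lifecycle completion call, for example `call_with_retries(lambda: asg_client.complete_lifecycle_action(...))`. In practice you would likely narrow the `except` clause to the errors that are actually worth retrying.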
Hello,
These steps may be helpful:
Use Lifecycle Hooks with Scheduled Scaling
1. Set Up a Lifecycle Hook:
   - Create a lifecycle hook for the "Terminating" state in your ASG. This will pause instance termination, allowing you to check if any GitHub Actions jobs are running.
2. Create a Lambda Function:
   - Trigger this function via the lifecycle hook.
   - The function should check if any jobs are running. If so, keep the instance in a "Wait" state.
   - Cordon the instance so no new jobs start.
   - Once jobs are completed, signal the lifecycle hook to proceed with termination.
3. Combine with Scheduled Scaling:
   - Use scheduled scaling to adjust your ASG size based on your usage time frames, ensuring cost savings without disrupting jobs.
https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks-overview.html
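The hook and the schedule from these steps could be set up with boto3 roughly as follows. This is a sketch under assumptions: the hook name, action names, cron times and capacities are all made-up placeholders, and the 900-second heartbeat timeout is an arbitrary choice. The parameter-building functions are kept separate from the AWS calls so they can be inspected without credentials.

```python
def terminating_hook_params(asg_name, hook_name="drain-github-runner",
                            heartbeat_timeout=900):
    """Arguments for put_lifecycle_hook (hook name and timeout are placeholders)."""
    return {
        "AutoScalingGroupName": asg_name,
        "LifecycleHookName": hook_name,
        # Fires when an instance starts terminating; it then waits in Terminating:Wait.
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        # Seconds before the hook times out unless a heartbeat is recorded.
        "HeartbeatTimeout": heartbeat_timeout,
        # Result applied if the hook times out.
        "DefaultResult": "CONTINUE",
    }


def weekday_schedule(asg_name):
    """Arguments for two scheduled actions: scale up in the morning, down at night."""
    def action(name, cron, desired, min_size, max_size):
        return {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": name,
            "Recurrence": cron,  # cron expression, in UTC unless TimeZone is also set
            "MinSize": min_size,
            "MaxSize": max_size,
            "DesiredCapacity": desired,
        }
    return [
        action("scale-up-morning", "0 8 * * 1-5", 4, 1, 6),
        action("scale-down-evening", "0 19 * * 1-5", 0, 0, 6),
    ]


def apply_setup(asg_name):
    import boto3  # imported here so the helpers above work without AWS access

    client = boto3.client("autoscaling")
    client.put_lifecycle_hook(**terminating_hook_params(asg_name))
    for params in weekday_schedule(asg_name):
        client.put_scheduled_update_group_action(**params)
```

With this in place, the evening scale-down starts terminating instances, and the hook holds each one in the wait state until your check completes the lifecycle action or the timeout expires.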

@Thanniru Thanks for your help, I will try your suggestion. Step 3 doesn't sound right for my case, though: I want to keep the runner and have it simply run on the remaining nodes. I think it's possible to cordon the instance directly, maybe with kubectl cordon.
You can use IMDS to detect when the lifecycle hook has started, and use that to trigger the drain/cordon: https://docs.aws.amazon.com/autoscaling/ec2/userguide/retrieving-target-lifecycle-state-through-imds.html
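As a rough sketch of that idea, a small script on the instance could read the target lifecycle state through IMDSv2 and cordon the node once the state becomes `Terminated`. The function names are illustrative, and the `cordon` helper assumes `kubectl` is installed and configured on the instance and that you know the Kubernetes node name; neither is guaranteed by this thread.

```python
import subprocess
import urllib.request

IMDS = "http://169.254.169.254/latest"


def target_lifecycle_state():
    """Read the Auto Scaling target lifecycle state through IMDSv2."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=2) as resp:
        token = resp.read().decode()
    state_req = urllib.request.Request(
        f"{IMDS}/meta-data/autoscaling/target-lifecycle-state",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(state_req, timeout=2) as resp:
        return resp.read().decode().strip()


def should_cordon(state):
    """The target state reads "Terminated" once the instance is being scaled in."""
    return state == "Terminated"


def cordon(node_name):
    # Assumes kubectl is installed and has credentials for the cluster.
    subprocess.run(["kubectl", "cordon", node_name], check=True)
```

A loop or timer on the instance would call `target_lifecycle_state()` every few seconds and run `cordon(node_name)` the first time `should_cordon` returns True.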