AWS Automation for SQL Always On Groups: Scale Up for Performance, Scale Down for Savings
Learn how AWS Systems Manager Automation helps dynamically scale SQL Server instances—boosting performance when needed and cutting costs when not.
Introduction
Balancing SQL Server performance and cost efficiency is a challenge for many businesses. Over-provisioning leads to unnecessary expenses, while under-provisioning can impact critical workloads. With AWS Systems Manager Automation, organizations can dynamically scale SQL Server instances—scaling up during peak demand for optimal performance, and scaling down during off-peak times to reduce costs. This automation improves operational efficiency, eliminates manual intervention, and ensures businesses only pay for the resources they need. This blog demonstrates how to leverage AWS Systems Manager (SSM) runbooks to automate the scaling and failover process for SQL Server Always On Availability Groups.
Solution Overview
This solution leverages AWS Systems Manager Automation and AWS Systems Manager Documents (SSM Documents) to dynamically scale SQL Server instances that are in a Microsoft SQL Server Always On Group. Using an SSM Automation runbook, the process stops the EC2 instance, modifies the instance type, restarts it, and promotes the newly resized instance as the primary replica in a Microsoft SQL Server Always On Availability Group. This allows SQL Server to scale up during peak load for performance and down during off-peak hours for cost savings. The entire workflow is fully automated, eliminating manual intervention.
The great thing about AWS Systems Manager Automation Documents (SSM Documents) is that you can customize my code by adding additional steps where needed.
Some useful enhancements you can add include:
- Human-in-the-loop reviews for approval before scaling or failover
- Modifications to SQL Server configurations, such as max memory or max degree of parallelism (MAXDOP)
- Rollback mechanisms to revert changes if an issue occurs
This solution requires the creation of two SSM Documents:
Document 1 – Scales Secondary EC2 instance and Switches Over SQL Server Always On AG
- Stops the secondary instance
- Modifies the instance type
- Starts the instance
- Promotes it to primary in the AG
Document 2 – Switches SQL Server Always On AG Back and Resizes the Secondary EC2 Instance
- Fails back the SQL Always On AG to the original primary
- Stops the current primary
- Scales it down to a new instance type
- Restarts the instance
Prerequisites
- AWS Systems Manager Agent must be installed and running on all SQL Server instances
- AWS Systems Manager Automation must be enabled for the AWS account
- Two or more EC2 instances configured in a Microsoft SQL Always On Availability Group (AG)
Walkthrough
The first step is to create a role that we will attach a policy to that allows for the following.
EC2 Permissions
- Stop and start EC2 instances (ec2:StopInstances, ec2:StartInstances)
- Modify instance attributes (e.g., changing instance type) (ec2:ModifyInstanceAttribute)
- Retrieve instance details:
- Get instance information (ec2:DescribeInstances)
- Check instance status (ec2:DescribeInstanceStatus)
- Get instance attributes (ec2:DescribeInstanceAttribute)
AWS Systems Manager (SSM) Permissions
- Execute SSM commands on instances (ssm:SendCommand)
- Start and manage SSM Automation runbooks (ssm:StartAutomationExecution)
- Monitor command execution status:
- Get command invocation details (ssm:GetCommandInvocation)
- List commands issued (ssm:ListCommands)
- View past command executions (ssm:ListCommandInvocations)
Create IAM Role
- Open the AWS IAM Console
- Select Roles, Create role
- Under Use case, select Systems Manager, select Next
- Don't select a policy, select next
- Give the role a name, I have used "SSM-Automation-Role"
- Select Create role
Create IAM Policy
- Open the AWS IAM Console
- Select Policies, Create policy
- Select Policy editor "JSON"
- Input the below JSON code. Note: you will need to input your AWS Account ID and if you used a different IAM role name than "SSM-Automation-Role", please replace this with your IAM role name.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:StopInstances",
"ec2:StartInstances",
"ec2:ModifyInstanceAttribute",
"ec2:DescribeInstances",
"ec2:DescribeInstanceStatus",
"ec2:DescribeInstanceAttribute"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ssm:SendCommand",
"ssm:StartAutomationExecution",
"ssm:GetCommandInvocation",
"ssm:DescribeInstanceInformation",
"ssm:ListCommands",
"ssm:ListCommandInvocations"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::ACCOUNTID:role/SSM-Automation-Role"
}
]
}
- Select Next
- Give the Policy a name, I have used "SSM-Full-Automation-Policy"
- Select Create Policy
Create an Systems Manager Automation Document 1
We will now create the first automation document:
Document 1 – Scales Secondary EC2 instance and Switches Over SQL Server Always On AG
- Open the AWS Systems Manager Console
- Under Change Management Tools, select Documents
- Select Create Document, make sure to select Automation, not Command or Session
- The title of the document is in the top left hand corner, the default name is NewRunbook. Click the edit icon next to this name, to change it to something that is in line with your naming policies. I have called my first document "ScaleAndFailoverSQLAG"
- Select {} Code. This will change the view from Design mode to the code editor
- Copy the below code into the editor, please ensure this code meets your companies requirements / guidelines.
{
"description": "Stop, modify, restart secondary instance and failover SQL Server AG automatically",
"schemaVersion": "0.3",
"parameters": {
"AutomationAssumeRole": {
"type": "String",
"description": "IAM role that allows Systems Manager to perform actions"
},
"SecondaryInstanceId": {
"type": "String",
"description": "EC2 Instance ID of the SQL Server secondary node"
},
"AvailabilityGroupName": {
"type": "String",
"description": "The SQL Server Always On Availability Group name"
},
"NewInstanceType": {
"type": "String",
"description": "The new instance type to set for the secondary instance"
}
},
"assumeRole": "{{AutomationAssumeRole}}",
"mainSteps": [
{
"name": "StopInstance",
"action": "aws:changeInstanceState",
"nextStep": "WaitForInstanceStopped",
"isEnd": false,
"inputs": {
"InstanceIds": [
"{{SecondaryInstanceId}}"
],
"DesiredState": "stopped"
}
},
{
"name": "WaitForInstanceStopped",
"action": "aws:waitForAwsResourceProperty",
"nextStep": "ModifyInstanceType",
"isEnd": false,
"inputs": {
"Service": "ec2",
"Api": "DescribeInstances",
"InstanceIds": [
"{{SecondaryInstanceId}}"
],
"PropertySelector": "$.Reservations[0].Instances[0].State.Name",
"DesiredValues": [
"stopped"
]
}
},
{
"name": "ModifyInstanceType",
"action": "aws:executeAwsApi",
"nextStep": "StartInstance",
"isEnd": false,
"inputs": {
"Service": "ec2",
"Api": "ModifyInstanceAttribute",
"InstanceId": "{{SecondaryInstanceId}}",
"InstanceType": {
"Value": "{{NewInstanceType}}"
}
}
},
{
"name": "StartInstance",
"action": "aws:changeInstanceState",
"nextStep": "WaitForInstanceRunning",
"isEnd": false,
"inputs": {
"InstanceIds": [
"{{SecondaryInstanceId}}"
],
"DesiredState": "running"
}
},
{
"name": "WaitForInstanceRunning",
"action": "aws:waitForAwsResourceProperty",
"nextStep": "FailoverAvailabilityGroup",
"isEnd": false,
"inputs": {
"Service": "ec2",
"Api": "DescribeInstances",
"InstanceIds": [
"{{SecondaryInstanceId}}"
],
"PropertySelector": "$.Reservations[0].Instances[0].State.Name",
"DesiredValues": [
"running"
]
}
},
{
"name": "FailoverAvailabilityGroup",
"action": "aws:runCommand",
"isEnd": true,
"inputs": {
"DocumentName": "AWS-RunPowerShellScript",
"InstanceIds": [
"{{SecondaryInstanceId}}"
],
"Parameters": {
"commands": [
"Import-Module SQLServer",
"Invoke-Sqlcmd -Query \"ALTER AVAILABILITY GROUP [{{AvailabilityGroupName}}] FAILOVER;\" -ServerInstance 'localhost'"
]
}
}
}
]
}
- On the right-hand panel of the Document editor, you should now see the design view of the document we have created.
- If you have other items to add, this is the place to do it. Feel free to test and experiment with different capabilities as you go through this guide.
- Select Create runbook
Test Document 1 – Scales Secondary EC2 instance and Switches Over SQL Server Always On AG
Now that we have created Document 1, let's test it.. Please make sure you are testing this on either non-production systems and/or you have a change window that you have for testing.
- Open the AWS Systems Manager Console
- Under Change Management Tools, select Documents
- Select the Tab "Owned by me"
- Find the document you just created. Mine is called "ScaleAndFailoverSQLAG" Click on the document name
- Review the document, and make sure you are happy with all the information
- Select Execute automation from the top right-hand corner of the window
- Select the current Secondary instance for your SQL Always on AG group. You should see a list of all your EC2 instances
- Select the IAM Role that we created earlier under AutomationAssumeRole
- Input the SQL Availability Group Name (make sure you have spelt the AG group name correctly)
- Input the new instance type. I will be using a c6a.2xlarge (to replace a c6a.xlarge)
- Check everything again (once again make sure your SQL AG name is correct and also the new instance type is correct), select Execute
SSM Document Execution Review
Once the Document is you can follow the execution status for each step. Note this should take about 4 -5 minutes. You should see the follow once the Document has completed successfully.
Once the document has completed, check that your EC2 instance has changed to the new instance size. More importantly, ensure that your SQL Always On AG has switched over to the Secondary (now Primary) instance.
Create Systems Manager Automation Document 2
We will now create the second automation document:
Document 2 – Switches SQL Server Always On AG Back and Resizes the Secondary EC2 Instance
- Open the AWS Systems Manager Console
- Under Change Management Tools, select Documents
- Select Create Document, make sure to select Automation, not Command or Sessin
- The title of the document is in the top left hand corner, the default name is NewRunbook. Click the edit icon next to this name, to change it to something that is in line with your naming policies. I have called my first document "ScaleBackAndFailoverSQLAG"
- Select {} Code. This will change the view from Design mode to the code editor
- Copy the below code into the editor, please ensure this code meets your companies requirements / guidelines.
{
"description": "Failback SQL Server Always On AG and resize the primary instance",
"schemaVersion": "0.3",
"parameters": {
"AutomationAssumeRole": {
"type": "String",
"description": "IAM role that allows Systems Manager to perform actions"
},
"OriginalPrimaryInstanceId": {
"type": "String",
"description": "EC2 Instance ID of the original primary SQL Server (to receive the AG failback)"
},
"CurrentPrimaryInstanceId": {
"type": "String",
"description": "EC2 Instance ID of the SQL Server currently acting as primary"
},
"AvailabilityGroupName": {
"type": "String",
"description": "The SQL Server Always On Availability Group name"
},
"NewInstanceType": {
"type": "String",
"description": "The new instance type for the current primary node"
}
},
"assumeRole": "{{AutomationAssumeRole}}",
"mainSteps": [
{
"name": "FailbackAvailabilityGroup",
"action": "aws:runCommand",
"nextStep": "StopInstance",
"isEnd": false,
"inputs": {
"DocumentName": "AWS-RunPowerShellScript",
"InstanceIds": [
"{{OriginalPrimaryInstanceId}}"
],
"Parameters": {
"commands": [
"Import-Module SQLServer",
"Invoke-Sqlcmd -Query \"ALTER AVAILABILITY GROUP [{{AvailabilityGroupName}}] FAILOVER;\" -ServerInstance 'localhost'"
]
}
}
},
{
"name": "StopInstance",
"action": "aws:changeInstanceState",
"nextStep": "WaitForInstanceStopped",
"isEnd": false,
"inputs": {
"InstanceIds": [
"{{CurrentPrimaryInstanceId}}"
],
"DesiredState": "stopped"
}
},
{
"name": "WaitForInstanceStopped",
"action": "aws:waitForAwsResourceProperty",
"nextStep": "ModifyInstanceType",
"isEnd": false,
"inputs": {
"Service": "ec2",
"Api": "DescribeInstances",
"InstanceIds": [
"{{CurrentPrimaryInstanceId}}"
],
"PropertySelector": "$.Reservations[0].Instances[0].State.Name",
"DesiredValues": [
"stopped"
]
}
},
{
"name": "ModifyInstanceType",
"action": "aws:executeAwsApi",
"nextStep": "StartInstance",
"isEnd": false,
"inputs": {
"Service": "ec2",
"Api": "ModifyInstanceAttribute",
"InstanceId": "{{CurrentPrimaryInstanceId}}",
"InstanceType": {
"Value": "{{NewInstanceType}}"
}
}
},
{
"name": "StartInstance",
"action": "aws:changeInstanceState",
"nextStep": "WaitForInstanceRunning",
"isEnd": false,
"inputs": {
"InstanceIds": [
"{{CurrentPrimaryInstanceId}}"
],
"DesiredState": "running"
}
},
{
"name": "WaitForInstanceRunning",
"action": "aws:waitForAwsResourceProperty",
"isEnd": true,
"inputs": {
"Service": "ec2",
"Api": "DescribeInstances",
"InstanceIds": [
"{{CurrentPrimaryInstanceId}}"
],
"PropertySelector": "$.Reservations[0].Instances[0].State.Name",
"DesiredValues": [
"running"
]
}
}
]
}
- On the right-hand panel of the Document editor, you should now see the design view of the document we have created.
- If you have other items to add, this is the place to do it. Feel free to test and experiment with different capabilities as you go through this guide.
- Select Create runbook
Test Document 2 – Switches SQL Server Always On AG Back and Resizes the Secondary EC2 Instance
Now that we have created Document 2, let's test it.. Once again, please make sure you are testing this on either non-production systems and/or you have a change window that you have for testing.
- Open the AWS Systems Manager Console
- Under Change Management Tools, select Documents
- Select the Tab "Owned by me"
- Find the document you just created. Mine is called "ScaleBackAndFailoverSQLAG" Click on the Document name
- Review the document and ensure you are happy with all the information
- Select Execute automation from the top right-hand corner of the window
- Select the Original Primary instance (which is now your Secondary, if you ran the previous Document) for your SQL Always on AG group. You should see a list of all your EC2 instances
- Select the Current Primary instance (which originally was your Secondary,if you ran the previous Document) for your SQL Always on AG group. You should see a list of all your EC2 instances
- Select the IAM Role that we created earlier under AutomationAssumeRole
- Input the SQL Availability Group Name (make sure you have spelt the AG group name correctly)
- Input the new instance type. I will be using a c6a.xlarge (to replace a c6a.2xlarge)
- Check everything again (once again make sure your SQL AG name is correct and also the new instance type is correct), select Execute
SSM Document Execution Review
Once the Document is executed you can follow the execution status for each step. Note this should take about 4 -5 minutes. You should see the follow once the Document has completed successfully.
Once the document has completed, check that your EC2 instance has changed to the new instance size. More importantly, ensure that your SQL Always On AG has switched over to the original Primary instance.
Conclusion & Next Steps
In this post, we have discussed how to leverage AWS Systems Manager Automation to dynamically scale SQL Server instances in a Microsoft SQL Server Always On Availability Group. We explored the creation of SSM runbooks to automate instance resizing and failover, eliminating manual intervention and optimising both performance and cost. Additionally, we highlighted ways to enhance the solution, such as incorporating approval workflows and configuration adjustments.
Next Steps:
- Test the automation in a non-production environment.
- Customise the SSM Documents to align with your organisation’s requirements.
- Implement monitoring and logging to track automation executions.
- Explore additional automation enhancements, such as approval workflows or configuration adjustments.
With these steps, you can further streamline your SQL Server operations while maintaining efficiency and reliability.
- Language
- English
Relevant content
- Accepted Answerasked 2 years ago
