AWS Guidance: Best Practices for running MSSQL workloads on FSx for NetApp ONTAP

6 minute read
Content level: Foundational

This article provides some best practices which you can use to optimize your experience running Fsx for NetApp ONTAP in AWS for self-managed Microsoft SQL workloads.

For over a decade, customers have successfully opted to run their mission critical Microsoft workloads on AWS. Today, AWS offers customers the flexibility to either rehost, replatform, or refactor their databases using purpose-built solutions to meet the demands of their specific use-case(s). In certain use-cases, customers want to maintain complete control of the database environment (including operating system-level access) and opt to adopt an architecture in AWS leveraging self-managed Microsoft SQL Server (MSSQL) on Amazon Elastic Compute Cloud (Amazon EC2). When doing so, customers can choose from a number of storage solutions including Amazon Elastic Block Store (Amazon EBS), Amazon FSx for Windows File Server, or Amazon FSx for NetApp ONTAP to meet the unique performance and reliability needs of their workload.

Each of these storage solutions offers distinct advantages for SQL Server workloads. EBS is useful for SQL workloads that require automated backups and cross-region replication for data redundancy and disaster recovery. Whereas FSx for Windows benefits SQL workloads needing high availability and fault tolerance through multi-AZ deployment options. Additionally, FSx for ONTAP offers advantages such as support of application consistent snapshots for SQL workloads requiring it, multi-AZ deployment, and iSCSI support for comprehensive data management and high availability.

Self-managed MSSQL deployments supported in AWS include standalone deployments, and HA deployments such as Failover Cluster Instances (FCI), or Always on availability group (AG’s). Though each storage solution in AWS offers customers a unique value proposition, deploying self-managed SQL databases on FSx for ONTAP is a common architecture pattern for many enterprise customers today. FSx for ONTAP is a fully managed service in AWS which provides highly reliable, scalable, high-performing, and feature-rich file storage built on NetApp’s popular ONTAP file system. Notably, FSx for ONTAP offers the feature parity that existing NetApp customers require when considering a workload migration to the cloud. With support of common communication protocols such as SMB, NFS and iSCSI, FSx for ONTAP can meet the diverse performance needs of MSSQL workloads deployed on either Linux or Windows operating systems. Deploying MSSQL on FSx for ONTAP to leverage the iSCSI protocol in particular provides a highly performant storage architecture with configurable optimizations to meet the needs of even the most demanding database workloads.

Optimizing your MSSQL workloads on FSx for ONTAP requires an intimate understanding of your workloads’ characteristics and requirements such as throughput and latency. This will ensure performance bottlenecking and/or throttling does not occur when adopting the solution. Additionally, the following are my recommended best practices you can use when implementing FSx for ONTAP for your self-managed MSSQL workloads in AWS.

1. Optimizing Receive Side Scaling (RSS)- Receive side scaling is a network driver technology that enables the efficient distribution of network processing across multiple CPUs. By default, Windows AMIs in AWS are configured with RSS enabled but require additional configuration to optimize the consumption of CPUs over multi-core systems to enable more network traffic to be processed efficiently. RSS settings are persistent for each ENI attached to your self-managed compute instances, however resizing requires a re-evaluation of the RSS configuration to ensure optimal performance is achieved for your MSSQL deployment.

2. Space Allocation- Setting the space-allocation option on the LUN’s provisioned to ‘enabled’ can ensure you avoid MSSQL failures related to limited storage space availability. The space-allocation option by default is set to ‘disabled’ and enabling this will ensure FSx for ONTAP notifies the EC2 host when the volume has run out of space and can no longer accept writes. Additionally, be sure to setup Cloudwatch thresholds for critical FSx for ONTAP metrics including ‘StorageCacpacityUtilization’ or ‘StorageUsed’ to ensure your operational staff is alerted in near-real time of any allocation issues. Auto-resizing solutions can be implemented leveraging AWS Lambda using this step by step guide.

3. Configure Tempdb on Instance Storage- EC2 Instance storage offers fast disk I/O for the temporary objects housed natively within Tempdb. Traffic does not count towards IOPS provisioned for FSx for ONTAP and the cost associated with the storage is included in the hourly cost of the EC2 host itself. Data does not persist on instance storage when recycled so be sure this aligns with your enterprises requirements prior to deploying your system databases in this manner. Additionally, striping multiple instance storage disks together can achieve higher I/O if required.

4. Review I/O thread affinity- Minimizing I/O and network contention on your self-managed EC2's includes both RSS and thread affinity to be properly configured. Under heavy system load, MSSQL performance can at times be constrained due to the native multi-tasking functionality inherent in Microsoft Windows. Assigning processors to specific process threads can improve MSSQL performance in certain scenarios.

5. Selecting the optimal and updated EC2 Instance- There are over 250 instance types available in AWS today, all of which are designed to meet the unique characteristics of your MSSQL workloads. Given the commonly leveraged core licensing cost structure of MSSQL, it’s imperative that you avoid over-provisioning compute to save on costs. Depending upon your provisioned storage solution and performance needs, you’ll also want to consider which instance size is appropriate to allow you the available throughput required of your databases. Ensure you evaluate the storage optimized instances as they offer NVMe SSD (EBS or instance store volumes) designed to meet the needs of data-intensive SQL Server workloads of up to 3 million IOPs, and 100 GBps of network bandwidth. When deploying the AMI associated with your preferred instance, ensure that critical drivers are updated to avoid operational issues.

6. Leverage SnapCenter for SQL specific BCDR- MSSQL server deployments requiring point-in-time recovery rely on application-consistent backups to provide organizations the ability to quickly restore or mirror databases for UAT or disaster recovery scenarios. Providing performant granular database level restores is just one of the many features of SnapCenter and you can learn more about implementing this for your MSSQL workloads here

7. Storage Load Balancing- MPIO is an acronym for MultiPath Input Output. The technology allows workloads the ability to leverage fault-tolerant transport to a data storage system (in this case FSx for ONTAP). For iSCSI communication in particular, MPIO is supported by native drivers in the Windows and Linux operating systems for your defined VPC routes. When enabled, MPIO provides Windows multiple paths to FSx for ONTAP if a failure were to occur. This allows MSSQL the ability to maintain data access consistency by ensuring optimal uptime between your databases on EC2 and the underlying LUNs provisioned in FSx for ONTAP.

Multi-AZ FSx for OnTap Deployment

Amazon FSx for NetApp ONTAP makes it easy and cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. It supports a wide range of workloads including self-managed SQL server deployments in AWS. Be sure to tune into re:Post Live on Monday, June 10th at 11am PST/ 3pm EST on where we’ll discuss these types of use-cases more in depth and answer your questions live on air!

profile pictureAWS
published 6 days ago1170 views