
What is the best practice to implement NetApp SnapCenter high availability on AWS?


Hi

According to the official NetApp SnapCenter guide, there are two ways to implement high availability for SnapCenter: either through an F5 load balancer or through MS NLB. Is there any official AWS alternative to achieve an HA setup (AWS NLB)? Do you plan to include AWS Managed SQL support for the underlying database?

Regards Skubi

asked 2 years ago, 917 views
1 Answer

Regarding the NLB: although the following page mentions "F5", the technical steps for 'Add-SmServerCluster' apply to any external load balancer, including the AWS NLB. See -> https://docs.netapp.com/us-en/snapcenter/install/concept_configure_snapcenter_servers_for_high_availabiity_using_f5.html

The AWS Network Load Balancer (NLB) is the recommended replacement for traditional hardware or software load balancers. It provides the low-latency, Layer 4 connectivity required by SnapCenter.

  • Configuration: You configure the NLB with a Target Group containing your two SnapCenter EC2 instances (Active and Passive in two different AZs within the same VPC).
  • Health Checks: The NLB monitors the SnapCenter service port (default 8146). If the primary node fails, the NLB automatically redirects traffic to the secondary node.
  • VIP/DNS: In the SnapCenter Cluster setup, you use the DNS name or the Private IP of the NLB as the "Cluster IP/FQDN."
  • Cross-Zone Load Balancing: Ensure Cross-Zone Load Balancing is enabled on the NLB to allow traffic to reach the active node regardless of which AZ the client or NLB endpoint is using.
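
As a rough illustration of the bullets above, here is a minimal Python sketch of the target group and cross-zone settings. It assumes you script the setup with boto3 (the AWS SDK for Python); the helper names, the target group name, the threshold values, and the 10-second health check interval are my own placeholders, not values from the NetApp or AWS guides. The two builder functions only assemble parameters, so you can inspect them without touching AWS; creating the listener is left out.

```python
from typing import Any

SNAPCENTER_PORT = 8146  # default SnapCenter service port mentioned above


def target_group_params(name: str, vpc_id: str,
                        port: int = SNAPCENTER_PORT) -> dict[str, Any]:
    """Build keyword arguments for elbv2 create_target_group (Layer 4, TCP)."""
    return {
        "Name": name,
        "Protocol": "TCP",
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "instance",
        # Plain TCP health check against the service port (assumed values).
        "HealthCheckProtocol": "TCP",
        "HealthCheckPort": str(port),
        "HealthCheckIntervalSeconds": 10,
        "HealthyThresholdCount": 3,
        "UnhealthyThresholdCount": 3,
    }


def cross_zone_attributes() -> list[dict[str, str]]:
    """Load balancer attribute that turns on cross-zone load balancing."""
    return [{"Key": "load_balancing.cross_zone.enabled", "Value": "true"}]


def apply_to_aws(nlb_arn: str, name: str, vpc_id: str,
                 instance_ids: list[str]) -> str:
    """Create the target group, register both nodes, enable cross-zone."""
    import boto3  # third-party; imported lazily so the builders work without it

    elbv2 = boto3.client("elbv2")
    created = elbv2.create_target_group(**target_group_params(name, vpc_id))
    tg_arn = created["TargetGroups"][0]["TargetGroupArn"]
    elbv2.register_targets(
        TargetGroupArn=tg_arn,
        Targets=[{"Id": instance_id} for instance_id in instance_ids],
    )
    elbv2.modify_load_balancer_attributes(
        LoadBalancerArn=nlb_arn,
        Attributes=cross_zone_attributes(),
    )
    return tg_arn
```

You would call `apply_to_aws` with the ARN of an existing NLB and the instance IDs of the two SnapCenter nodes; treat the numbers as starting points and tune them for your environment.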

Since SnapCenter operates in an Active/Passive configuration, the services may technically be running on both nodes simultaneously, but only the primary node should handle active traffic.

  • Mechanism: To ensure the AWS NLB correctly identifies the "Active" node, it is recommended to configure a specific Health Check (e.g., an HTTPS check on a specific API endpoint or status page).
  • Failover Behavior: You must ensure that the passive node does not respond to requests on the service port (8146) while in standby mode. Otherwise the NLB treats both nodes as healthy and distributes connections across them, sending some to the inactive node, which would cause connection errors or data inconsistencies.
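
To check from a client or bastion host that the passive node really is silent on the service port, a small probe like the one below can help. It is only a sketch: it imitates a plain TCP health check (connection succeeds means "healthy") using the Python standard library, and the function name and the 2-second timeout are my own choices, not anything defined by SnapCenter or the NLB.

```python
import socket


def tcp_health_check(host: str, port: int = 8146, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable: treat as unhealthy.
        return False
```

Run it against the private IP of each node: in a correctly behaving Active/Passive pair you would expect `True` for the active node and `False` for the standby. Keep in mind that an open port only shows the service is listening, so an HTTPS check as mentioned above is still the stronger signal.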

DB/RDS: Official support for standard Amazon RDS as a SnapCenter repository is limited because SnapCenter HA requires deep administrative access to the database instance to manage synchronization and failover logic - access that managed RDS typically restricts.

Recommended Approach: For a robust HA setup, it is best practice to host the repository database yourself on EC2 rather than on managed RDS. You can either:

  • Use the built-in MySQL replication provided by SnapCenter.
  • Set up a SQL Server Failover Cluster Instance (FCI) on Amazon FSx for NetApp ONTAP Multi-AZ shared storage (iSCSI) to host the SnapCenter database.

Perhaps "Amazon RDS Custom for SQL Server" with Multi-AZ is also an option here, but I haven't used it yet for this setup.

See also:

1st: This is the primary guide for deploying SnapCenter within the AWS ecosystem. -> https://aws.amazon.com/blogs/storage/using-netapp-snapcenter-with-amazon-fsx-for-netapp-ontap-to-protect-your-sql-server-workloads/

2nd: Detailed walkthrough of the integration between SnapCenter and FSx for ONTAP. -> https://docs.netapp.com/us-en/netapp-solutions-databases/mssql/sqlsb-backup-restore-aws-fsxn.html

Does that answer your question?

EXPERT
answered 4 months ago
