How do I troubleshoot SnapMirror issues on my Amazon FSx for NetApp ONTAP file system?


I want to troubleshoot common SnapMirror issues with Amazon FSx for NetApp ONTAP.

Resolution

SnapMirror shows a "Failed to query transfer status" error

SnapMirror returns this error:

"Failed to query transfer status. (Destination is in an invalid transfer state (Replication engine error))"

This error occurs because the SnapMirror checkpoint isn't in sync with the file system. To resolve this error, follow these steps:

  1. Delete the SnapMirror relationship, and then take the destination volume offline and delete it:

    FsxIdxxxxxxx::> snapmirror delete -destination-path destsvm:destvol 
    FsxIdxxxxxxx::> volume offline -vserver destsvm -volume destvol
    FsxIdxxxxxxx::> volume delete -vserver destsvm -volume destvol
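  2. Recreate the SnapMirror relationship, and then initialize it. Because the previous step deleted the destination volume, the destination volume must exist again as a data protection (DP) volume before you run these commands: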

    FsxIdxxxxxxx::> snapmirror create -source-path srcsvm:srcvol -destination-path destsvm:destvol
    FsxIdxxxxxxx::> snapmirror initialize -destination-path destsvm:destvol
  3. (Optional) If the error persists, then abort the transfer on the destination path, and then update the relationship:

    FsxIdxxxxxxx::> snapmirror abort -destination-path destsvm:destvol -hard true
    FsxIdxxxxxxx::> snapmirror update -destination-path destsvm:destvol

For more information, see SnapMirror transfer fails with error "Destination is in an invalid transfer state" on the NetApp website.
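
The delete-and-recreate sequence in steps 1 and 2 can be generated from one set of names, which helps avoid typos when the same paths appear in several commands. The following is a minimal sketch, not an official tool: the function name rebuild_commands is hypothetical, and it only builds the command strings shown above without connecting to the file system.

```python
def rebuild_commands(src_svm, src_vol, dst_svm, dst_vol):
    """Return the ONTAP CLI commands from steps 1 and 2, in order.

    Hypothetical helper: it only formats strings and runs nothing.
    """
    src = f"{src_svm}:{src_vol}"
    dst = f"{dst_svm}:{dst_vol}"
    return [
        # Step 1: remove the relationship, then the destination volume.
        f"snapmirror delete -destination-path {dst}",
        f"volume offline -vserver {dst_svm} -volume {dst_vol}",
        f"volume delete -vserver {dst_svm} -volume {dst_vol}",
        # The destination volume must be recreated before the next command.
        # That command is omitted here because its size and aggregate
        # depend on your environment.
        # Step 2: recreate and initialize the relationship.
        f"snapmirror create -source-path {src} -destination-path {dst}",
        f"snapmirror initialize -destination-path {dst}",
    ]


if __name__ == "__main__":
    for command in rebuild_commands("srcsvm", "srcvol", "destsvm", "destvol"):
        print(command)
```

Review the printed commands before you paste them into the ONTAP CLI, because the volume delete command is destructive.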

SnapMirror update or resync failure

The SnapMirror update or resync fails with this error:

"No common Snapshot copy found"

This error occurs when there are no common snapshots between the source and destination. Forced or automatic deletion of the common snapshots can cause this error.

To resolve this error, follow these steps:

  1. Delete and release the older SnapMirror relationship.
  2. Delete the associated destination volume.
  3. Create and initialize a new SnapMirror relationship with a new destination volume.

For more information, see Update or resync of a SnapMirror relationship fails with No common Snapshot error on the NetApp website.
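
To illustrate what "no common snapshot" means, the following sketch compares two lists of snapshot names and returns the ones that exist on both sides. It assumes that you already collected the names from the source and destination volumes. The function name and the snapshot names are hypothetical, and comparing names alone is a simplification of how ONTAP matches snapshots.

```python
def common_snapshots(source_snapshots, destination_snapshots):
    """Return the snapshot names present on both sides, in source order."""
    on_destination = set(destination_snapshots)
    return [name for name in source_snapshots if name in on_destination]


if __name__ == "__main__":
    # Hypothetical snapshot names, for illustration only.
    source = ["daily.1", "daily.2", "snapmirror.example.3"]
    destination = ["snapmirror.example.3", "hourly.7"]

    shared = common_snapshots(source, destination)
    if shared:
        print("Common snapshots:", shared)
    else:
        print("No common snapshot: an update or resync would fail.")
```

An empty result corresponds to the situation described above, where the relationship must be recreated instead of resynced.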

SnapMirror replication is slow

To troubleshoot slow SnapMirror replication, complete these steps:

  1. Check whether global throttling is turned on. For more information, see Why does SnapMirror replication take a long time on my FSx for NetApp ONTAP volume?
  2. Use the network ping command to check for a maximum transmission unit (MTU) mismatch in the network path. Example of a successful ping with jumbo frames (MTU 9001):

    FsxIdxxxxxxx::> network ping -vserver AD -lif nfs_smb_management_1 -destination 10.0.16.178 -disallow-fragmentation true -packet-size 8972 -show-detail true -verbose -count 10
    PING 10.0.16.178 (10.0.16.178) from 10.0.12.253: 8972 data bytes
     to 10.0.12.253 8980 bytes from 10.0.16.178: icmp_seq=0 ttl=255 time=1.129 ms
     to 10.0.12.253 8980 bytes from 10.0.16.178: icmp_seq=1 ttl=255 time=1.061 ms
     to 10.0.12.253 8980 bytes from 10.0.16.178: icmp_seq=2 ttl=255 time=1.120 ms
     to 10.0.12.253 8980 bytes from 10.0.16.178: icmp_seq=3 ttl=255 time=1.076 ms

    For more information, see How to verify optimal MTU packet size for cluster peering and SnapMirror on the NetApp website.
  3. Check Amazon CloudWatch metrics to see whether disk IOPS or network utilization is reaching 100%. If either metric is at its limit, then increase the throughput capacity or IOPS of the file system. You can also run the qos statistics workload performance show command to view the current IOPS, throughput, and latency for the workloads on the file system:

    FsxIdxxxxxxx::> qos statistics workload performance show

    Workload            ID     IOPS       Throughput    Latency
    --------------- ------ -------- ---------------- ----------
    -total-              -      446        31.44KB/s        0ms
    _USERSPACE_APPS     14      445        31.44KB/s        0ms
    _WAFL_SCAN          20        1            0KB/s        0ms
    -total-              -      435        36.54KB/s    23.00us
    _USERSPACE_APPS     14      435        36.54KB/s    23.00us
    -total-              -     4505        42.31KB/s        0ms
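
The -packet-size value in the ping example from step 2 is not arbitrary. With fragmentation disallowed, the ICMP payload must fit in the path MTU after the IPv4 and ICMP headers are subtracted. The following sketch shows that arithmetic, assuming an IPv4 header without options. The function name is hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu):
    """Return the largest ICMP payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


if __name__ == "__main__":
    for mtu in (1500, 9000, 9001):
        print(f"MTU {mtu}: largest unfragmented ping payload is "
              f"{max_ping_payload(mtu)} bytes")
```

A payload of 8972 bytes corresponds to an MTU of 9000 and therefore also fits in a 9001-byte jumbo frame path. If that ping fails but a ping with a 1472-byte payload succeeds, then some hop in the path is likely limited to a 1500-byte MTU.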

Related information

Delete a volume replication relationship on the NetApp website

AWS OFFICIAL
Updated 6 months ago
1 Comment

How can I check when the last SnapMirror transfer completed? For monitoring purposes, I want to set up alerts on SnapMirror delays or failures.

replied a month ago