How do I troubleshoot slow performance on my FSx for ONTAP file system?

I want to troubleshoot slow performance on my Amazon FSx for NetApp ONTAP file system.

Short description

The following common issues cause slow file system performance:

  • Your workload uses all of the throughput capacity provisioned for your file system.
  • Your application is I/O intensive and your file system exhausted its provisioned IOPS.
  • Your file system primary (SSD) storage tier is almost full.
  • Your file system reached the open files limit.

Resolution

Check I/O and throughput utilization of your file system

  1. Open the Amazon FSx console.
  2. Choose File systems, and then select the file system that you want to review metrics for.
  3. On the Summary page, choose Monitoring.
  4. For throughput, check the Total throughput (bytes/sec) metric. For I/O usage, check the Total IOPS (operations/sec) metric.
  5. Expand the metrics section, and then review the values at and around the time when you experienced slow performance. Check the metrics against the Throughput capacity and Provisioned IOPS parameters shown on the file system Summary page.
  6. If the metrics are near the file system's throughput capacity or provisioned IOPS, then performance might be throttled. To resolve this, increase the file system's provisioned IOPS, throughput capacity, or both.
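
The comparison in steps 5 and 6 can be sketched in a few lines of Python. This is an illustrative helper, not an AWS tool: the function names and the 90% cutoff for "near" are assumptions, and both values must be converted to the same unit before you compare them.

```python
def utilization(observed_peak: float, provisioned: float) -> float:
    """Return the observed peak as a fraction of the provisioned limit.

    Both arguments must use the same unit (for example, both in MBps
    or both in operations per second).
    """
    if provisioned <= 0:
        raise ValueError("provisioned must be positive")
    return observed_peak / provisioned


def is_near_limit(observed_peak: float, provisioned: float,
                  threshold: float = 0.9) -> bool:
    """Flag a metric that is at or above `threshold` of its limit.

    The 0.9 default is an assumed cutoff, not an AWS-documented value.
    """
    return utilization(observed_peak, provisioned) >= threshold


# Hypothetical readings taken from the Monitoring tab.
print(is_near_limit(480, 512))      # throughput: 480 of 512 MBps
print(is_near_limit(9000, 40000))   # IOPS: 9,000 of 40,000
```

A result of True for either metric suggests that the file system might be throttled at that time.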

Note: Make sure that you provision enough throughput capacity to support your workload's read throughput plus twice your workload's write throughput.
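
As a worked example of the sizing rule in the preceding note, the following sketch adds read throughput to twice the write throughput. The function name and the sample workload figures are hypothetical.

```python
def required_throughput_capacity(read_mbps: float, write_mbps: float) -> float:
    """Minimum throughput capacity: read throughput plus twice write throughput."""
    return read_mbps + 2 * write_mbps


# Hypothetical workload: 200 MBps of reads and 100 MBps of writes.
print(required_throughput_capacity(200, 100))  # 200 + 2 * 100 = 400
```

With these sample figures, the workload needs at least 400 MBps of throughput capacity, so choose a provisioned value at or above that.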

Check if the file system's primary (SSD) storage tier is full

  1. Open the Amazon FSx console.
  2. Choose File systems, and then select the file system that you want to review metrics for.
  3. On the Summary page, choose Monitoring.
  4. To check the free space in your SSD tier, view the Available primary storage capacity metric. To see the available storage distribution as a percentage, view the Storage distribution graph.

It's a best practice to keep your SSD storage tier utilization under 80% on an ongoing basis. Your file system's SSD tier is also used to stage writes to and random reads from the capacity pool tier. Sudden changes in access patterns can cause the utilization of your SSD tier to increase quickly. At 90% SSD utilization, data read from the capacity pool tier isn't cached on the SSD tier. All tiering functionality stops when the SSD tier is at or above 98% utilization. Performance degrades when this occurs.
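
The thresholds above can be expressed as a small classifier. This is a minimal sketch: the function names and status strings are illustrative, and the capacity values are assumed to be read manually from the Monitoring tab.

```python
def ssd_utilization_percent(total_bytes: float, available_bytes: float) -> float:
    """Convert total and available SSD capacity into percent used."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100 * (total_bytes - available_bytes) / total_bytes


def ssd_tier_status(utilization_percent: float) -> str:
    """Map SSD utilization to the behavior described for each threshold."""
    if utilization_percent >= 98:
        return "tiering stopped"
    if utilization_percent >= 90:
        return "capacity pool reads not cached on SSD"
    if utilization_percent >= 80:
        return "above best-practice target"
    return "ok"


# Hypothetical file system: 1,000 GiB SSD tier with 150 GiB available.
used = ssd_utilization_percent(1000, 150)
print(used, ssd_tier_status(used))
```

Any status other than "ok" means that the SSD tier is at or above the 80% best-practice target.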

For more information, see How to update SSD storage capacity and Provisioned IOPS.

Check if you reached the open files limit

When a volume runs out of inodes (index nodes) or files, write operations on that volume might fail with one of the following error messages:

  • "Error message: no space left on the device"
  • "Error message: file system is out of inodes"
  • "wafl.vol.outOfInodes: file system on Volume vol_name is out of inodes because it has reached the maximum number of files"
  • "INODE: System/Cluster Notification from filer (OUT OF INODES) ALERT"
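
To scan application or event logs for these messages, a simple substring match might be enough. The following sketch is an assumption-laden example: the marker list is taken only from the messages above, the function name is hypothetical, and a "no space left" error can also mean that the volume is out of storage capacity, so treat a match as a prompt to check inode usage rather than as proof.

```python
# Lowercase fragments taken from the error messages listed above.
INODE_ERROR_MARKERS = (
    "no space left on the device",
    "out of inodes",
    "wafl.vol.outofinodes",
)


def looks_like_inode_exhaustion(message: str) -> bool:
    """Return True if a log line contains one of the known markers."""
    lowered = message.lower()
    return any(marker in lowered for marker in INODE_ERROR_MARKERS)


# Sample log lines; the first is one of the messages listed above.
print(looks_like_inode_exhaustion(
    "INODE: System/Cluster Notification from filer (OUT OF INODES) ALERT"))
print(looks_like_inode_exhaustion("Connection timed out"))
```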

For more information, see How can I increase the number of inodes or files for the volumes on my Amazon FSx for ONTAP file system?

If the preceding conditions don't match your use case, or if you corrected these conditions and still see slow performance, then contact AWS Support.

AWS OFFICIAL
Updated 8 months ago