Why is VMware Cloud on AWS reporting that my Amazon FSx for NetApp ONTAP datastore is full when it isn’t?

Content level: Advanced

Resolve FSx for NetApp ONTAP reporting to VMC that the datastore is full

Short Description

Consider an example that shows why this can happen. Suppose we create a new FSx for NetApp ONTAP file system with 5 TiB of SSD storage and 100 TiB of volume capacity. We keep the tiering policy at Auto with the default 31-day cooling period. This means that data written to the SSD tier must go unaccessed for 31 consecutive days before it is moved to the capacity pool.

When the volume is mounted as a datastore in our SDDC, vCenter shows a total capacity equal to the SSD tier only, which is 5 TiB in this example. Now suppose we start migrating virtual machines to this new datastore and move 10 TiB of data in a single operation. vCenter will report that the datastore is 100% full after only 5 TiB has been moved.

Here is why this happens. By default, every incoming write to an FSx for NetApp ONTAP volume lands on the SSD tier first. The Auto tiering policy doesn’t move any data to the capacity pool until that data has gone untouched for 31 days. As a result, FSx for NetApp ONTAP can’t free SSD space for incoming writes that exceed the size of the SSD tier, no matter how much free space remains in the capacity pool.
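The following Python sketch replays the example above as a day-by-day toy model. It rests on two simplifying assumptions: data is never re-read after it is written, and data moves to the capacity pool instantly once it has been cold for the full cooling period. Real tiering runs as a background process. The function `simulate_migration` and its parameters are illustrative only and are not part of any AWS or ONTAP API.

```python
from dataclasses import dataclass


@dataclass
class Result:
    written_tib: float  # data written before the SSD tier filled up
    completed: bool     # True if the whole migration was written


def simulate_migration(ssd_tib: float, total_tib: float,
                       tib_per_day: float, cooling_days: int) -> Result:
    """Toy model: writes land on SSD first; cold data tiers after cooling_days.

    Assumptions: data is never re-read, and data that has been cold for
    cooling_days moves to the capacity pool instantly at the start of a day.
    """
    on_ssd: list[tuple[int, float]] = []  # (day written, TiB) still on SSD
    written = 0.0
    day = 0
    while written < total_tib:
        # Tier out everything that has now been cold for the cooling period.
        on_ssd = [(d, tib) for d, tib in on_ssd if day - d < cooling_days]
        ssd_free = ssd_tib - sum(tib for _, tib in on_ssd)
        # New data can only use free SSD space, whatever the capacity pool holds.
        chunk = min(tib_per_day, total_tib - written, ssd_free)
        if chunk <= 0:
            return Result(written, completed=False)  # SSD full: writes fail
        on_ssd.append((day, chunk))
        written += chunk
        day += 1
    return Result(written, completed=True)


if __name__ == "__main__":
    # The article's example: 5 TiB SSD, 10 TiB moved at once, 31-day cooling.
    print(simulate_migration(5, 10, 10, 31))
    # Result(written_tib=5.0, completed=False)
    # Spreading the copy over 10 days doesn't help, because 10 < 31 days.
    print(simulate_migration(5, 10, 1, 31))
    # Result(written_tib=5.0, completed=False)
    # Assumed rough stand-in for the All policy: a zero-day cooling period.
    print(simulate_migration(5, 10, 1, 0))
    # Result(written_tib=10.0, completed=True)
```

The model shows that the capacity pool’s free space plays no part while data is younger than the cooling period. Only free SSD space decides whether a write succeeds.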

Resolution

There are several ways to resolve this issue.

  1. The first option is to increase the SSD storage capacity of the file system. This keeps all recently accessed data on SSD while making room for incoming writes, so performance is maintained. Note that additional SSD capacity comes at additional cost. This is the best option because it is the most performant and doesn’t require any additional monitoring.
  2. Another option is to change the volume’s tiering policy from Auto to All. This policy marks all data on the SSD tier as cold, and a background process moves that data to the capacity pool, which prioritizes freeing space on the SSD tier. Incoming writes still land on SSD first before they are moved to the capacity pool. Note that data read from the capacity pool is not moved back to the SSD tier, although metadata always remains on SSD. Consequently, this option can reduce performance.
  3. The last option is to set ONTAP flags that switch space reporting and enforcement from physical space to logical space. With these flags, FSx for NetApp ONTAP reports space across both the SSD and capacity pool tiers instead of the SSD tier alone. The flags can be set at the volume or storage virtual machine (SVM) level, and the volume-level commands are listed below. Because reporting now covers both tiers, the SSD tier can fill up without vCenter warning that the datastore is filling. For that reason, this option requires additional monitoring of the storage. The file system will still fill up and return write failures once the SSD tier is full.
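To see why option 3 calls for extra monitoring, the Python sketch below compares the capacity vCenter would show under each reporting mode. It uses a simplified model that follows the description above rather than ONTAP’s actual space accounting, and it ignores storage efficiency savings and snapshots. The names `reported_view` and `ssd_needs_attention` and the 80% alert threshold are hypothetical.

```python
from typing import NamedTuple


class DatastoreView(NamedTuple):
    """Capacity and usage as vCenter would see them (simplified model)."""
    capacity_tib: float
    used_tib: float

    @property
    def percent_used(self) -> float:
        return 100.0 * self.used_tib / self.capacity_tib


def reported_view(volume_size_tib: float, ssd_capacity_tib: float,
                  ssd_used_tib: float, pool_used_tib: float,
                  logical: bool) -> DatastoreView:
    """Approximate what vCenter shows under each reporting mode.

    Physical (default): capacity is the SSD tier, as in the example above.
    Logical: capacity is the volume size and usage spans both tiers.
    """
    if logical:
        return DatastoreView(volume_size_tib, ssd_used_tib + pool_used_tib)
    return DatastoreView(ssd_capacity_tib, ssd_used_tib)


def ssd_needs_attention(ssd_capacity_tib: float, ssd_used_tib: float,
                        threshold: float = 0.8) -> bool:
    """Hypothetical alert rule: flag the SSD tier once it passes `threshold`.

    With logical reporting, vCenter no longer reflects SSD usage, so a
    check like this has to run against the storage itself.
    """
    return ssd_used_tib / ssd_capacity_tib >= threshold


if __name__ == "__main__":
    # 100 TiB volume, 5 TiB SSD tier, 4.5 TiB on SSD, 20 TiB in the pool.
    physical = reported_view(100, 5, 4.5, 20, logical=False)
    logical = reported_view(100, 5, 4.5, 20, logical=True)
    print(f"physical: {physical.percent_used:.1f}% used")  # 90.0% used
    print(f"logical:  {logical.percent_used:.1f}% used")   # 24.5% used
    print(ssd_needs_attention(5, 4.5))                     # True
```

Under logical reporting the datastore looks roughly a quarter full, while the SSD tier that absorbs every new write is 90% used. That gap is what the extra monitoring has to cover.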

The ONTAP CLI commands to set these flags on a volume are:

Enable logical space reporting for the volume

  • This setting reports the volume’s logical size and usage to the client, instead of the space in the file system’s SSD tier

volume modify -vserver svm_name -volume volume_name -size volume_size -is-space-reporting-logical true

Enable logical space enforcement for the volume

  • This setting blocks writes to the volume once its logical used space reaches the volume size

volume modify -vserver svm_name -volume volume_name -size volume_size -is-space-enforcement-logical true

Enable logical space reporting and enforcement together for the volume

volume modify -vserver svm_name -volume volume_name -size volume_size -is-space-reporting-logical true -is-space-enforcement-logical true
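If several datastore volumes need the same change, the commands above can be generated instead of typed by hand. The following is a minimal Python sketch. The SVM and volume names are hypothetical placeholders, and the `-size` argument is kept because the commands above include it. Passing a value other than the volume’s current size would also resize the volume.

```python
def volume_modify_command(svm: str, volume: str, size: str,
                          reporting: bool = True,
                          enforcement: bool = True) -> str:
    """Return an ONTAP `volume modify` command line like the ones above.

    `size` is passed through unchanged (for example "100TB"); confirm the
    size format against the ONTAP documentation for your release.
    """
    if not (reporting or enforcement):
        raise ValueError("enable logical reporting, enforcement, or both")
    parts = ["volume", "modify", "-vserver", svm, "-volume", volume,
             "-size", size]
    if reporting:
        parts += ["-is-space-reporting-logical", "true"]
    if enforcement:
        parts += ["-is-space-enforcement-logical", "true"]
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical SVM and datastore volume names.
    for vol in ["ds_vol1", "ds_vol2"]:
        print(volume_modify_command("svm1", vol, "100TB"))
```

Review the generated lines before running them in the ONTAP CLI. The script only builds text and does not connect to the file system.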


AWS
EXPERT
AMcCord
published 2 months ago