How can I troubleshoot slow performance when I copy local files to Storage Gateway?
I want to copy local files to my Network File System (NFS) or Server Message Block (SMB) file share on AWS Storage Gateway, but the transfer is slow. How can I improve the upload performance?
Consider the following ways to improve the performance when you copy local files to a file share on Storage Gateway:
Note: A file gateway is an object-store cache, not a file server. This means that a file gateway's performance characteristics differ from those of file servers.
Scale your workload
For the best performance, scale your workload by adding threads or clients. When you transfer a directory of files, a file gateway scales best when the workload is multi-threaded or involves multiple clients. Review your file-management tool and confirm whether the tool runs single-threaded uploads by default.
It's a best practice to use multiple threads or clients when you transfer small or large files. You get the highest MiB per second throughput when you transfer large files (tens or hundreds of MiB each) using multiple threads. Because of the overhead of creating new files, transferring many small files results in a lower MiB per second throughput when compared to the same workload with large files.
To perform a multi-threaded copy in Windows, use robocopy, a file copy tool from Microsoft, with the /MT option.
Note: For transfers of smaller files, measure the transfer rate in files per second instead of MiB per second. With small files, the rate of file creation, rather than the volume of data, typically accounts for most of the workload.
Also, monitor the CachePercentDirty metric for your gateway. This metric returns the percentage of cache storage that's occupied by data that isn't yet persisted to an S3 bucket. A high value of CachePercentDirty can cause the gateway's cache storage to throttle writes to the gateway.
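As a rough illustration of the multi-threaded approach described in this section, the following Python sketch copies a directory tree with a thread pool and reports the rate in both files per second and MiB per second. The function names, the default of 16 worker threads, and the paths in the usage comment are assumptions for illustration, not values from AWS guidance. Tune the thread count for your own workload.

```python
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src_dir, dst_dir, max_workers=16):
    """Copy every file under src_dir to dst_dir using a thread pool.

    Returns (file_count, total_bytes, elapsed_seconds).
    max_workers=16 is an arbitrary starting point, not a recommendation.
    """
    src_root = Path(src_dir)
    dst_root = Path(dst_dir)
    files = [p for p in src_root.rglob("*") if p.is_file()]

    def copy_one(src_path):
        # Recreate the relative directory layout under the destination.
        dst_path = dst_root / src_path.relative_to(src_root)
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src_path, dst_path)
        return src_path.stat().st_size

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sizes = list(pool.map(copy_one, files))
    elapsed = time.perf_counter() - start
    return len(files), sum(sizes), elapsed


def throughput(file_count, total_bytes, elapsed_seconds):
    """Return (files_per_second, mib_per_second) for a finished transfer."""
    elapsed_seconds = max(elapsed_seconds, 1e-9)  # guard against division by zero
    files_per_second = file_count / elapsed_seconds
    mib_per_second = total_bytes / (1024 * 1024) / elapsed_seconds
    return files_per_second, mib_per_second


# Example usage (hypothetical paths; the destination is a mounted file share):
#   count, size, secs = copy_tree_parallel("/data/export", "/mnt/gateway-share")
#   print(throughput(count, size, secs))
```

Running the same copy with different max_workers values, and comparing files per second for small files against MiB per second for large files, gives a simple way to see whether the workload is limited by thread count.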
Use higher-performance disks
It's a best practice to use solid state drive (SSD) backed disks for your gateway's cache storage with dedicated tenancy. Ideally, the underlying physical disks shouldn't be shared with other virtual machines in order to prevent IOPS exhaustion.
To measure disk IOPS, use the ReadBytes and WriteBytes metrics with the Samples statistic in CloudWatch. As a general rule, when you review these metrics for the gateway, look for low throughput and low IOPS trends that can indicate disk-related bottlenecks.
Monitor the IOWaitPercent metric in CloudWatch, which reports the percentage of time that the CPU is waiting for a response from the local disk. A value higher than 10% typically indicates a bottleneck in the underlying disks and can be a result of slower disks. In this case, add disks to provide more available IOPS to the gateway.
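The checks above can be sketched in Python as follows. This is a minimal sketch, not an AWS-provided tool: the helper names are hypothetical, the cloudwatch argument is assumed to be a boto3 CloudWatch client (or any object with a compatible get_metric_statistics method), and the dimension values in the usage comment are placeholders. It approximates IOPS as the SampleCount statistic divided by the period length, and it flags IOWaitPercent datapoints above the 10% guideline mentioned above.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "AWS/StorageGateway"


def fetch_datapoints(cloudwatch, metric_name, dimensions, statistic,
                     minutes=60, period=300):
    """Return datapoints for one gateway metric, oldest first.

    statistic is a CloudWatch API statistic name such as "Average",
    "Sum", or "SampleCount".
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    response = cloudwatch.get_metric_statistics(
        Namespace=NAMESPACE,
        MetricName=metric_name,
        Dimensions=dimensions,
        StartTime=start,
        EndTime=end,
        Period=period,
        Statistics=[statistic],
    )
    return sorted(response["Datapoints"], key=lambda p: p["Timestamp"])


def iops_from_samples(datapoints, period=300):
    """Approximate operations per second: SampleCount / period length."""
    return [p["SampleCount"] / period for p in datapoints]


def io_wait_breaches(datapoints, threshold=10.0):
    """Return the IOWaitPercent datapoints whose Average exceeds threshold."""
    return [p for p in datapoints if p["Average"] > threshold]


# Example wiring (assumes boto3 is installed and credentials are configured;
# the gateway ID and name are placeholders):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   dims = [{"Name": "GatewayId", "Value": "sgw-EXAMPLE"},
#           {"Name": "GatewayName", "Value": "example-gateway"}]
#   waits = fetch_datapoints(cw, "IOWaitPercent", dims, "Average")
#   writes = fetch_datapoints(cw, "WriteBytes", dims, "SampleCount")
#   print(io_wait_breaches(waits), iops_from_samples(writes))
```

Because the metric name is a parameter, the same fetch helper could also retrieve other gateway metrics such as CachePercentDirty.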
Note: For Amazon Elastic Compute Cloud (Amazon EC2) based gateways, the Amazon Elastic Block Store (Amazon EBS) throughput of the instance can also be a limiting factor. Confirm that the CPU and RAM of your gateway's host virtual machine or Amazon EC2 instance support your gateway's throughput to AWS. For example, every EC2 instance type has a different baseline throughput. If burst throughput is exhausted, then the instance uses its baseline throughput, which can limit the upload throughput to AWS. If your gateway is hosted on an Amazon EC2 instance, monitor the NetworkOut metric for the instance. If the NetworkOut metric stays at the baseline throughput during your testing, then consider changing the instance to a larger instance type.
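To compare NetworkOut with a baseline figure, it can help to convert the metric to a rate. NetworkOut reports bytes sent during each period, so the Sum statistic divided by the period length gives bytes per second. The sketch below shows that arithmetic in Python. The function names, the 5% tolerance, and the baseline value in the usage comment are assumptions for illustration; take the actual baseline for your instance type from the EC2 documentation.

```python
def network_out_gbps(sum_bytes, period_seconds):
    """Convert a NetworkOut Sum datapoint (bytes per period) to Gbit/s."""
    return sum_bytes * 8 / period_seconds / 1e9


def pinned_at_baseline(samples_gbps, baseline_gbps, tolerance=0.05):
    """Return True when every sample sits within tolerance of the baseline.

    tolerance is a fraction of the baseline (0.05 means plus or minus 5%)
    and is an arbitrary choice for this sketch.
    """
    if not samples_gbps:
        return False
    limit = tolerance * baseline_gbps
    return all(abs(s - baseline_gbps) <= limit for s in samples_gbps)


# Example usage (hypothetical baseline of 1.0 Gbit/s and 300-second periods):
#   rates = [network_out_gbps(b, 300) for b in sums_from_cloudwatch]
#   if pinned_at_baseline(rates, baseline_gbps=1.0):
#       print("NetworkOut is flat at the baseline; consider a larger instance type")
```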