Why is my EFS file system performance slow?

My Amazon Elastic File System (Amazon EFS) performance is very slow. I want to identify the cause and troubleshoot the issue.

Short description

The distributed, multi-Availability Zone architecture of Amazon EFS results in a small latency overhead for each file operation. The overall throughput generally increases as the average I/O size increases because the overhead is amortized over a larger amount of data.
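To make the amortization effect concrete, the following is a minimal Python sketch of a single serial stream that pays a fixed latency on every operation. The 2 ms per-operation latency and the 100 MiB/s transfer rate are illustrative assumptions, not published Amazon EFS figures, and the function name is hypothetical.

```python
def effective_throughput_mib_s(io_size_kib, per_op_latency_ms, wire_mib_s):
    """Throughput of one serial stream when every operation pays a fixed latency.

    All inputs are illustrative assumptions, not measured EFS values.
    """
    size_mib = io_size_kib / 1024
    transfer_s = size_mib / wire_mib_s          # time spent moving the data
    total_s = per_op_latency_ms / 1000 + transfer_s  # plus the fixed overhead
    return size_mib / total_s


# Compare small, medium, and large I/O sizes under the same assumed overhead.
for io_size_kib in (4, 64, 1024):
    rate = effective_throughput_mib_s(io_size_kib, per_op_latency_ms=2.0, wire_mib_s=100.0)
    print(f"{io_size_kib:>5} KiB per operation: {rate:.2f} MiB/s")
```

Under these assumptions, the fixed latency dominates small operations, so the computed throughput rises toward the transfer rate as the I/O size grows.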

Amazon EFS performance relies on multiple factors, including the following:

  • Storage class of EFS.
  • Performance and throughput modes.
  • Type of operations performed on EFS (for example, metadata-intensive operations).
  • Properties of data stored in EFS (such as size and number of files).
  • Mount options.
  • Client-side limitations.

Resolution

Storage class of EFS

For more information, see Performance summary.

Performance and throughput modes

Performance modes

Amazon EFS offers two performance modes, General Purpose and Max I/O. Applications can scale their IOPS elastically up to the limit associated with the performance mode.

To determine what performance mode to use, see Performance modes.

Throughput modes

File-based workloads typically drive high levels of throughput for short periods, but drive lower levels of throughput for longer periods. Amazon EFS is designed to burst to high throughput levels for periods of time.

The configured throughput and IOPS affect the performance of Amazon EFS.

It's a best practice to benchmark your workload requirements to help you select the appropriate throughput and performance modes. When you select provisioned throughput, select values that accommodate your workload requirements. To analyze the throughput and IOPS that are consumed by your file system, see Using metric math with Amazon EFS.

Amazon EFS can scale up to petabytes of storage volume with three modes of throughput: bursting, elastic, and provisioned. If you use bursting throughput, throughput on Amazon EFS scales as your file system grows. If you use provisioned throughput, you can instantly provision the throughput of your file system independent of the amount of data stored. With elastic throughput, you can scale your throughput up or down based on your workload. For more information on throughput modes, see How do Amazon EFS burst credits work?

Types of operations performed on the EC2 instance

Metadata I/O operations

EFS performance suffers in the following situations:

  • When file sizes are small. Because EFS is a distributed system, each file operation incurs a small latency overhead. Due to this per-operation latency, overall throughput generally increases as the average I/O size increases because the overhead is amortized over more data.
  • When a workload or operation generates many small files serially. The per-operation overhead is then incurred for every file, one after another.
  • When your application performs metadata-intensive operations, such as "ls", "rm", "mkdir", "rmdir", "lookup", "getattr", or "setattr". Any operation that requires the system to fetch the address of a specific block is considered a metadata-intensive workload. For more information, see Metering: How Amazon EFS reports file system and object sizes and Performance tips.

Mount options

  • If you mount the file system with amazon-efs-utils, then the recommended mount options are applied by default.
  • If you use non-default mount options, performance might degrade. For example, performance can suffer if you use lower rsize and wsize values, or if you reduce or turn off attribute caching. Check the output of the mount command to see the mount options currently in place.
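As one way to review the options in place, the following Python sketch parses text in the /proc/mounts format and flags the settings mentioned above. The function names, the sample file system host name, and the mount point are hypothetical. The 1048576-byte comparison value is an assumption to check against the current EFS mounting recommendations.

```python
def nfs_mount_options(mounts_text):
    """Return {mount_point: {option: value}} for NFS entries in /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, file system type, options, dump, pass.
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        options = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            options[key] = value  # flags such as "noac" get an empty value
        result[fields[1]] = options
    return result


def review_options(options, expected_size=1048576):
    """List settings that may degrade performance (expected_size is an assumption)."""
    findings = []
    for key in ("rsize", "wsize"):
        if key in options and int(options[key]) < expected_size:
            findings.append(f"{key}={options[key]} is lower than {expected_size}")
    if "noac" in options:
        findings.append("attribute caching is turned off (noac)")
    return findings


# Hypothetical sample entry; on a Linux client, read the text from /proc/mounts instead.
SAMPLE = (
    "fs-EXAMPLE.efs.us-east-1.amazonaws.com:/ /mnt/efs nfs4 "
    "rw,relatime,vers=4.1,rsize=65536,wsize=1048576,noac 0 0"
)
for mount_point, options in nfs_mount_options(SAMPLE).items():
    print(mount_point, "NFS version:", options.get("vers", "unknown"))
    for finding in review_options(options):
        print("  -", finding)
```

The same parsed options also expose the vers value, which shows the NFS protocol version that the client negotiated.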

For more information, see Mount the file system on the EC2 instance and test.

NFS client version

The Network File System (NFS) version 4.1 (NFSv4.1) protocol provides better performance for parallel small-file read operations (greater than 10,000 files per second) compared to NFSv4.0 (less than 1,000 files per second).

Client-side limitations

Bottleneck at the EC2 instance

If the application that uses the file system doesn't drive the expected performance from EFS, then optimize the application. Also, benchmark the host or service that runs your application, such as Amazon EC2 or AWS Lambda. A resource shortage on the EC2 instance might affect your application's ability to use EFS effectively.

To check whether the EC2 instance is under-provisioned for your application requirements, monitor Amazon EC2 CloudWatch metrics, such as CPU utilization and Amazon Elastic Block Store (Amazon EBS) metrics. Analyzing these metrics against your application architecture and resource requirements helps you determine whether to reconfigure your application or instance.

Use the 4.0+ Linux kernel version

For optimal performance and to avoid several known NFS client issues, it's a best practice to use an AMI with a Linux kernel version 4.0 or newer.

An exception to this rule is RHEL and CentOS 7.3 and newer. The kernels for these operating systems received backported versions of the fixes and enhancements applied to NFSv4.1. For more information, see NFS support.
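The version rule above can be checked with a short Python sketch that compares a kernel release string against the 4.0 minimum. The function name and the sample release strings are hypothetical. The sketch doesn't detect the RHEL and CentOS exception, so a 3.10 kernel on those distributions still returns False.

```python
import re


def kernel_at_least(release, minimum=(4, 0)):
    """Return True if a release string such as '5.10.0-21-amd64' meets the minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


# Hypothetical release strings; on the client, platform.release() returns the real one.
for release in ("5.10.0-21-amd64", "3.10.0-1160.el7.x86_64"):
    print(release, kernel_at_least(release))
```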

Copy files

When you copy files with the cp command, you might experience slowness. This is because the copy command is a serial operation, which means that it copies each file one at a time. If the file size for each file is small, the throughput to send that file is small.

You might also notice latency when you send files. The distributed nature of EFS means that it must replicate data across Availability Zones, so there is overhead for each file operation. Therefore, some latency is expected behavior.

Recommendations

It's a best practice to run I/O operations in parallel. Be aware that cp and rsync are serial (single-threaded) operations, which slows the copy process. To parallelize the copy, use tools such as fpart or GNU Parallel. Fpart is a tool that helps you sort file trees and pack them into "partitions". Fpart comes with a shell script called fpsync that wraps fpart and rsync to launch several rsync processes in parallel. Fpsync provides its own embedded scheduler. This method completes tasks faster than the more common serial method.
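To illustrate the idea of parallel copying, the following is a minimal Python sketch that copies a directory tree with a thread pool. It's a stand-in for illustration, not a replacement for fpsync or GNU Parallel. The function name and the default of 16 workers are assumptions, so tune the worker count for your workload.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_dir, dst_dir, workers=16):
    """Copy every file under src_dir to dst_dir, running several copies at once."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    files = [path for path in src_dir.rglob("*") if path.is_file()]

    def copy_one(src):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)  # safe if another thread created it
        shutil.copy2(src, dst)  # copies the data and the file metadata
        return dst

    # Threads suit this job because each copy spends most of its time waiting on I/O.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))
```

For example, parallel_copy("/data/source", "/mnt/efs/target") copies the tree with up to 16 files in flight, where both paths are hypothetical. Because many operations overlap, the per-file latency is no longer paid one file at a time.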

For more information, see Amazon EFS performance.

Related information

Quotas for NFS clients

AWS OFFICIAL
Updated 2 months ago
2 Comments

The Optimizing small-file performance link isn't working correctly. The link takes you to the What is Amazon Elastic File System? page. I think it's intended to link to Amazon EFS performance tips, which has a sub-section called Optimizing small-file performance.

AWS
kdamas
replied 7 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 7 months ago