Why is my Amazon EFS file system performance slow?


My Amazon Elastic File System (Amazon EFS) performance is slow. I want to identify the cause and troubleshoot the issue.

Resolution

To determine the cause of your Amazon EFS slow performance, review the following options and settings. Follow the actions that fit your use case.

Storage class of Amazon EFS

Amazon EFS has three storage classes:

  • EFS Standard
  • EFS Infrequent Access (IA)
  • EFS Archive

To provide the lowest latency, EFS Standard stores files on solid state drive (SSD) storage. When you access files in EFS IA or EFS Archive, expect first-byte latencies of tens of milliseconds. For more information, see Storage Classes and EFS storage classes.

Performance and throughput modes

Performance modes

Amazon EFS offers two performance modes, General Purpose and Max I/O. To determine what performance mode to use, see Performance modes.

Throughput modes

The configured throughput affects the performance of Amazon EFS. To check for limitations caused by the throughput mode, view the EFS Throughput Utilization (%) graph. To navigate to the graph, open the EFS Console. Choose File Systems, and then select your file system. Choose Monitoring. If the graph is at 100%, then do one of the following options based on your throughput mode:

  • Bursting Throughput Mode: Use the BurstCreditBalance metric in Amazon CloudWatch to check the balance of your burst credits. If the balance is 0, then the file system can no longer burst and you might experience performance issues. After the file system uses all of its burst credits, its maximum throughput is limited to the baseline rate. For more information about burst credits, see How do Amazon EFS burst credits work? To recover the EFS file system's performance, change the throughput mode to Provisioned or Elastic.
    Note: Be sure to review the cost for the enhanced throughput before you change throughput mode. Also, check the restrictions associated with a switch back to Bursting.
  • Provisioned Throughput Mode: The maximum throughput that can be achieved by the file system is equal to the configured value of this throughput mode. The maximum throughput is indicated by the PermittedThroughput metric in CloudWatch. You can't achieve throughput greater than the value that the file system is configured with.
    If you reach the throughput limits, then update to a higher value for Provisioned Throughput.
  • Elastic Throughput Mode: This mode provides throughput of up to 20 GiB/s and up to 250,000 IOPS per file system. If you exceed the throughput limit, then request a quota increase.

To determine what throughput mode to use, see What is the right throughput mode for my workload in Amazon EFS?
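
The following Python sketch illustrates the burst credit arithmetic described for Bursting Throughput mode. It assumes that credits accrue at the baseline rate and are spent at the observed metered rate, so the balance drains at the difference between the two. The function name and the sample readings are hypothetical. Substitute the BurstCreditBalance value (reported in bytes) and the rates that you observe in CloudWatch.

```python
def seconds_until_burst_credits_run_out(credit_balance_bytes,
                                        baseline_bytes_per_s,
                                        observed_bytes_per_s):
    """Estimate how long the file system can sustain the observed rate.

    Returns None when the observed rate is at or below the baseline rate,
    because the credit balance is then not being drawn down.
    """
    drain_rate = observed_bytes_per_s - baseline_bytes_per_s
    if drain_rate <= 0:
        return None
    return credit_balance_bytes / drain_rate


MiB = 1024 ** 2
GiB = 1024 ** 3

# Hypothetical readings, not values taken from a real file system.
remaining = seconds_until_burst_credits_run_out(
    credit_balance_bytes=500 * GiB,
    baseline_bytes_per_s=5 * MiB,
    observed_bytes_per_s=105 * MiB,
)
print(f"Burst credits last about {remaining / 3600:.1f} hours at this rate")
```

If the estimate is shorter than the duration of your workload, then the file system is likely to fall back to the baseline rate before the workload completes.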

Types of operations performed on the EC2 instance

Metadata I/O operations

Amazon EFS performance worsens in the following situations:

  • When file sizes are small, the distributed architecture of Amazon EFS results in a small latency overhead for each file operation. Because of this per-operation latency, overall throughput generally increases as the average I/O size increases. This increase happens because the overhead is amortized over more data. For more information, see Optimizing small-file performance.
  • Performance on shared file systems suffers if a workload or operation generates many small files serially. This causes the overhead of each operation to increase.
  • Metadata I/O occurs if your application performs metadata-intensive operations, such as "ls," "rm," "mkdir," "rmdir," "lookup," "getattr," "setattr," and so on. A workload that requires the system to fetch the address of a specific block is considered metadata-intensive. For more information, see How Amazon EFS reports file system and object sizes and Amazon EFS performance tips.
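
To see why larger I/O sizes amortize the per-operation overhead, consider the following Python sketch. It uses a simplified model in which each serial operation pays a fixed latency plus the transfer time for its data. The 3 ms latency and the 100 MiB/s bandwidth are illustrative assumptions, not measured Amazon EFS values.

```python
def effective_throughput(io_size_bytes, per_op_latency_s, bandwidth_bytes_per_s):
    """Throughput of serial operations that each pay a fixed latency."""
    time_per_op = per_op_latency_s + io_size_bytes / bandwidth_bytes_per_s
    return io_size_bytes / time_per_op


KiB = 1024
MiB = 1024 ** 2

# Assumed figures for illustration only.
for size in (4 * KiB, 64 * KiB, 1 * MiB, 16 * MiB):
    rate = effective_throughput(size,
                                per_op_latency_s=0.003,
                                bandwidth_bytes_per_s=100 * MiB)
    print(f"{size // KiB:>6} KiB per operation -> {rate / MiB:6.1f} MiB/s")
```

In this model, throughput approaches the available bandwidth as the I/O size grows, and small operations are dominated by the fixed latency.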

Mount options

If you mount the file system with amazon-efs-utils, then the recommended mount options are applied by default. If you use non-default mount options, then performance can degrade. For example, performance can degrade if you use lower rsize and wsize values, or if you reduce or turn off attribute caching. To see the mount options that are currently in place, check the output of the mount command.
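
As a minimal sketch, the following Python function reviews a comma-separated NFS option string, such as the one shown in the output of the mount command, and flags the two problems named above. The function name is hypothetical, and the 1048576 threshold is an assumption based on the rsize and wsize values that the Amazon EFS documentation recommends. Confirm the value against the current documentation before you rely on it.

```python
RECOMMENDED_IO_SIZE = 1048576  # 1 MiB; assumed recommended rsize and wsize


def review_nfs_mount_options(options):
    """Return a list of warnings for a comma-separated NFS option string."""
    parsed = {}
    for item in options.split(","):
        key, _, value = item.strip().partition("=")
        parsed[key] = value

    warnings = []
    for name in ("rsize", "wsize"):
        value = parsed.get(name)
        if value is not None and value.isdigit() and int(value) < RECOMMENDED_IO_SIZE:
            warnings.append(f"{name}={value} is below {RECOMMENDED_IO_SIZE}")
    # noac and actimeo=0 both turn off attribute caching on the NFS client.
    if "noac" in parsed or parsed.get("actimeo") == "0":
        warnings.append("attribute caching is turned off")
    return warnings


# Hypothetical option string for illustration.
print(review_nfs_mount_options("rw,vers=4.1,rsize=65536,wsize=1048576,noac"))
```

An empty list means that the sketch found neither issue. It doesn't validate the other mount options.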

NFS client version

The Network File System (NFS) version 4.1 (NFSv4.1) protocol provides better performance for parallel small-file read operations than NFSv4.0.

Client-side limitations

Bottleneck at the EC2 instance

If your application that uses the file system doesn't drive the expected performance from Amazon EFS, then optimize the application. Also, benchmark the host or service that your application is hosted on, such as Amazon Elastic Compute Cloud (Amazon EC2), AWS Lambda, and so on. A resource crunch on the EC2 instance might affect your application's ability to effectively use Amazon EFS.

To check whether your EC2 instance is under-provisioned for your application requirements, monitor the Amazon EC2 metrics in CloudWatch. Compare the metrics against your application's architecture and resource requirements to determine whether to reconfigure your application or instance.

Use the 4.0+ Linux kernel version

To avoid known NFS client issues and optimize performance, use an Amazon Machine Image (AMI) with a Linux kernel version 4.0 or newer. An exception to this rule is RHEL and CentOS 7.3 and newer. The kernel for these operating systems received backported versions of the fixes and enhancements applied to NFS v4.1. For more information, see NFS support.
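
The following Python sketch shows one way to check the kernel version on a client. The helper names are hypothetical. The check compares only the major and minor version numbers, so it reports False for the RHEL and CentOS 7.3 kernels that are covered by the exception above.

```python
import platform
import re


def kernel_is_at_least(release, minimum=(4, 0)):
    """Return True if a kernel release string meets the minimum version."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def current_kernel_is_supported():
    """Check the kernel release that the local operating system reports."""
    return kernel_is_at_least(platform.release())
```

On a Linux client, call current_kernel_is_supported() to run the check against the running kernel.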

Copy files

When you copy files with the cp command, you might experience slowness. The cp command is a serial operation that copies one file at a time. If each file is small, then the throughput achieved for that file is low. There is overhead per file operation because Amazon EFS must replicate to all mount points. Therefore, when you send files, you might notice latency.

It's a best practice to use multiple copy sessions to run parallel I/O operations. For example, if you have 1 directory with 5 subdirectories, then create a separate copy session for each subdirectory.

Note: Tools like cp and rsync run serially (single-threaded) rather than in parallel, which makes the copy process slower. It's a best practice to use tools like fpsync, msrsync, or GNU Parallel to run jobs in parallel on an Amazon EFS file system. For more information, see How can I copy data to and from Amazon EFS in parallel to maximize performance on my EC2 instance?
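
The one-session-per-subdirectory approach can be sketched in Python as follows. This is an illustration that uses only the standard library, not a replacement for fpsync, msrsync, or GNU Parallel. The function name and the default of 5 workers are assumptions chosen to match the example above.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_subdirectories_in_parallel(source, destination, max_workers=5):
    """Copy each immediate subdirectory of source in its own worker thread."""
    source, destination = Path(source), Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    subdirs = [p for p in source.iterdir() if p.is_dir()]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(shutil.copytree, sub, destination / sub.name,
                        dirs_exist_ok=True)
            for sub in subdirs
        ]
        for future in futures:
            future.result()  # re-raise any error from a worker

    # Files that sit directly in source are copied after the subdirectories.
    for item in source.iterdir():
        if item.is_file():
            shutil.copy2(item, destination / item.name)
    return len(subdirs)
```

The sketch requires Python 3.8 or newer because of the dirs_exist_ok parameter. Parallelism is limited to the number of top-level subdirectories, so a directory tree with one very large subdirectory benefits less from this approach.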

Related information

Quotas for NFS clients

Amazon EFS performance

Amazon Elastic File System (Amazon EFS) performance tutorial section on the amazon-efs-tutorial page from the GitHub website

AWS OFFICIAL
Updated 7 months ago
2 Comments

The Optimizing small-file performance link isn't working correctly. The link takes you to the What is Amazon Elastic File System? page. I think it's intended to link to Amazon EFS performance tips, which has a sub-section called Optimizing small-file performance.

AWS
replied a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied a year ago