Lustre performance drops with concurrent read/write workload


We have a workload that reads from files at the same time that those files are being appended to. It appears that Lustre does not handle this workload well, which surprised us and made us think we are doing something wrong. I'm posting our evaluation here in the hope of getting some pointers on what we can do to improve things.

To test Lustre performance, we set up the following:

A 1.2 TiB Lustre filesystem on SSD storage at 250 MB/s per TiB (Persistent 2), for a total throughput of 300 MB/s. We have not configured compression.

Lustre is mounted directly on two separate c7a.xlarge EC2 instances running Ubuntu with Linux kernel 5.15.0-1044-aws, using the mount options noatime,noflock.

For every test we first pre-create the files needed for the run and then remount the filesystem to ensure a clean cache. A reader always reads random blocks from the pre-created portion of a file. A writer always appends all of its blocks to the file and then performs an fsync(). As a result, there should be no overlap between what is read and what is written, even on the same file. Each test runs for roughly 20 seconds, though the exact duration depends on throughput.
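
Roughly, each reader and writer does something like the following. This is only a minimal sketch of the access pattern described above, not our actual benchmark code: the file paths are placeholders, the block size and block count are hard-coded to the 64KB/32K-block case, and the timing/IOPS accounting is omitted.

    import os
    import random
    import time

    BLOCK = 64 * 1024              # 64 KiB blocks (the first block size tested)
    PRECREATED_BLOCKS = 32 * 1024  # the reader only touches the pre-created region

    def reader(path, duration=20):
        """Read random blocks from the pre-created portion of the file."""
        fd = os.open(path, os.O_RDONLY)
        ops = 0
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            blk = random.randrange(PRECREATED_BLOCKS)
            os.pread(fd, BLOCK, blk * BLOCK)
            ops += 1
        os.close(fd)
        return ops

    def writer(path, duration=20):
        """Append blocks to the end of the file, then fsync once at the end."""
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        ops = 0
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            os.write(fd, b"\0" * BLOCK)
            ops += 1
        os.fsync(fd)               # flush the appended data after the run
        os.close(fd)
        return ops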

I ran a few tests with the following results:

  • 1 instance reading 32K blocks of size 64KB: ~2,900 IOPS == 185 MB/s

  • 2 concurrent instances, each reading 32K blocks of size 64KB from separate files: ~1,400 + ~1,600 = ~3,000 IOPS combined == 190 MB/s

  • 2 concurrent instances, each reading 32K blocks of size 64KB from the same file: ~1,800 + ~2,900 = ~4,700 IOPS combined == 300 MB/s

  • 2 concurrent instances, one appending to a file, the other reading from a separate file: reader ~1,300 IOPS == 83 MB/s, writer ~6,878 IOPS == 440 MB/s (not sure how this is possible??)

  • 2 concurrent instances, one appending to a file, the other reading from the same file (but never reading newly appended blocks): reader ~550 IOPS == 35 MB/s, writer ~600 IOPS == 38 MB/s
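
As a rough sanity check, the MB/s figures above are just IOPS × block size, for example:

    ~2,900 IOPS × 64 KB ≈ 185 MB/s   (single reader)
    ~4,700 IOPS × 64 KB ≈ 300 MB/s   (two readers on the same file, i.e. the provisioned limit)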

I also ran all of the same tests with 128K blocks of size 8KB, to get a sense of the impact of block size on performance:

  • 1 instance reading 128K blocks of size 8KB: ~6,200 IOPS == 50 MB/s

  • 2 concurrent instances, each reading 128K blocks of size 8KB from separate files: ~2,700 + ~4,500 = ~7,200 IOPS combined == 58 MB/s

  • 2 concurrent instances, each reading 128K blocks of size 8KB from the same file: ~5,300 + ~3,700 = ~9,000 IOPS combined == 74 MB/s

  • 2 concurrent instances, one appending to a file, the other reading from a separate file: reader ~4,100 IOPS == 33 MB/s, writer ~38,732 IOPS == 310 MB/s (??)

  • 2 concurrent instances, one appending to a file, the other reading from the same file: reader ~1,200 IOPS == 10 MB/s, writer ~1,400 IOPS == 11 MB/s

Overall, the performance characteristics are consistent across the two block sizes. The biggest concern is the dramatic drop in performance when reading from a file that is being appended to, but I'm also surprised that the performance of concurrent readers is not better.

Craig
Asked 5 months ago · 184 views
1 Answer

Hello,

I researched this topic and found that an FSx for Lustre filesystem is designed to allow concurrent reads and writes from multiple clients at the same time, which should not cause performance constraints as long as your filesystem has enough throughput to support the operations.

That said, there are many factors that may impact the throughput you are getting, such as the Linux distribution and kernel version, whether the instances are Spot or On-Demand, the method used to calculate throughput, the mount options in use, and so on.

To answer your question, we would need details that are not public information. Please feel free to open a support case with AWS using the following link so that we can dive deeper into your infrastructure and assist you accordingly.

[+] https://support.console.aws.amazon.com/support/home#/case/create

Thank you!

AWS
Support Engineer
Sahil_W
Answered 5 months ago
Expert-reviewed a month ago
