DMA throughput of cl_dram_dma example


Hello everyone!

I have a short question about the DMA throughput of the cl_dram_dma example with the pre-generated AFI (aws v1.4.17). The question is partly a duplicate of this one (https://forums.aws.amazon.com/thread.jspa?messageID=851546). However, since that thread was posted a while ago, I wanted to open a new one. If this is against the forum guidelines, I can reply to that thread instead (sorry for the inconvenience).

I have been running some tests with the cl_dram_dma example. I measured a read throughput of ~4 GB/s and a write throughput of ~1.6 GB/s. In that discussion, it is stated that the throughput can be increased up to 10 GB/s. I measure the throughput in a C program by simply timing the read and write calls.
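
For readers who want to reproduce this kind of measurement, here is a minimal sketch of timing sequential transfers with positional reads and writes. It is written in Python rather than C for brevity, and it is not the code used for the numbers above. The function name `measure_throughput` is hypothetical, and the device paths in the trailing comment are assumptions about how the XDMA driver names its nodes on a given instance.

```python
import os
import time


def measure_throughput(path, total_bytes, block_size, write):
    """Time sequential transfers on `path`; return (bytes_moved, GB/s)."""
    fd = os.open(path, os.O_WRONLY if write else os.O_RDONLY)
    # Allocate the payload before starting the clock so that only the
    # transfer calls themselves are timed.
    payload = memoryview(bytes(block_size))
    moved = 0
    try:
        start = time.perf_counter()
        while moved < total_bytes:
            chunk = min(block_size, total_bytes - moved)
            if write:
                n = os.pwrite(fd, payload[:chunk], moved)
            else:
                n = len(os.pread(fd, chunk, moved))
            if n == 0:  # end of file, or the target accepted no more data
                break
            moved += n
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return moved, moved / elapsed / 1e9


# Hypothetical usage (device node names are an assumption):
# measure_throughput("/dev/xdma0_h2c_0", 1 << 30, 1 << 20, write=True)
# measure_throughput("/dev/xdma0_c2h_0", 1 << 30, 1 << 20, write=False)
```

A single-threaded loop like this engages only one channel at a time, so it would not be expected to match results from a multi-job benchmark.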

What am I configuring incorrectly here? Is this the most I can achieve? I have other applications that use the same configuration as the cl_dram_dma example. I am trying to find the best throughput this example can achieve so that I can estimate it for my own kernels. Also, I have seen the file cl_tst.sv and read that it is a traffic generator. Is there a way to configure the kernel and use that module for more reliable tests?

Thanks in advance!

Edited by: yonselyuksel on Mar 5, 2021 8:45 AM

asked 3 years ago
6 Answers

Hi,

We benchmarked DMA performance using the cl_dram_dma example and the FIO tools. Do you see poor performance when running the FIO tests?

Please refer to the following guidelines to run the benchmark test using the CL_DRAM_DMA AFI provided by AWS:
https://github.com/aws/aws-fpga/blob/master/sdk/tests/fio_dma_tools/README.md

Additional items to consider:

  1. Please ensure the CL_DRAM_DMA is running at 250 MHz (command to load the AFI at 250 MHz: fpga-load-local-image -S 0 -I agfi-02f141212beac0cfb -F -a 250).
  2. Please use the f1.16xlarge instance type to achieve the best performance.

Yes, https://github.com/aws/aws-fpga/blob/master/hdk/cl/examples/cl_dram_dma/design/cl_tst.sv is basically an AXI4 traffic generator connected to the four DDRs and the PCIM interface in the CL_DRAM_DMA example design. Note that the FIO tools cannot be used with this traffic generator for performance measurements.

Please let us know if you have more questions.

Thanks!
Chakra

AWS
answered 3 years ago

Hi,

Indeed, I was able to get the best results with fio. I observed that the best performance is achieved with the options ioengine=psync and iomem=mmap (as specified in the example setup). My target instance is f1.2xlarge. If I want my own application to perform as well as fio, would it be enough to use this kind of configuration? Would you recommend any extra steps (let's say my file sizes are 16 GB or less)? I have seen that fio dispatches multiple jobs in parallel. Are there any other optimizations, not visible to the developer, beyond running multiple jobs?

I see that my questions are more application specific. I am only trying to understand fio.

Also, I could not find the example AGFI that you sent in your reply. Is it the same as the one (cl_dram_dma) in the aws-fpga repo?

Thanks a lot!

Edited by: yonselyuksel on Mar 8, 2021 1:37 AM

Edited by: yonselyuksel on Mar 8, 2021 5:38 AM

answered 3 years ago

Hi,

The XDMA (the DMA engine inside the AWS Shell) supports four channels for data transfers between the host and the FPGA. You will get better performance when all four channels are engaged. The FIO tool also leverages this feature. Example:
https://github.com/aws/aws-fpga/blob/master/sdk/tests/fio_dma_tools/README.md#what-are-the-fio-config-file-naming-conventions
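
As a rough illustration of engaging several channels at once from application code, the sketch below splits one buffer into contiguous slices and writes each slice from its own thread through its own file descriptor. This is only an assumption about how an application might mirror the multi-job fio setup, not the way fio or the AWS tooling implements it. The helper names `write_region` and `parallel_write` are hypothetical, as are the device node names in the trailing comment.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_region(path, offset, data):
    """Write `data` to `path` starting at byte `offset`; return bytes written."""
    fd = os.open(path, os.O_WRONLY)
    view = memoryview(data)
    done = 0
    try:
        while done < len(view):
            n = os.pwrite(fd, view[done:], offset + done)
            if n == 0:  # target accepted nothing more; stop instead of spinning
                break
            done += n
    finally:
        os.close(fd)
    return done


def parallel_write(paths, data):
    """Split `data` into len(paths) contiguous slices, one thread per path."""
    n = len(paths)
    share = -(-len(data) // n)  # ceiling division
    with ThreadPoolExecutor(max_workers=n) as pool:
        jobs = [
            pool.submit(write_region, path, i * share,
                        data[i * share:(i + 1) * share])
            for i, path in enumerate(paths)
        ]
    # Leaving the `with` block waits for every job to finish.
    return sum(job.result() for job in jobs)


# Hypothetical usage with four host-to-card channel nodes (names assumed):
# channels = ["/dev/xdma0_h2c_%d" % i for i in range(4)]
# parallel_write(channels, bytes(64 << 20))
```

Threads are sufficient here because `os.pwrite` releases the interpreter lock while the system call is in progress.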

Typically, a larger block size helps achieve maximum performance, since there is less per-transfer overhead.
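
To make the block-size effect concrete, here is a simple latency-plus-bandwidth model. It is an illustrative assumption, not measured data: each transfer is assumed to pay a fixed setup cost and then move data at the peak rate. The 20 microsecond overhead and 10 GB/s peak are placeholder numbers chosen only for the example, and `effective_throughput` is a hypothetical helper.

```python
def effective_throughput(block_bytes, peak_bytes_per_s, overhead_s):
    """Bytes per second when every block pays a fixed setup cost."""
    return block_bytes / (overhead_s + block_bytes / peak_bytes_per_s)


# Placeholder parameters, assumed for illustration only.
PEAK = 10e9        # 10 GB/s peak rate
OVERHEAD = 20e-6   # 20 microseconds per transfer

for size in (4 << 10, 64 << 10, 1 << 20, 16 << 20):
    rate = effective_throughput(size, PEAK, OVERHEAD)
    print(f"{size:>10d} B  {rate / 1e9:6.2f} GB/s")
```

Under this model the achieved rate approaches the peak only when the block's transfer time is large compared with the fixed overhead, which is consistent with the advice to prefer larger blocks.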

An f1.2xlarge instance may give you maximum performance at times, but this is not guaranteed. We therefore recommend using f1.16xlarge for performance-critical applications.

Sorry, my mistake: I think I shared an older AGFI in my previous reply. The latest AFI for CL_DRAM_DMA is noted here:
https://github.com/aws/aws-fpga/tree/master/hdk/cl/examples/cl_dram_dma#dram-dma-example-metadata
Pre-generated AFI ID : afi-063e6afe717a22158
Pre-generated AGFI ID: agfi-0b5c35827af676702

Thanks!
Chakra

AWS
answered 3 years ago

Thanks a lot!

I am also trying to visualize the effect of memory alignment and have a small question about it:
The fio documentation states that, when using iodepth, the alignment is given by the block size. I am currently running block sizes that are multiples of the page size. In order to see the effect of the mem_align option, should I delete the iodepth option from my .fio file? (I declare io_memalign under [global].) Right now, even though I change mem_align, I do not see any effect.
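
Outside of fio, one way to experiment with buffer alignment directly is to build an I/O buffer that starts a chosen number of bytes past a page boundary and confirm its address. The sketch below does this with an anonymous memory mapping, which the operating system places on a page boundary. It says nothing about how fio applies its alignment options internally; `buffer_with_offset` and `address_of` are hypothetical helpers.

```python
import ctypes
import mmap


def buffer_with_offset(size, offset):
    """Return (backing, view): `view` starts `offset` bytes past a page boundary.

    Keep `backing` alive for as long as `view` is in use.
    """
    backing = mmap.mmap(-1, size + offset)  # anonymous, page-aligned mapping
    view = memoryview(backing)[offset:offset + size]
    return backing, view


def address_of(view):
    """Return the memory address of the first byte of a writable buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(view))


backing, view = buffer_with_offset(1 << 20, 64)
print(address_of(view) % mmap.PAGESIZE)  # distance from the page boundary
```

Such a view could then be handed to a read call that fills caller-provided buffers (for example `os.preadv`) to compare timings across different offsets.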

answered 3 years ago

Hello,

Unfortunately, we do not have data on the performance effects of the various parameter combinations offered by the FIO tool. We achieved the best possible performance on CL_DRAM_DMA using the FIO scripts provided in:
https://github.com/aws/aws-fpga/blob/master/sdk/tests/fio_dma_tools/scripts/xdma_4-ch_4-1M_read.fio
https://github.com/aws/aws-fpga/blob/master/sdk/tests/fio_dma_tools/scripts/xdma_4-ch_4-1M_write.fio

The following link describes the parameters supported by FIO:
https://github.com/axboe/fio/blob/master/HOWTO#L1762

Please contact us if you need any other details.

Thanks!
Chakra

AWS
answered 3 years ago

Thanks a lot,

Indeed, fio gives enough coverage in my case so that I can analyze different configurations.

My question is resolved.

answered 3 years ago
