Cannot register memory region for remote RDMA read on EFA

0

OS
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
Kernel
5.4.0-1045-aws

efadv_query_device showed EFADV_DEVICE_ATTR_CAPS_RDMA_READ is not available.
I followed the steps here:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html
Is remote RDMA read supported on EFA?
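
For reference, here is roughly the check that produced that result (a minimal sketch against rdma-core's EFA direct verbs extension; the device index and error handling are simplified):

```c
/* Minimal sketch: query EFA device capabilities via rdma-core's
 * direct verbs extension. Build (roughly): gcc probe.c -libverbs -lefa
 * Assumes the first device in the list is the EFA device. */
#include <stdio.h>
#include <infiniband/verbs.h>
#include <infiniband/efadv.h>

int main(void)
{
    int num_devs = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devs);
    if (!devs || num_devs == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "ibv_open_device failed\n");
        return 1;
    }

    struct efadv_device_attr attr = {0};
    if (efadv_query_device(ctx, &attr, sizeof(attr)) == 0) {
        printf("RDMA read: %s\n",
               (attr.device_caps & EFADV_DEVICE_ATTR_CAPS_RDMA_READ) ?
               "supported" : "not supported");
    } else {
        fprintf(stderr, "efadv_query_device failed (not an EFA device?)\n");
    }

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```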

crhu
asked 3 years ago · 491 views
8 Answers
0

You do not mention which instance type you are using for your tests, but currently RDMA semantics with EFA are natively supported only on P4d instances.

AWS
answered 3 years ago
0

The instance used was c5n.18xlarge.

crhu
answered 3 years ago
0

Are there future plans to support all EFA instances?

I see that Libfabric supports RMA remote read/write. Is it implemented on top of send, or is it only supported on P4d instances as well?

Edited by: crhu on Jun 9, 2021 11:05 PM

crhu
answered 3 years ago
0

Obviously, we want to grow the number of instance types that support RDMA with EFA beyond P4. Not surprisingly, I cannot comment on specifics of our plans in this forum.

The EFA provider for Libfabric does expose the FI_RMA interface; it automatically detects whether the EFA hardware supports RDMA operations and uses either the native RDMA features or an emulated send/receive path.
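
For example, an application can probe the efa provider for RMA support through the standard fi_getinfo call (a minimal sketch; the API version requested is illustrative):

```c
/* Sketch: ask Libfabric whether the efa provider can satisfy FI_RMA.
 * The provider advertises FI_RMA whether the path is native or emulated. */
#include <stdio.h>
#include <string.h>
#include <rdma/fabric.h>
#include <rdma/fi_errno.h>

int main(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;

    hints->caps = FI_MSG | FI_RMA;          /* request RMA capability */
    hints->ep_attr->type = FI_EP_RDM;       /* EFA exposes an RDM endpoint */
    hints->fabric_attr->prov_name = strdup("efa");

    int ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
    if (ret == 0 && info)
        printf("efa provider offers FI_RMA\n");
    else
        printf("fi_getinfo: %s\n", fi_strerror(-ret));

    fi_freeinfo(info);
    fi_freeinfo(hints);
    return 0;
}
```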

It's worth noting that our RDMA read operation does not conform to the InfiniBand spec (it is not InfiniBand, after all!). In particular, there is no read-once or write-once guarantee: in a retransmit case, we will re-read the source buffer and may write the data more than once. In Libfabric, this is expressed by requiring completion events for the RDMA operations. We also do not provide byte-ordered data placement, so you cannot do "poll on last byte" tricks.
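
Concretely, that means the initiator must treat the destination buffer as valid only once the read's completion surfaces on the CQ. A minimal sketch, assuming an already-configured endpoint, completion queue, registered descriptor, peer address, and remote key:

```c
/* Schematic: on EFA the destination buffer is valid only after the
 * read's completion surfaces on the CQ; "poll on last byte" is not
 * safe because data placement is not byte-ordered. */
#include <rdma/fabric.h>
#include <rdma/fi_errno.h>
#include <rdma/fi_rma.h>
#include <rdma/fi_cq.h>

static int read_and_wait(struct fid_ep *ep, struct fid_cq *cq,
                         void *buf, size_t len, void *desc,
                         fi_addr_t peer, uint64_t raddr, uint64_t rkey)
{
    ssize_t ret = fi_read(ep, buf, len, desc, peer, raddr, rkey, NULL);
    if (ret)
        return (int)ret;

    struct fi_cq_entry comp;
    do {
        ret = fi_cq_read(cq, &comp, 1);   /* wait for the completion */
    } while (ret == -FI_EAGAIN);

    return ret < 0 ? (int)ret : 0;        /* buf is valid only now */
}
```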

AWS
answered 3 years ago
0

Thanks for the information, Brian. If possible, can you say when we can expect RDMA read support on other EFA instance types?

crhu
answered 3 years ago
0

Hi Brian, can you confirm a couple of technical details about the Libfabric EFA provider? In the emulated RDMA read:

1. The sender posts an RXR_SHORT_RTR_PKT/RXR_LONG_RTR_PKT to the receiver with the requested read addresses.
2. The receiver polls/receives the packet, inspects the header packet type, and memcpys the data at the requested read addresses into an RXR_READRSP_PKT.
3. The receiver sends the RXR_READRSP_PKT back to the requesting sender.
4. The sender polls/receives the RXR_READRSP_PKT and memcpys the received data to the final destination.

So instead of the zero memcpys of real RDMA, the emulated RDMA read pays a memcpy on both the sender and the receiver side.
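
To make sure I have it right, here is a toy model of those two copies (illustrative only; not the provider's actual code):

```c
/* Toy model only (not the provider's code): the emulated read costs
 * one copy at the target and one at the initiator. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char remote_src[32] = "bytes held by the read target";
    char readrsp_payload[32];              /* stands in for RXR_READRSP_PKT */
    char final_dest[32];

    /* copy #1: target stages the requested bytes into the response */
    memcpy(readrsp_payload, remote_src, sizeof(remote_src));

    /* ... RXR_READRSP_PKT travels back over the send/receive path ... */

    /* copy #2: initiator lands the payload in the final destination */
    memcpy(final_dest, readrsp_payload, sizeof(readrsp_payload));

    assert(memcmp(final_dest, remote_src, sizeof(remote_src)) == 0);
    puts("two copies, as in the emulated path");
    return 0;
}
```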

Edited by: crhu on Jun 15, 2021 11:47 PM

crhu
answered 3 years ago
0

Correct. The receive-side memcpy is basically required because of all the usual memory-placement issues. On the send side (the target of the read), there is a potential optimization for larger read requests to send directly from the user buffer, but given the way MPI uses the Libfabric interface, that hasn't been a priority for the team yet.

AWS
answered 3 years ago
0

Hello, I recently tried to run perftest on both P4d and P3dn instances to test RDMA, but it always fails with the error "Couldn't allocate MR". I also tried setting the locked-memory limit to unlimited in /etc/security/limits.conf, but that didn't help. Do you have any suggestions?
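
A quick way to narrow this down is to print the limit the process actually sees and attempt a small registration directly (a minimal sketch; ibv_reg_mr typically fails with ENOMEM when RLIMIT_MEMLOCK is too low):

```c
/* Sketch: print the effective locked-memory limit, then attempt a
 * small registration; ENOMEM usually points at RLIMIT_MEMLOCK. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct rlimit rl;
    getrlimit(RLIMIT_MEMLOCK, &rl);
    printf("RLIMIT_MEMLOCK: cur=%llu max=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);

    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0)
        return 1;
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd)
        return 1;

    size_t len = 1 << 20;                  /* 1 MiB test buffer */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    if (!mr)
        fprintf(stderr, "ibv_reg_mr failed: %s\n", strerror(errno));
    else
        ibv_dereg_mr(mr);
    return 0;
}
```

Note that if the device lacks the RDMA read capability discussed above, a registration requesting IBV_ACCESS_REMOTE_READ may fail regardless of the limits, which could be what perftest is hitting here.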

jiawei
answered 3 years ago
