Cannot register memory region for remote RDMA read on EFA

OS:
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"

Kernel: 5.4.0-1045-aws

efadv_query_device shows that EFADV_DEVICE_ATTR_CAPS_RDMA_READ is not available.
I followed the setup steps here:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html
Is remote RDMA read supported on EFA?

crhu
asked 3 years ago · 483 views
8 Answers

You do not mention which instance type you are using for your tests, but currently RDMA semantics with EFA are only natively supported on P4d instances.

AWS
answered 3 years ago

The instance used was c5n.18xlarge.

crhu
answered 3 years ago

Are there future plans to support all EFA instances?

I see that libfabric supports RMA remote read/write. Is it implemented on top of send, or is it also supported only on P4d instances?

Edited by: crhu on Jun 9, 2021 11:05 PM

crhu
answered 3 years ago

Obviously, we want to grow the number of instance types that support RDMA with EFA beyond P4. Not surprisingly, I cannot comment on specifics of our plans in this forum.

The EFA provider for Libfabric does expose the FI_RMA interface, and will automatically detect if the EFA hardware supports RDMA operations and either use native RDMA features or an emulated send/receive path.

It's worth noting that our RDMA read operation does not conform to the InfiniBand spec (it is not InfiniBand, after all!). In particular, there is no read-once or write-once guarantee. In a retransmit case, we will re-read the source buffer and may write the data more than once. In Libfabric, this is expressed by requiring completion events for the RDMA operations. We also do not provide byte-ordered data placement, so you cannot do "poll on last byte" tricks.
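
To make the placement point concrete, here is a toy Python model, not EFA or Libfabric code. The chunk size, the arrival order, and every name in it are hypothetical and chosen only for illustration. It shows how a buffer can have its last byte written while earlier bytes are still missing, which is why a completion event is the only safe signal that the data is fully there.

```python
# Toy model only: chunk size, arrival order, and all names are hypothetical.

def place_chunks(dest, payload, chunk, order):
    """Write payload into dest one chunk at a time, in the given order.

    Yields after each chunk so the caller can inspect intermediate state.
    """
    for idx in order:
        start = idx * chunk
        dest[start:start + chunk] = payload[start:start + chunk]
        yield idx


payload = bytes([0xAB]) * 16            # data being read from the remote side
dest = bytearray(16)                    # local buffer, initially all zeros
arrival = [3, 1, 0, 2]                  # the last chunk happens to land first

steps = place_chunks(dest, payload, chunk=4, order=arrival)
next(steps)                             # only the first arrival has been placed

last_byte_set = dest[-1] != 0           # what a "poll on last byte" loop would see
fully_placed = bytes(dest) == payload   # what the application actually needs

for _ in steps:                         # let the remaining chunks land
    pass
completed = bytes(dest) == payload      # state at the point a completion would fire

print(last_byte_set, fully_placed, completed)
```

After the first placement the last byte is already non-zero even though most of the buffer is still empty, so a loop that polls on it would proceed too early.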

AWS
answered 3 years ago

Thanks for the information, Brian. If you are able to share it, when can we expect RDMA read support on other EFA instances?

crhu
answered 3 years ago

Hi Brian, can you confirm a couple of technical details about the libfabric EFA provider? In the emulated RDMA read, the sender sends an RXR_SHORT_RTR_PKT/RXR_LONG_RTR_PKT packet to the receiver with the requested read addresses. The receiver polls for and receives the packet, inspects the packet type in the header, and does a memcpy from the requested read addresses into an RXR_READRSP_PKT. The receiver then sends the RXR_READRSP_PKT back to the requesting sender. The sender polls for and receives the RXR_READRSP_PKT and does a memcpy of the received data to the final destination. So instead of zero memcpy calls as with real RDMA, the emulated RDMA read has a memcpy on both the sender and the receiver side. Is this understanding correct?

Edited by: crhu on Jun 15, 2021 11:47 PM

crhu
answered 3 years ago

Correct. The receive side memcpy is basically required, because of all the usual memory placement issues. On the send side (the target of the read), there is a potential optimization for larger read requests to send directly from the user buffer, but given the way MPI uses the Libfabric interface, that hasn't been a priority for the team yet.
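
The flow confirmed above can be sketched as a small Python toy model. It is not the provider's code: only the packet type names come from this thread, while the `Packet` and `Endpoint` classes, their methods, and the copy counter are hypothetical. To avoid the sender/receiver ambiguity, the side issuing the read is called the requester and the side that owns the data is called the target.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str            # packet type name as used in this thread
    offset: int = 0      # start of the requested range in the target's memory
    length: int = 0      # number of bytes requested
    data: bytes = b""    # payload, only filled in for the read response


class Endpoint:
    """Hypothetical endpoint with some memory and a memcpy counter."""

    def __init__(self, memory):
        self.memory = bytearray(memory)
        self.copies = 0

    def build_read_request(self, offset, length):
        # Requester: describe the remote range it wants to read.
        return Packet("RXR_SHORT_RTR_PKT", offset, length)

    def handle_read_request(self, req):
        # Target: copy the requested bytes into the response packet (copy 1).
        self.copies += 1
        data = bytes(self.memory[req.offset:req.offset + req.length])
        return Packet("RXR_READRSP_PKT", req.offset, req.length, data)

    def handle_read_response(self, rsp, dest_offset):
        # Requester: copy the payload to its final destination (copy 2).
        self.copies += 1
        self.memory[dest_offset:dest_offset + rsp.length] = rsp.data


requester = Endpoint(bytes(8))
target = Endpoint(b"EFA-DATA")

req = requester.build_read_request(offset=4, length=4)
rsp = target.handle_read_request(req)
requester.handle_read_response(rsp, dest_offset=0)

total_copies = requester.copies + target.copies
print(bytes(requester.memory[:4]), total_copies)
```

In this model the target-side copy is the one the optimization mentioned above would remove, by sending directly from the user buffer for larger reads.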

AWS
answered 3 years ago

Hello, I recently tried to run perftest on both p4d and p3dn to test RDMA, but it always fails with the error "Couldn't allocate MR". I also tried setting the memory lock limit to unlimited in /etc/security/limits.conf, but that did not help. Do you have any suggestions?
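
Since limits in /etc/security/limits.conf are applied when a new login session starts, one thing that may be worth ruling out is whether the locked-memory limit is actually in effect in the process that launches the test. A minimal Python sketch of that check, using only the standard `resource` module, follows. The `describe` helper is a hypothetical name, and this is only a diagnostic aid under the assumption that the limit is the cause, not a confirmed fix for the "Couldn't allocate MR" error.

```python
import resource


def describe(limit):
    """Render an rlimit value as text (hypothetical helper for this sketch)."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


# Query the locked-memory limit as seen by this process.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
report = f"RLIMIT_MEMLOCK soft={describe(soft)} hard={describe(hard)}"
print(report)
```

If this is run from the same shell that starts perftest and the soft limit is not reported as unlimited, the limits.conf change has not taken effect for that session.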

jiawei
answered 3 years ago
