Cannot register memory region for remote RDMA read on EFA.


OS:
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
Kernel:
5.4.0-1045-aws

efadv_query_device() shows that the EFADV_DEVICE_ATTR_CAPS_RDMA_READ capability bit is not set.
I followed the steps here:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html
Is remote RDMA read supported on EFA?
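For reference, the capability check I ran looks roughly like this (a sketch assuming rdma-core with the EFA provider installed; compile with `gcc efa_caps.c -libverbs -lefa`; it needs an actual EFA device to report anything useful):

```c
#include <stdio.h>
#include <infiniband/verbs.h>
#include <infiniband/efadv.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;

        struct efadv_device_attr attr = {0};
        /* efadv_query_device() fails with EOPNOTSUPP on non-EFA devices */
        if (efadv_query_device(ctx, &attr, sizeof(attr)) == 0) {
            printf("%s: RDMA read %s\n",
                   ibv_get_device_name(devs[i]),
                   (attr.device_caps & EFADV_DEVICE_ATTR_CAPS_RDMA_READ)
                       ? "supported" : "NOT supported");
        }
        ibv_close_device(ctx);
    }

    ibv_free_device_list(devs);
    return 0;
}
```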

crhu
Asked 3 years ago · 491 views
8 answers

You do not mention which instance type you are using for your tests, but currently RDMA semantics with EFA are only natively supported on P4d instances.

AWS
Answered 3 years ago

The instance used was c5n.18xlarge.

crhu
Answered 3 years ago

Are there plans to support RDMA on all EFA-enabled instance types in the future?

I see that Libfabric supports RMA remote read/write; is it implemented on top of send/receive, or is it also only supported on P4d instances?

Edited by: crhu on Jun 9, 2021 11:05 PM

crhu
Answered 3 years ago

Obviously, we want to grow the number of instance types that support RDMA with EFA beyond P4. Not surprisingly, I cannot comment on specifics of our plans in this forum.

The EFA provider for Libfabric does expose the FI_RMA interface. It automatically detects whether the EFA hardware supports RDMA operations and uses either the native RDMA features or an emulated send/receive path.

It's worth noting that our RDMA read operation does not conform to the InfiniBand spec (it is not InfiniBand, after all!). In particular, there is no read-once or write-once guarantee: in a retransmit case, we will re-read the source buffer and may write the data more than once. In Libfabric, this is expressed by requiring completion events for the RDMA operations. We also do not provide byte-ordered data placement, so you cannot do "poll on last byte" tricks.
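A minimal way to confirm that the provider advertises FI_RMA from an application's point of view (a sketch; assumes the libfabric development headers and linking with -lfabric; the provider name "efa" is the real provider string, but FI_VERSION(1, 9) is just an illustrative API version):

```c
#include <stdio.h>
#include <string.h>
#include <rdma/fabric.h>

int main(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;

    /* Ask for RMA read capability from the EFA provider specifically;
     * whether it is native RDMA or the emulated path is the provider's
     * internal decision and is not visible here. */
    hints->caps = FI_RMA | FI_READ;
    hints->ep_attr->type = FI_EP_RDM;
    hints->fabric_attr->prov_name = strdup("efa");

    int ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo: %s\n", fi_strerror(-ret));
    } else {
        for (struct fi_info *cur = info; cur; cur = cur->next)
            printf("%s: FI_RMA available\n", cur->fabric_attr->prov_name);
        fi_freeinfo(info);
    }
    fi_freeinfo(hints);
    return ret ? 1 : 0;
}
```

On a host without an EFA device this simply reports that no matching fabric was found.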

AWS
Answered 3 years ago

Thanks for the information, Brian. If possible, could you say when we can expect RDMA read support on other EFA instance types?

crhu
Answered 3 years ago

Hi Brian, can you confirm a couple of technical details about the Libfabric EFA provider? In the emulated RDMA read, the sender sends an RXR_SHORT_RTR_PKT/RXR_LONG_RTR_PKT packet to the receiver with the requested read addresses. The receiver polls for the packet, inspects the header packet type, and memcpys the data at the requested read addresses into an RXR_READRSP_PKT, which it sends back to the requesting sender. The sender polls for the RXR_READRSP_PKT and memcpys the received data to the final destination. So instead of the zero copies of real RDMA, the emulated RDMA read incurs a memcpy on both the sender and the receiver side.

Edited by: crhu on Jun 15, 2021 11:47 PM

crhu
Answered 3 years ago

Correct. The receive side memcpy is basically required, because of all the usual memory placement issues. On the send side (the target of the read), there is a potential optimization for larger read requests to send directly from the user buffer, but given the way MPI uses the Libfabric interface, that hasn't been a priority for the team yet.

AWS
Answered 3 years ago

Hello, I recently tried to run perftest on both p4d and p3dn instances to test RDMA, but it always fails with the error "Couldn't allocate MR". I also tried setting the memory lock limit to unlimited in /etc/security/limits.conf, but that didn't help. Do you have any suggestions?

jiawei
Answered 3 years ago
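For anyone hitting the same error: "Couldn't allocate MR" usually means ibv_reg_mr() failed, most often because RLIMIT_MEMLOCK is too low in the shell that actually runs the test. A quick diagnostic sketch (the limits.conf lines are the standard pam_limits form and only take effect in new login sessions):

```shell
# Check the effective locked-memory limit in the shell running perftest;
# it should print "unlimited" (or a very large value).
ulimit -l

# Standard persistent setting in /etc/security/limits.conf
# (requires logging out and back in; it does NOT apply to daemons,
#  systemd services, or containers, which have their own limit settings,
#  e.g. systemd's LimitMEMLOCK= or docker's --ulimit memlock):
#   * soft memlock unlimited
#   * hard memlock unlimited
```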
