Issue configuring NVMe-oF on an EFA-enabled EC2 instance


I need to set up NVMe-oF on an EFA-enabled EC2 instance.

This blog post discusses NVMe-oF on i3en instances, and it links to an external guide for configuring the NVMe-oF client and target on RHEL.

However, when following those steps on an AWS i3en instance running RHEL, we hit the following error while creating the symbolic link in step 11, and a related error appears in dmesg.

  • Symbolic link:

ln -fs /sys/kernel/config/nvmet/subsystems/nvmet-rdma /sys/kernel/config/nvmet/ports/1/subsystems/nvmet-rdma

ln: failed to create symbolic link '/sys/kernel/config/nvmet/ports/1/subsystems/nvmet-rdma': No such device

  • dmesg:

nvmet_rdma: binding CM ID to <IPv4 Add>:4420 failed (-19)

Is there a fix for the errors above? Alternatively, is there a detailed AWS guide or document on how to configure an NVMe-oF (EFA) client and target on AWS?

  • Hello, I am the author of the blog in question. Unfortunately, Red Hat recently changed their manual and no longer gives the steps for NVMe-oF/TCP; it only covers NVMe-oF/RDMA. I have set up NVMe-oF/RDMA, but it requires RoCE. Based on all my testing, I prefer NVMe-oF/TCP, and that is where the market is going, so I expect much more support and documentation for it in the open-systems market going forward. Replace "modprobe nvmet-rdma" with "modprobe nvmet-tcp", and make sure you are root. NOTE: after setup, dmesg shows a warning that NVMe/TCP is a Technology Preview in RHEL 8.x. This is because RHEL is especially cautious about new I/O stacks. Other Linux distributions with 4.x kernels have used it, and all 5.x kernels include it; see this (slightly outdated) summary article: https://www.lightbitslabs.com/blog/linux-distributions-nvme-support/ If the RHEL 8.x Technology Preview status worries you, you can use SLES instead, as it now ships a 5.x kernel and never listed NVMe/TCP as a tech preview. Feel free to contact me with questions: seamasnr@amazon.com

  • Hello, thank you for the prompt reply. We previously ran the MPI OSU benchmarks (osu_benchmark) over EFA, with the additional HPC libfabric layer, via AWS ParallelCluster, so I presume the EFA NIC is capable of RDMA. Are there any pointers for configuring NVMe-oF/RoCE on EFA-capable instances (for AWS users)? Additionally, when we configure NVMe-oF/TCP, will the traffic use (or be offloaded to) the EFA or the ENA Ethernet controller?
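The switch to TCP suggested in the answer, plus the client side the question asks about, can be sketched roughly as below. This is a minimal sketch, not an AWS- or Red Hat-documented procedure. It assumes the kernel nvmet configfs interface, nvme-cli on the client, and hypothetical placeholder values: the subsystem name "nvmet-tcp", port ID 1, the local device "/dev/nvme1n1", and port 4420 (the port seen in the dmesg output). Besides the modprobe change, the port's addr_trtype has to be "tcp" rather than "rdma". The ln step is where the kernel enables the port, which is why the RDMA version failed there with -19 (ENODEV, "No such device"). The route_dev helper relates to the EFA-vs-ENA question: NVMe/TCP uses the ordinary kernel TCP/IP stack, so it leaves through whichever interface the routing table selects.

```shell
#!/usr/bin/env bash
# Minimal sketch of NVMe-oF/TCP on both ends. NQN, device and IP values
# passed in are hypothetical placeholders; run the real calls as root.
set -euo pipefail

# ---- Target side --------------------------------------------------------

nvmet_tcp_load_modules() {
  modprobe nvmet
  modprobe nvmet-tcp            # instead of: modprobe nvmet-rdma
}

# nvmet_tcp_export ROOT NQN DEVICE TRADDR [TRSVCID]
# ROOT is normally /sys/kernel/config/nvmet; pointing it at a scratch
# directory only dry-runs the file layout without touching the kernel.
nvmet_tcp_export() {
  local root=$1 nqn=$2 dev=$3 traddr=$4 trsvcid=${5:-4420}
  local subsys="$root/subsystems/$nqn"
  local port="$root/ports/1"

  # Subsystem with namespace 1, open to any host (no host allow-list).
  mkdir -p "$subsys/namespaces/1"
  echo 1 > "$subsys/attr_allow_any_host"
  echo -n "$dev" > "$subsys/namespaces/1/device_path"
  echo 1 > "$subsys/namespaces/1/enable"

  # Port 1 uses the TCP transport rather than RDMA.
  mkdir -p "$port/subsystems"
  echo ipv4 > "$port/addr_adrfam"
  echo tcp > "$port/addr_trtype"
  echo "$traddr" > "$port/addr_traddr"
  echo "$trsvcid" > "$port/addr_trsvcid"

  # Linking the subsystem to the port enables the port; with rdma this is
  # the step that failed with -19 (ENODEV).
  ln -s "$subsys" "$port/subsystems/$nqn"
}

# ---- Client (initiator) side ---------------------------------------------

NVME=${NVME:-nvme}              # nvme-cli binary; NVME=echo prints instead

nvmf_tcp_load_host_module() {
  modprobe nvme-tcp
}

nvmf_tcp_connect() {
  local traddr=$1 nqn=$2 trsvcid=${3:-4420}
  # Ask the target's discovery controller which subsystems it exports.
  "$NVME" discover -t tcp -a "$traddr" -s "$trsvcid"
  # Attach the subsystem; its namespaces then appear as /dev/nvmeXnY.
  "$NVME" connect -t tcp -n "$nqn" -a "$traddr" -s "$trsvcid"
}

# Print the local interface the kernel routes to the target address over.
route_dev() {
  ip route get "$1" | grep -o 'dev [^ ]*' | head -n 1 | cut -d ' ' -f 2
}
```

On the target, one would run "nvmet_tcp_load_modules" and then "nvmet_tcp_export /sys/kernel/config/nvmet nvmet-tcp /dev/nvme1n1 <IPv4 Add>". On the client, run "nvmf_tcp_load_host_module" and "nvmf_tcp_connect <IPv4 Add> nvmet-tcp". To see which controller carries the traffic, "ethtool -i" on the interface printed by "route_dev <IPv4 Add>" reports the kernel driver behind it.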

asked 2 years ago, 154 views
