Issue while configuring NVMe-oF with EFA EC2 instance


I need to set up NVMe-oF on an EFA-enabled EC2 instance.

This post discusses NVMe-oF on i3en instances and refers to an external link for NVMe-oF client and target configuration on RHEL.

However, when we follow those steps on an AWS i3en instance running RHEL, creating the soft link at step 11 fails, and dmesg shows an error.

  • Soft link:

ln -fs /sys/kernel/config/nvmet/subsystems/nvmet-rdma /sys/kernel/config/nvmet/ports/1/subsystems/nvmet-rdma

ln failed to create symbolic link '/sys/kernel/config/nvmet/ports/1/subsystems/nvmet-rdma': No such device

  • Dmesg:

nvmet_rdma: binding CM ID to <IPv4 Add>:4420 failed (-19)
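
The -19 in the dmesg line is a negated Linux errno. 19 is ENODEV, whose message is "No such device", so the ln failure and the dmesg entry appear to report the same condition: no RDMA device could be bound at that address. A minimal sketch for decoding such lines follows. The function name nvmet_bind_errno is hypothetical, and only a few errno values are mapped.

```shell
#!/bin/sh
# Sketch: pull the errno out of an nvmet bind-failure line and name it.
# nvmet_bind_errno is a hypothetical helper, not part of any nvmet tooling.
nvmet_bind_errno() {
    # $1: a dmesg line ending in "failed (-N)"
    code=$(printf '%s\n' "$1" | sed -n 's/.*failed (-\([0-9][0-9]*\)).*/\1/p')
    case "$code" in
        19) echo "ENODEV (No such device)" ;;
        98) echo "EADDRINUSE (Address already in use)" ;;
        99) echo "EADDRNOTAVAIL (Cannot assign requested address)" ;;
        "") echo "no errno found" ;;
        *)  echo "errno $code" ;;
    esac
}
```

For example, nvmet_bind_errno "$(dmesg | grep 'binding CM ID' | tail -n 1)" decodes the most recent bind failure.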

Is there a solution for these errors? Alternatively, is there a detailed AWS guide or document on how to configure an NVMe-oF (EFA) client and target?

  • Hello, I am the author of the blog in question. Unfortunately, Red Hat recently changed their manual and no longer gives the steps for NVMe-oF/TCP; it only gives NVMe-oF/RDMA. I have set up NVMe-oF/RDMA, but it requires RoCE. In all my testing I prefer NVMe-oF/TCP, and that is where the market is going, so I expect a lot of support and documentation to appear in the open systems market going forward. Replace "modprobe nvmet-rdma" with "modprobe nvmet-tcp", and make sure you are root. NOTE: if you look at dmesg after setup, you may find a warning that NVMe TCP is in tech preview for RHEL 8.x. This is because RHEL is especially cautious about new I/O stacks. All the other Linux variants with a 4.x Linux kernel have used it, and of course all 5.x Linux kernels do. See this (slightly) outdated summary article: https://www.lightbitslabs.com/blog/linux-distributions-nvme-support/ If you are worried about the RHEL 8.x tech preview status, you can use SLES, as it now has a 5.x kernel and never listed TCP as tech preview. Feel free to contact me with questions: seamasnr@amazon.com

  • Hello, thank you for the prompt reply. We previously ran the MPI osu_benchmark over EFA, with the additional HPC libfabric layer, via ParallelCluster. I presume the EFA NIC is capable of RDMA. Are there any pointers for configuring NVMe-oF/RoCE on EFA-capable instances (for AWS users)? Additionally, when we configure NVMe-oF/TCP, will it run over (or offload to) the EFA or the ENA Ethernet controller?
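
The suggestion above to swap nvmet-rdma for nvmet-tcp can be sketched as a configfs target setup. This is a minimal sketch under stated assumptions, not the blog's or Red Hat's exact procedure: the subsystem name, backing device, and listen address are hypothetical placeholders, and the NVMET_ROOT override exists only so the steps can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch of an NVMe-oF/TCP target configured through configfs. Run as root.
# All defaults below are hypothetical placeholders, not values from the post.
NVMET_ROOT=${NVMET_ROOT:-/sys/kernel/config/nvmet}
SUBSYS=${SUBSYS:-nvmet-tcp-demo}   # hypothetical subsystem name
DEVICE=${DEVICE:-/dev/nvme1n1}     # hypothetical backing block device
ADDR=${ADDR:-10.0.0.10}            # hypothetical target IPv4 address
PORT_ID=${PORT_ID:-1}

setup_nvmet_tcp_target() {
    # Load modules only on the real configfs tree (skipped in a dry run).
    if [ "$NVMET_ROOT" = /sys/kernel/config/nvmet ]; then
        modprobe nvmet || return 1
        modprobe nvmet-tcp || return 1
    fi
    sub="$NVMET_ROOT/subsystems/$SUBSYS"
    port="$NVMET_ROOT/ports/$PORT_ID"

    # Subsystem with one namespace backed by DEVICE, open to any host.
    mkdir -p "$sub/namespaces/1" &&
    echo 1 > "$sub/attr_allow_any_host" &&
    printf '%s' "$DEVICE" > "$sub/namespaces/1/device_path" &&
    echo 1 > "$sub/namespaces/1/enable" || return 1

    # Port using the tcp transport instead of rdma.
    mkdir -p "$port/subsystems" &&
    echo "$ADDR" > "$port/addr_traddr" &&
    echo tcp > "$port/addr_trtype" &&
    echo 4420 > "$port/addr_trsvcid" &&
    echo ipv4 > "$port/addr_adrfam" || return 1

    # Expose the subsystem on the port (the step that failed with rdma).
    ln -fs "$sub" "$port/subsystems/$SUBSYS"
}
```

After sourcing the file as root, calling setup_nvmet_tcp_target performs the setup. On the client, loading nvme-tcp and running nvme connect -t tcp -a with the target address, -s 4420, and -n with the subsystem name should then attach the namespace, assuming nvme-cli is installed.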

asked 2 years ago · 157 views
No answers
