efa_installer.sh has been failing at the "Testing EFA device" step for the last few days


Today I followed the EFA getting-started guide (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-enable) to install the EFA software on Amazon Linux 2. The installation failed at the last step, "Testing EFA device", with the error below:

$ sudo ./efa_installer.sh -y

= Starting Amazon Elastic Fabric Adapter Installation Script =

= EFA Installer Version: 1.17.2 = ......

......

== Testing EFA device ==

Starting server...

Starting client...

Error: fi_pingpong test timed out.

==============================================================================
An EFA device has been detected but a ping test has failed. Please consult the
EFA documentation to verify your configuration.
===========================================================================

===================================================
EFA installation complete.

  • Please logout/login to complete the installation.
  • Libfabric was installed in /opt/amazon/efa
  • Open MPI was installed in /opt/amazon/openmpi

===================================================

Log from dmesg:

[ 47.835056] efa 0000:00:06.0 rdmap0s6: Unregister ib device

[ 47.966388] Elastic Fabric Adapter (EFA) v1.16.0g

[ 48.080918] efa 0000:00:06.0: Setup irq:27 name:efa-mgmnt@pci:0000:00:06.0

[ 48.085540] efa 0000:00:06.0 efa_0: IB device registered

[ 50.633468] efa 0000:00:06.0 rdmap0s6: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22

[ 50.637880] efa 0000:00:06.0 rdmap0s6: Failed to de-register mr(lkey-4) [-22]

[ 50.647230] efa 0000:00:06.0 rdmap0s6: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22

[ 50.651981] efa 0000:00:06.0 rdmap0s6: Failed to de-register mr(lkey-3) [-22]
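To read a dmesg dump like the one above at a glance, a short Python sketch can pull out the failed EFA driver commands and their error codes. The regular expressions are written only against the sample lines in this post (an assumption; other driver versions may word these messages differently). Following the usual Linux kernel convention of returning negative errno values, `err -22` corresponds to `EINVAL` (invalid argument). The sketch only summarizes the log; it does not explain why the memory-region deregistration fails.

```python
import errno
import re

# Patterns written against the sample EFA driver lines in this post; other
# driver versions may word these messages differently (assumption).
CMD_RE = re.compile(
    r"efa (?P<pci>\S+) (?P<dev>\S+): Failed to process command (?P<cmd>\w+) "
    r"\(opcode (?P<opcode>\d+)\) comp_status (?P<status>\d+) err (?P<err>-?\d+)"
)
DEREG_RE = re.compile(
    r"efa (?P<pci>\S+) (?P<dev>\S+): Failed to de-register "
    r"mr\(lkey-(?P<lkey>\d+)\) \[(?P<err>-?\d+)\]"
)


def errno_name(err):
    """Map a negative kernel error code such as -22 to its symbolic name."""
    return errno.errorcode.get(-err, "unknown")


def summarize_efa_errors(dmesg_text):
    """Return EFA driver failures found in dmesg_text, in order of appearance.

    Uses finditer over the whole text, so it also works when several log
    entries have been pasted onto a single line.
    """
    findings = []
    for m in CMD_RE.finditer(dmesg_text):
        err = int(m.group("err"))
        findings.append({
            "pos": m.start(),
            "kind": "command",
            "device": m.group("dev"),
            "command": m.group("cmd"),
            "opcode": int(m.group("opcode")),
            "comp_status": int(m.group("status")),
            "err": err,
            "errno": errno_name(err),
        })
    for m in DEREG_RE.finditer(dmesg_text):
        err = int(m.group("err"))
        findings.append({
            "pos": m.start(),
            "kind": "dereg_mr",
            "device": m.group("dev"),
            "lkey": int(m.group("lkey")),
            "err": err,
            "errno": errno_name(err),
        })
    # Restore log order, since the two patterns were scanned separately.
    findings.sort(key=lambda f: f["pos"])
    return findings


if __name__ == "__main__":
    import sys

    # Read dmesg text from standard input and print one finding per line.
    for finding in summarize_efa_errors(sys.stdin.read()):
        print(finding)
```

Saved under a hypothetical name such as `summarize_efa.py`, it can be fed with `dmesg | python3 summarize_efa.py` (reading the kernel log may require sudo, depending on the system).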

However, the same test worked a few days ago (last week). Has anything about EFA changed in the last few days?

Asked 2 years ago, 960 views

1 answer

Hello, thank you for your post. Please refer to the documentation on EFA limitations. I suspect there may be a problem with the security group configuration of the instance on which you ran the EFA installer.

The EFA must be a member of a security group that allows all inbound and outbound traffic to and from the security group itself.
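This requirement can also be checked programmatically. The minimal Python sketch below inspects a security group description in the shape returned by the EC2 `DescribeSecurityGroups` API (for example, the `SecurityGroups` entries from boto3's `describe_security_groups`) and reports whether an all-traffic rule referencing the group itself exists in each direction. The function name is hypothetical. It follows the wording of the requirement literally and only counts self-referencing rules, so a broad `0.0.0.0/0` outbound rule is not counted; whether such a rule would be sufficient is not something this sketch tries to decide.

```python
def allows_all_traffic_to_self(group):
    """Check one security group for self-referencing all-traffic rules.

    `group` is assumed to have the shape of one entry in the "SecurityGroups"
    list returned by EC2 DescribeSecurityGroups. Returns a tuple
    (inbound_ok, outbound_ok).
    """
    group_id = group["GroupId"]

    def has_self_all_traffic(rules):
        for rule in rules:
            # IpProtocol "-1" means all protocols and all ports.
            if rule.get("IpProtocol") != "-1":
                continue
            # The rule must name this same security group as its peer.
            for pair in rule.get("UserIdGroupPairs", []):
                if pair.get("GroupId") == group_id:
                    return True
        return False

    inbound_ok = has_self_all_traffic(group.get("IpPermissions", []))
    outbound_ok = has_self_all_traffic(group.get("IpPermissionsEgress", []))
    return inbound_ok, outbound_ok
```

A possible usage with boto3 (not tested against a live account): create a client with `boto3.client("ec2")`, call `describe_security_groups(GroupIds=[...])` for the group attached to the EFA, and pass each entry of the response's `"SecurityGroups"` list to the function. A `False` in either position means the self-referencing rule is missing in that direction.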

Please consider opening a support case so that a support engineer can assist you with reviewing the configuration of the EC2 instance and troubleshoot the EFA pingpong test.

AWS
Support Engineer
Answered 2 years ago
  • I have the same issue: efa_test.sh fails with the error below:

    Starting server...

    Starting client...

    Error: fi_pingpong test timed out.

    dmesg:
    [ 803.733432] efa 0000:00:06.0 rdmap0s6: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22
    [ 803.737575] efa 0000:00:06.0 rdmap0s6: Failed to de-register mr(lkey-0) [-22]
    [ 892.965223] efa 0000:00:06.0 rdmap0s6: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22
    [ 892.969332] efa 0000:00:06.0 rdmap0s6: Failed to de-register mr(lkey-0) [-22]

    My security group already allows all inbound and outbound traffic to and from itself.

    So what is the problem?
