efa_installer.sh started failing at "Testing EFA device" in the last few days


Today I followed the EFA getting started guide (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-enable) to install the EFA software on Amazon Linux 2. The installer failed at the last step, "Testing EFA device", with the following error:

$ sudo ./efa_installer.sh -y
= Starting Amazon Elastic Fabric Adapter Installation Script =
= EFA Installer Version: 1.17.2 =
......
== Testing EFA device ==
Starting server...
Starting client...
Error: fi_pingpong test timed out.
==============================================================================
An EFA device has been detected but a ping test has failed. Please consult the
EFA documentation to verify your configuration.
===========================================================================
===================================================
EFA installation complete.
  • Please logout/login to complete the installation.
  • Libfabric was installed in /opt/amazon/efa
  • Open MPI was installed in /opt/amazon/openmpi
===================================================

Log from dmesg:

[ 47.835056] efa 0000:00:06.0 rdmap0s6: Unregister ib device
[ 47.966388] Elastic Fabric Adapter (EFA) v1.16.0g
[ 48.080918] efa 0000:00:06.0: Setup irq:27 name:efa-mgmnt@pci:0000:00:06.0
[ 48.085540] efa 0000:00:06.0 efa_0: IB device registered
[ 50.633468] efa 0000:00:06.0 rdmap0s6: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22
[ 50.637880] efa 0000:00:06.0 rdmap0s6: Failed to de-register mr(lkey-4) [-22]
[ 50.647230] efa 0000:00:06.0 rdmap0s6: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22
[ 50.651981] efa 0000:00:06.0 rdmap0s6: Failed to de-register mr(lkey-3) [-22]

However, the same test worked when I ran it a few days ago (last week). Has anything changed in EFA in the last few days?
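The dmesg output above repeats one pattern: a DEREG_MR command fails with `err -22`, followed by a failed memory-region de-registration. On Linux, kernel return values such as -22 are negated errno codes, and errno 22 is EINVAL. As a minimal sketch (`summarize_efa_failures` is a hypothetical helper, not part of the EFA installer or driver tooling), the following Python pulls those failures out of dmesg text and decodes the errno name. It leaves `comp_status` uninterpreted, because its meaning is not explained in this thread.

```python
import errno
import re

# Matches driver messages such as:
# "... rdmap0s6: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22"
CMD_FAIL = re.compile(
    r"Failed to process command (?P<cmd>\w+) \(opcode (?P<opcode>\d+)\) "
    r"comp_status (?P<status>\d+) err (?P<err>-?\d+)"
)


def summarize_efa_failures(dmesg_text):
    """Return (command, opcode, comp_status, errno_name) for each failed command.

    Hypothetical helper: it only parses the message format shown in this thread.
    """
    results = []
    for match in CMD_FAIL.finditer(dmesg_text):
        # Kernel code reports negated errno values, e.g. -22 for -EINVAL.
        code = abs(int(match.group("err")))
        name = errno.errorcode.get(code, f"errno {code}")
        results.append((
            match.group("cmd"),
            int(match.group("opcode")),
            int(match.group("status")),
            name,
        ))
    return results
```

Because it uses `finditer` over the whole text, the helper also works when several dmesg lines have been pasted onto a single line.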

asked 2 years ago · 698 views

1 Answer

Hello, thank you for your post. Please refer to the documentation on EFA limitations. I suspect that there may be a problem with the Security Group configuration for the instance on which you ran the EFA installer.

The EFA must be a member of a security group that allows all inbound and outbound traffic to and from the security group itself.
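One way to check this requirement is to inspect the JSON returned by `aws ec2 describe-security-groups --group-ids <sg-id>`. The sketch below is a hypothetical helper (the function name is illustrative). It looks for an all-traffic rule (`IpProtocol` of `"-1"`) that references the group's own `GroupId` in both `IpPermissions` (inbound) and `IpPermissionsEgress` (outbound). It deliberately checks only self-referencing rules, the literal form of the requirement above. It does not check whether a broader CIDR rule would also satisfy EFA, and it does not verify that the EFA network interface is actually attached to this group.

```python
def allows_all_self_traffic(group):
    """Check one entry of describe-security-groups' "SecurityGroups" list.

    Returns True only if the group has an all-traffic rule referencing itself
    both inbound and outbound. Hypothetical helper for illustration.
    """
    gid = group["GroupId"]

    def has_self_all_rule(permissions):
        for perm in permissions:
            # IpProtocol "-1" means all protocols and all ports.
            if perm.get("IpProtocol") != "-1":
                continue
            pairs = perm.get("UserIdGroupPairs", [])
            if any(pair.get("GroupId") == gid for pair in pairs):
                return True
        return False

    inbound_ok = has_self_all_rule(group.get("IpPermissions", []))
    outbound_ok = has_self_all_rule(group.get("IpPermissionsEgress", []))
    return inbound_ok and outbound_ok
```

For example, save the CLI output to `sg.json`, then call `allows_all_self_traffic` on each entry of `json.load(f)["SecurityGroups"]`.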

Please consider opening a support case so that a support engineer can help you review the configuration of the EC2 instance and troubleshoot the failing fi_pingpong test.

AWS
SUPPORT ENGINEER
answered 2 years ago
  • I have the same issue. efa_test.sh fails with the following error:

    Starting server...
    Starting client...
    Error: fi_pingpong test timed out.

    dmesg:
    [ 803.733432] efa 0000:00:06.0 rdmap0s6: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22
    [ 803.737575] efa 0000:00:06.0 rdmap0s6: Failed to de-register mr(lkey-0) [-22]
    [ 892.965223] efa 0000:00:06.0 rdmap0s6: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22
    [ 892.969332] efa 0000:00:06.0 rdmap0s6: Failed to de-register mr(lkey-0) [-22]

    My security group already allows all inbound and outbound traffic to and from itself, so what is the problem?
