Hello, thank you for your post. Before you run the mpirun command, please make sure you have added the EFA library directory to your library path. Depending on which operating system you are using, use one of the following commands [1].
Amazon Linux, Amazon Linux 2, RHEL, and CentOS:
$ export LD_LIBRARY_PATH=/opt/amazon/efa/lib64:$LD_LIBRARY_PATH
Ubuntu 18.04/20.04:
$ export LD_LIBRARY_PATH=/opt/amazon/efa/lib:$LD_LIBRARY_PATH
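The two commands differ only in `lib64` versus `lib`, so a batch script that may run on either distribution could simply pick whichever directory exists. Below is a minimal bash sketch. It assumes the EFA installer's default `/opt/amazon/efa` prefix. `prepend_efa_libdir` is a hypothetical helper name and is not part of the EFA installer.

```shell
#!/usr/bin/env bash
# Sketch only: prepend whichever EFA library directory exists to LD_LIBRARY_PATH.
# Assumes the EFA installer's default prefix /opt/amazon/efa; pass another
# prefix as the first argument if yours differs.
prepend_efa_libdir() {
  local prefix="${1:-/opt/amazon/efa}"
  local libdir
  if [ -d "$prefix/lib64" ]; then        # Amazon Linux, Amazon Linux 2, RHEL, CentOS
    libdir="$prefix/lib64"
  elif [ -d "$prefix/lib" ]; then        # Ubuntu 18.04/20.04
    libdir="$prefix/lib"
  else
    echo "no EFA library directory under $prefix" >&2
    return 1
  fi
  # Do nothing if the directory is already listed (repeated calls are harmless).
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$libdir:"*) return 0 ;;
  esac
  # ${VAR:+...} appends ":$LD_LIBRARY_PATH" only when it is non-empty,
  # so no empty entry is left at the end of the list.
  export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

In a Slurm batch script, you could source this and call `prepend_efa_libdir` (no argument needed for the default prefix) before `mpirun`.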
References:
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html#nccl-start-base-tests
Thanks for your reply, @AWS_SamM.
I am using Amazon Linux 2. After setting LD_LIBRARY_PATH=/opt/amazon/efa/lib64:$LD_LIBRARY_PATH, the output is still the same, with no indication that EFA is being used:
[ec2-user@ip code]$ export LD_LIBRARY_PATH=/opt/amazon/efa/lib64:$LD_LIBRARY_PATH
[ec2-user@ip]$ export OMPI_MCA_mtl_base_verbose=100
[ec2-user@ip]$ sbatch submit_job_openmpi
Submitted batch job 5
[ec2-user@ip]$ cat hello-world-job_5.out
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_register: registering framework mtl components
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_register: found loaded component ofi
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_register: component ofi register function successful
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_open: opening mtl components
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_open: found loaded component ofi
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_open: component ofi open function successful
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_register: registering framework mtl components
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_register: found loaded component ofi
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_register: component ofi register function successful
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_open: opening mtl components
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_open: found loaded component ofi
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_open: component ofi open function successful
[libhe-dy-c5n18xlarge-2:11589] mca:base:select: Auto-selecting mtl components
[libhe-dy-c5n18xlarge-2:11589] mca:base:select:( mtl) Querying component [ofi]
[libhe-dy-c5n18xlarge-2:11589] mca:base:select:( mtl) Query of component [ofi] set priority to 25
[libhe-dy-c5n18xlarge-2:11589] mca:base:select:( mtl) Selected component [ofi]
[libhe-dy-c5n18xlarge-2:11589] select: initializing mtl component ofi
[libhe-dy-c5n18xlarge-2:11589] mtl_ofi_component.c:366: mtl:ofi:provider: rdmap0s6-rdm
[libhe-dy-c5n18xlarge-1:11601] mca:base:select: Auto-selecting mtl components
[libhe-dy-c5n18xlarge-1:11601] mca:base:select:( mtl) Querying component [ofi]
[libhe-dy-c5n18xlarge-1:11601] mca:base:select:( mtl) Query of component [ofi] set priority to 25
[libhe-dy-c5n18xlarge-1:11601] mca:base:select:( mtl) Selected component [ofi]
[libhe-dy-c5n18xlarge-1:11601] select: initializing mtl component ofi
[libhe-dy-c5n18xlarge-1:11601] mtl_ofi_component.c:366: mtl:ofi:provider: rdmap0s6-rdm
[libhe-dy-c5n18xlarge-2:11589] select: init returned success
[libhe-dy-c5n18xlarge-2:11589] select: component ofi selected
[libhe-dy-c5n18xlarge-1:11601] select: init returned success
[libhe-dy-c5n18xlarge-1:11601] select: component ofi selected
Hello world from processor libhe-dy-c5n18xlarge-1, rank 0 out of 2 processors
Hello world from processor libhe-dy-c5n18xlarge-2, rank 1 out of 2 processors
[libhe-dy-c5n18xlarge-2:11589] mca: base: close: component ofi closed
[libhe-dy-c5n18xlarge-2:11589] mca: base: close: unloading component ofi
[libhe-dy-c5n18xlarge-1:11601] mca: base: close: component ofi closed
[libhe-dy-c5n18xlarge-1:11601] mca: base: close: unloading component ofi
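To see at a glance which libfabric provider was selected on each node, the relevant lines can be filtered out of the verbose log. This is a minimal sketch that assumes the log format shown above. `ofi_providers` is a hypothetical helper name, and other Open MPI versions may word this line differently. The printed name can then be compared with the `domain:` field that `fi_info -p efa` reports on the same node. A match would suggest that the OFI MTL is running over the EFA provider.

```shell
#!/usr/bin/env bash
# Sketch: print "<host> <provider>" for every OFI provider line in an Open MPI
# job log produced with OMPI_MCA_mtl_base_verbose=100. The pattern matches
# lines of the form:
#   [host:pid] mtl_ofi_component.c:366: mtl:ofi:provider: rdmap0s6-rdm
ofi_providers() {
  sed -n 's/^\[\([^:]*\):[0-9]*] .*mtl:ofi:provider: \([^ ]*\).*$/\1 \2/p' "$1" | sort -u
}
```

For the job above, this would be run as `ofi_providers hello-world-job_5.out`.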
Answered 4 years ago
Maybe first check with the fi_pingpong test, to make sure that libfabric, the EFA driver, and the EFA provider are all properly configured.
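Before running `fi_pingpong`, a quick per-node check that libfabric can see the EFA provider at all can save time. The following is a minimal sketch. It assumes `fi_info` from the EFA installer is on `PATH` (it is typically installed under `/opt/amazon/efa/bin`). It also relies on `fi_info` exiting with a non-zero status when no matching provider is found. `efa_preflight` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: per-node check that libfabric can see the EFA provider.
# Assumes fi_info (shipped with the EFA installer) is on PATH.
efa_preflight() {
  if ! command -v fi_info >/dev/null 2>&1; then
    echo "fi_info not found on $(uname -n); is /opt/amazon/efa/bin on PATH?" >&2
    return 1
  fi
  # Query only the efa provider, restricted to reliable-datagram (RDM) endpoints.
  if fi_info -p efa -t FI_EP_RDM >/dev/null 2>&1; then
    echo "efa provider available on $(uname -n)"
  else
    echo "efa provider not available on $(uname -n)" >&2
    return 1
  fi
}
```

Once this passes on both nodes, you can run `fi_pingpong` between them. For example, run `fi_pingpong -p efa` on the first node and `fi_pingpong -p efa <first-node-private-ip>` on the second.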
Answered 10 months ago