Hello, thank you for your post. Before you run the mpirun command, please make sure you have added the EFA library directory to LD_LIBRARY_PATH. Depending on which operating system you are using, use one of the following commands [1].
Amazon Linux, Amazon Linux 2, RHEL, and CentOS
$ export LD_LIBRARY_PATH=/opt/amazon/efa/lib64:$LD_LIBRARY_PATH
Ubuntu 18.04/20.04
$ export LD_LIBRARY_PATH=/opt/amazon/efa/lib:$LD_LIBRARY_PATH
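If the same job script has to run on both families of operating systems, the two commands above can be folded into one check. This is only a sketch: it assumes the default install prefix /opt/amazon/efa from the commands above and that at most one of the two directories is the right one on a given host (it prefers lib64 when both exist). The helper name efa_lib_dir is made up for illustration and is not part of the EFA installer.

```shell
# Print the EFA library directory found under the given prefix.
# efa_lib_dir is a hypothetical helper name, not part of the EFA installer.
# The default prefix /opt/amazon/efa is taken from the commands above.
efa_lib_dir() {
    _prefix="${1:-/opt/amazon/efa}"
    # lib64 is used on Amazon Linux/RHEL/CentOS, lib on Ubuntu.
    for _dir in "$_prefix/lib64" "$_prefix/lib"; do
        if [ -d "$_dir" ]; then
            printf '%s\n' "$_dir"
            return 0
        fi
    done
    return 1
}

if efa_dir="$(efa_lib_dir)"; then
    # Prepend the directory; avoid a trailing ':' when the variable was empty.
    export LD_LIBRARY_PATH="$efa_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
else
    echo "EFA library directory not found under /opt/amazon/efa" >&2
fi
```

Placing this near the top of the batch script sets the variable on the compute node itself instead of relying on the submitting shell's environment.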
References:
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html#nccl-start-base-tests
Thanks for your reply, @AWS_SamM.
I am using Amazon Linux 2. The output is still the same, with no indication of EFA, after setting LD_LIBRARY_PATH=/opt/amazon/efa/lib64:$LD_LIBRARY_PATH:
[ec2-user@ip code]$ export LD_LIBRARY_PATH=/opt/amazon/efa/lib64:$LD_LIBRARY_PATH
[ec2-user@ip]$ export OMPI_MCA_mtl_base_verbose=100
[ec2-user@ip]$ sbatch submit_job_openmpi
Submitted batch job 5
[ec2-user@ip]$ cat hello-world-job_5.out
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_register: registering framework mtl components
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_register: found loaded component ofi
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_register: component ofi register function successful
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_open: opening mtl components
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_open: found loaded component ofi
[libhe-dy-c5n18xlarge-2:11589] mca: base: components_open: component ofi open function successful
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_register: registering framework mtl components
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_register: found loaded component ofi
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_register: component ofi register function successful
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_open: opening mtl components
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_open: found loaded component ofi
[libhe-dy-c5n18xlarge-1:11601] mca: base: components_open: component ofi open function successful
[libhe-dy-c5n18xlarge-2:11589] mca:base:select: Auto-selecting mtl components
[libhe-dy-c5n18xlarge-2:11589] mca:base:select:( mtl) Querying component [ofi]
[libhe-dy-c5n18xlarge-2:11589] mca:base:select:( mtl) Query of component [ofi] set priority to 25
[libhe-dy-c5n18xlarge-2:11589] mca:base:select:( mtl) Selected component [ofi]
[libhe-dy-c5n18xlarge-2:11589] select: initializing mtl component ofi
[libhe-dy-c5n18xlarge-2:11589] mtl_ofi_component.c:366: mtl:ofi:provider: rdmap0s6-rdm
[libhe-dy-c5n18xlarge-1:11601] mca:base:select: Auto-selecting mtl components
[libhe-dy-c5n18xlarge-1:11601] mca:base:select:( mtl) Querying component [ofi]
[libhe-dy-c5n18xlarge-1:11601] mca:base:select:( mtl) Query of component [ofi] set priority to 25
[libhe-dy-c5n18xlarge-1:11601] mca:base:select:( mtl) Selected component [ofi]
[libhe-dy-c5n18xlarge-1:11601] select: initializing mtl component ofi
[libhe-dy-c5n18xlarge-1:11601] mtl_ofi_component.c:366: mtl:ofi:provider: rdmap0s6-rdm
[libhe-dy-c5n18xlarge-2:11589] select: init returned success
[libhe-dy-c5n18xlarge-2:11589] select: component ofi selected
[libhe-dy-c5n18xlarge-1:11601] select: init returned success
[libhe-dy-c5n18xlarge-1:11601] select: component ofi selected
Hello world from processor libhe-dy-c5n18xlarge-1, rank 0 out of 2 processors
Hello world from processor libhe-dy-c5n18xlarge-2, rank 1 out of 2 processors
[libhe-dy-c5n18xlarge-2:11589] mca: base: close: component ofi closed
[libhe-dy-c5n18xlarge-2:11589] mca: base: close: unloading component ofi
[libhe-dy-c5n18xlarge-1:11601] mca: base: close: component ofi closed
[libhe-dy-c5n18xlarge-1:11601] mca: base: close: unloading component ofi
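With verbosity set to 100 the output is long, and the only line that names a libfabric provider is the `mtl:ofi:provider:` one. A small filter can pull that value out of the job log. This is a sketch, not an official diagnostic: ofi_providers is a made-up helper name, and it assumes the log format shown above.

```shell
# Print the unique provider names reported by the OFI MTL in a job log.
# ofi_providers is a hypothetical helper name used for illustration only.
ofi_providers() {
    # Keep only the text after "mtl:ofi:provider: " and collapse duplicates.
    sed -n 's/.*mtl:ofi:provider: //p' "$1" | sort -u
}

# Example, using the output file from the job above:
#   ofi_providers hello-world-job_5.out
#
# For comparison, assuming the fi_info utility from the EFA installer is
# available on the compute node, the EFA domains can be listed with:
#   fi_info -p efa -t FI_EP_RDM
```

The names printed by the helper can then be compared with the domain names that fi_info lists for the efa provider.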