@@ -641,11 +641,11 @@ OpenSHMEM Collectives
Network Support
---------------

- - There are four main MPI network models available: "ob1", "cm",
-   "yalla", and "ucx".  "ob1" uses BTL ("Byte Transfer Layer")
+ - There are several main MPI network models available: "ob1", "cm",
+   "ucx", and "yalla".  "ob1" uses BTL ("Byte Transfer Layer")
  components for each supported network.  "cm" uses MTL ("Matching
-   Transport Layer") components for each supported network.  "yalla"
-   uses the Mellanox MXM transport.  "ucx" uses the OpenUCX transport.
+   Transport Layer") components for each supported network.  "ucx" uses
+   the OpenUCX transport.

- "ob1" supports a variety of networks that can be used in
  combination with each other:
@@ -668,42 +668,93 @@ Network Support
  - OpenFabrics Interfaces ("libfabric" tag matching)
  - Portals 4

-   Open MPI will, by default, choose to use "cm" when one of the
-   above transports can be used, unless OpenUCX or MXM support is
-   detected, in which case the "ucx" or "yalla" PML will be used
-   by default.  Otherwise, "ob1" will be used and the corresponding
-   BTLs will be selected.  Users can force the use of ob1 or cm if
-   desired by setting the "pml" MCA parameter at run-time:
+ - UCX is the Unified Communication X (UCX) communication library
+   (http://www.openucx.org/).  This is an open-source project
+   developed in collaboration between industry, laboratories, and
+   academia to create an open-source production grade communication
+   framework for data centric and high-performance applications.  The
+   UCX library can be downloaded from repositories (e.g.,
+   Fedora/RedHat yum repositories).  The UCX library is also part of
+   Mellanox OFED and Mellanox HPC-X binary distributions.

-   shell$ mpirun --mca pml ob1 ...
+   UCX currently supports:
+
+   - OpenFabrics Verbs (including InfiniBand and RoCE)
+   - Cray's uGNI
+   - TCP
+   - Shared memory
+   - NVIDIA CUDA drivers
+
+   While users can manually select any of the above transports at run
+   time, Open MPI will select a default transport as follows:
+
+   1. If InfiniBand devices are available, use the UCX PML.
+
+   2. If PSM, PSM2, or other tag-matching-supporting Libfabric
+      transport devices are available (e.g., Cray uGNI), use the "cm"
+      PML and a single appropriate corresponding "mtl" module.
+
+   3. If MXM/InfiniBand devices are available, use the "yalla" PML
+      (NOTE: the "yalla"/MXM PML is deprecated -- see below).
+
+   4. Otherwise, use the ob1 PML and one or more appropriate "btl"
+      modules.
+
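+   One way to check which PML, BTL, and MTL components a given Open
+   MPI installation actually contains is to list them with the
+   ompi_info command (a sketch; the exact output format varies by
+   build and version):
+
+   shell$ ompi_info | grep -E " (pml|btl|mtl):"
+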
+   Users can override Open MPI's default selection algorithms and force
+   the use of a specific transport if desired by setting the "pml" MCA
+   parameter (and potentially the "btl" and/or "mtl" MCA parameters) at
+   run-time:
+
+   shell$ mpirun --mca pml ob1 --mca btl [comma-delimited-BTLs] ...
+   or
+   shell$ mpirun --mca pml cm --mca mtl [MTL] ...
  or
-   shell$ mpirun --mca pml cm ...
-
- - Similarly, there are two OpenSHMEM network models available: "ucx",
-   and "ikrit":
-   - "ucx" interfaces directly with UCX;
-   - "ikrit" interfaces directly with Mellanox MXM.
-
- - UCX is the Unified Communication X (UCX) communication library
-   (http://www.openucx.org/).
-   This is an open-source project developed in collaboration between
-   industry, laboratories, and academia to create an open-source
-   production grade communication framework for data centric and
-   high-performance applications.
-   UCX currently supports:
-   - OFA Verbs;
-   - Cray's uGNI;
-   - NVIDIA CUDA drivers.
-
- - MXM is the Mellanox Messaging Accelerator library utilizing a full
-   range of IB transports to provide the following messaging services
-   to the upper level MPI/OpenSHMEM libraries:
-
-   - Usage of all available IB transports
-   - Native RDMA support
-   - Progress thread
-   - Shared memory communication
-   - Hardware-assisted reliability
+   shell$ mpirun --mca pml ucx ...
+
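+   Note that any MCA parameter can equivalently be set via an
+   environment variable of the form OMPI_MCA_<param_name>; for
+   example, the following is equivalent to "--mca pml ucx" (assuming a
+   Bourne-style shell):
+
+   shell$ export OMPI_MCA_pml=ucx
+   shell$ mpirun ...
+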
+   As alluded to above, there is actually a fourth MPI point-to-point
+   transport, but it is deprecated and will likely be removed in a
+   future Open MPI release:
+
+   - "yalla" uses the Mellanox MXM transport library.  MXM is the
+     deprecated Mellanox Messaging Accelerator library, utilizing a
+     full range of IB transports to provide messaging services to the
+     upper level MPI/OpenSHMEM libraries.  MXM is only included in
+     this release of Open MPI for backwards compatibility; the "ucx"
+     PML should be used instead.
+
+ - The main OpenSHMEM network model is "ucx"; it interfaces directly
+   with UCX (see the example below).
+
+   The "ikrit" OpenSHMEM network model is also available but
+   deprecated; it uses the Mellanox Messaging Accelerator (MXM)
+   library, which is itself deprecated.
+
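+   For example, to explicitly select the "ucx" OpenSHMEM model at run
+   time (the application name here is purely illustrative):
+
+   shell$ oshrun --mca spml ucx ./my_shmem_app
+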
+ - In prior versions of Open MPI, InfiniBand and RoCE support was
+   provided through the openib BTL and ob1 PML plugins.  Starting with
+   Open MPI 4.0.0, InfiniBand support through the openib+ob1 plugins is
+   both deprecated and superseded by the ucx PML component.
+
+   While the openib BTL depended on libibverbs, the UCX PML depends on
+   the UCX library.
+
+   Once installed, Open MPI can be built with UCX support by adding
+   --with-ucx to the Open MPI configure command.  Once Open MPI is
+   configured to use UCX, the runtime will automatically select the UCX
+   PML if one of the supported networks is detected (e.g., InfiniBand).
+   It's possible to force using UCX in the mpirun or oshrun command
+   lines by specifying any or all of the following MCA parameters:
+   "--mca pml ucx" for MPI point-to-point operations, "--mca spml ucx"
+   for OpenSHMEM support, and "--mca osc ucx" for MPI RMA (one-sided)
+   operations.
+
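+   As a sketch, a UCX-enabled build and launch might look like the
+   following (the UCX install path and application name are
+   hypothetical):
+
+   shell$ ./configure --with-ucx=/opt/ucx ...
+   shell$ make all install
+
+   shell$ mpirun --mca pml ucx --mca osc ucx ./my_mpi_app
+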
+ - Although the ob1 PML+openib BTL is still the default for iWARP and
+   RoCE devices, it will reject InfiniBand devices (by default) so
+   that they will use the ucx PML.  If using the openib BTL is still
+   desired, set the following MCA parameters:
+
+   # Note that "vader" is Open MPI's shared memory BTL
+   $ mpirun --mca pml ob1 --mca btl openib,vader,self \
+       --mca btl_openib_allow_ib 1 ...

- The usnic BTL is support for Cisco's usNIC device ("userspace NIC")
  on Cisco UCS servers with the Virtualized Interface Card (VIC).
@@ -756,32 +807,6 @@ Network Support
  mechanisms for Open MPI to utilize single-copy semantics for shared
  memory.

- - In prior versions of Open MPI, InfiniBand and RoCE support was
-   provided through the openib BTL and ob1 PML plugins.  Starting with
-   Open MPI 4.0.0, InfiniBand support through the openib+ob1 plugins is
-   both deprecated and superseded by the UCX PML component.
-
-   UCX is an open-source optimized communication library which supports
-   multiple networks, including RoCE, InfiniBand, uGNI, TCP, shared
-   memory, and others.
-
-   While the openib BTL depended on libibverbs, the UCX PML depends on
-   the UCX library.  The UCX library can be downloaded from
-   http://www.openucx.org/ or from various Linux distribution
-   repositories (e.g., Fedora/RedHat yum repositories).  The UCX
-   library is also part of Mellanox OFED and Mellanox HPC-X binary
-   distributions.
-
-   Once installed, Open MPI can be built with UCX support by adding
-   --with-ucx to the Open MPI configure command.  Once Open MPI is
-   configured to use UCX, the runtime will automatically select the UCX
-   PML if one of the supported networks is detected (e.g., InfiniBand).
-   It's possible to force using UCX in the mpirun or oshrun command
-   lines by specifying any or all of the following mca parameters:
-   "-mca pml ucx" for MPI point-to-point operations, "-mca spml ucx"
-   for OpenSHMEM support, and "-mca osc ucx" for MPI RMA (one-sided)
-   operations.
-
Open MPI Extensions
-------------------