The Message-Passing Interface (MPI) is a standard specification that supports the development of distributed-memory parallel programs by means of message passing (point-to-point and one-sided) and collective communication operations among processes.
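For illustration only, the following is a minimal sketch of an MPI program written in C that performs point-to-point communication between two processes; the variable names and message value are arbitrary and are not taken from this manual.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal point-to-point example: rank 0 sends an integer to rank 1. */
    int main(int argc, char **argv)
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }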
Before the MPI standard was created, universities, government research institutions, and computer manufacturers developed their own message passing libraries. Those libraries, however, depended on platform-specific functions, proprietary operating system features, and special calling conventions, so applications parallelized with them had poor portability and compatibility.
The MPI message passing standard was created with the objective of retaining the useful features of existing message passing libraries while permitting the coding of portable parallel applications that can run on various platforms without modification.
The standardization effort identified the following objectives.
The standardization effort was initiated by a group led by Dr. Dongarra at the University of Tennessee in the United States. The effort was then taken over by the MPI Forum, a private organization composed of participants from universities, government research institutions, and computer vendors in the United States and Europe. Their efforts culminated in the publication of the first edition, MPI-1.0, in June 1994. The standard specification has subsequently been revised several times for clarification and expansion. MPI Version 2 added one-sided communication and MPI-IO and was published in July 1997. MPI Version 3.0 added non-blocking collective operations and new one-sided functions and was published in September 2012, followed by MPI Version 3.1, which contains mostly corrections and clarifications of MPI Version 3.0 and was published in June 2015.
1.2   Configuration of NEC MPI
NEC MPI is an implementation of MPI Version 3.1 that uses the shared memory feature of a VH and InfiniBand functions for communication in order to achieve high communication performance.
1.2.1   Components of NEC MPI
NEC MPI consists of the following components:
NEC MPI supports communication among processes on VEs. Furthermore, by using NEC MPI/Scalar-Vector Hybrid, communication is also possible between processes on VHs or scalar nodes and processes on VE nodes; in other words, heterogeneous environments are supported. NEC MPI automatically selects an appropriate communication method among VH or scalar nodes and VE nodes, taking the system configuration and other factors into consideration. In general, MPI communication is fastest when InfiniBand, which NEC MPI can use directly, is available. For hybrid execution, it is necessary to compile and link the MPI program for a VH or scalar node and to specify the mpirun command corresponding to hybrid execution. See Chapter 3 for details.
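As an illustrative sketch of such a hybrid job, the program below uses only standard MPI calls to report the rank and host name of each process; when it is compiled and linked separately for VE and for VH or scalar nodes and started with the hybrid form of the mpirun command described in Chapter 3, its output shows on which node each process was placed. The program is an example only and is not part of NEC MPI.

    #include <mpi.h>
    #include <stdio.h>

    /* Each rank reports where it is running; useful for checking the
     * placement of ranks in a hybrid (VE + VH/scalar) execution. */
    int main(int argc, char **argv)
    {
        int rank, size, namelen;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &namelen);

        printf("Rank %d of %d is running on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }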
When an MPI program is executed with NEC MPI and VEO (VE Offload) is used in MPI processes running on VHs, those processes can pass addresses in VE memory directly to MPI communication procedures as data buffer arguments. Similarly, if CUDA is used in MPI processes running on scalar hosts, those processes can pass addresses in GPU device memory directly to MPI communication procedures as data buffer arguments. Memory transfer between scalar hosts and GPUs normally uses CUDA functions, but MPI communication can be made faster by using the GDRCopy library (*).
Please refer to Chapter 3 for details about compiling, linking, and execution, and for notes on the use of VEO and CUDA together with NEC MPI.
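As a minimal sketch of the CUDA case, and assuming that the program has been compiled and linked for use with CUDA as described in Chapter 3, a buffer allocated in GPU device memory with cudaMalloc can be passed directly to MPI communication procedures; the message size and rank numbering below are arbitrary.

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Sketch: rank 0 sends a GPU-resident buffer directly to rank 1.
     * The device pointer is passed to MPI_Send/MPI_Recv as the data buffer. */
    int main(int argc, char **argv)
    {
        const int n = 1024;
        int rank;
        double *dbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&dbuf, n * sizeof(double));  /* GPU device memory */

        if (rank == 0) {
            cudaMemset(dbuf, 0, n * sizeof(double));
            MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }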
If VEO is used with NEC MPI, VEOS Version 2.7.6 or later is required, and if CUDA is used together with NEC MPI, NVIDIA CUDA Toolkit 11.1 or 11.8 is required.
Please also refer to "The Tutorial and API Reference of Alternative VE Offloading" for more details about the VEO features.
Support for configuring both VE and GPU devices within a single node, and for communication between those devices, is planned for a future release.
(*) Please refer to https://github.com/NVIDIA/gdrcopy for how to build and install the RPM package on the VHs or scalar hosts where MPI processes run.
The AVEO UserDMA feature enables direct memory transfer between the memories of VEs on the same VH when VEO is used together with NEC MPI. This feature requires VEOS Version 2.11.0 or later and is enabled by default. In some environments or applications, performance may degrade when this feature is enabled; please refer to 3.2.4 Environment Variables for how to disable it. When this feature is disabled, or when NEC MPI 2.20.0 or earlier is used, the data is transferred via VH memory. Please refer to 3.11 Miscellaneous for notes on this feature.
The GPUDirect RDMA feature enables direct memory transfer between GPU memories on different hosts when CUDA is used together with NEC MPI. This feature requires NVIDIA CUDA Toolkit 11.8 and the NVIDIA Peer Memory Client; please refer to https://github.com/Mellanox/nv_peer_memory for how to build and install the RPM package on the GPU hosts where MPI processes run. For the usage of the GPUDirect RDMA feature, please refer to 3.1 Compiling and Linking MPI Programs, 3.2.4 Environment Variables, and 3.2.10 Execution Examples.