The Message-Passing Interface (MPI) is a standard specification that supports the development of distributed-memory parallel programs by means of message passing (point-to-point and one-sided) and collective communication operations among processes.
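For illustration only, the following is a minimal sketch of an MPI program written in C that performs point-to-point communication between two processes; the variable names and message value are arbitrary and are not taken from this manual.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal point-to-point example: rank 0 sends an integer to rank 1. */
    int main(int argc, char **argv)
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }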
Before the MPI standard was created, universities, government research institutions, and computer manufacturers developed their own message passing libraries. Those libraries, however, depended on platform-specific functions, proprietary operating system features, and special calling conventions, so applications parallelized with them had poor portability and compatibility.
The MPI message passing standard was created with the objective of retaining the useful features of existing message passing libraries while permitting the coding of portable parallel applications that can run on various platforms without modification.
The standardization effort identified the following objectives.
The standardization effort was initiated by a group led by Dr. Dongarra at the University of Tennessee in the United States. The effort was then taken over by the MPI Forum, a private organization composed of participants from universities, government research institutions, and computer vendors in the United States and Europe. Their efforts culminated in the publication of the first edition, MPI-1.0, in June 1994. The standard specification has subsequently been revised several times for clarification and expansion. MPI Version 2 added one-sided communication and MPI-IO and was published in July 1997. MPI Version 3.0 added non-blocking collective operations and new one-sided functions and was published in September 2012, followed by MPI Version 3.1, which contains mostly corrections and clarifications of MPI Version 3.0 and was published in June 2015.
1.2   Configuration of NEC MPI
NEC MPI is an implementation of MPI Version 3.1 that uses the shared memory feature of a VH and InfiniBand functions for communication in order to achieve high communication performance.
1.2.1   Components of NEC MPI
NEC MPI consists of the following components:
NEC MPI supports communication among processes on VEs. Furthermore, by using NEC MPI/Scalar-Vector Hybrid, communication is also possible between processes on VHs or scalar nodes and processes on VE nodes; in other words, heterogeneous environments are supported. NEC MPI automatically selects an appropriate communication method among VH or scalar nodes and VE nodes, taking the system configuration and other factors into consideration. In general, MPI communication is fastest when InfiniBand, which NEC MPI can use directly, is available. For hybrid execution, it is necessary to compile and link the MPI program for a VH or scalar node and to specify the mpirun command corresponding to hybrid execution. See Chapter 3 for details.
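As an illustrative sketch of such a hybrid job, the program below uses only standard MPI calls to report the rank and host name of each process; when it is compiled and linked separately for VE and for VH or scalar nodes and started with the hybrid form of the mpirun command described in Chapter 3, its output shows on which node each process was placed. The program is an example only and is not part of NEC MPI.

    #include <mpi.h>
    #include <stdio.h>

    /* Each rank reports where it is running; useful for checking the
     * placement of ranks in a hybrid (VE + VH/scalar) execution. */
    int main(int argc, char **argv)
    {
        int rank, size, namelen;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &namelen);

        printf("Rank %d of %d is running on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }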
When an MPI program is executed with NEC MPI and VEO (VE Offload) is used in MPI processes running on VHs, those processes can pass addresses in VE memory directly to MPI communication procedures as data buffer arguments. Similarly, if CUDA is used in MPI processes running on scalar hosts, those processes can pass addresses in GPU device memory directly to MPI communication procedures as data buffer arguments. Memory transfer between scalar hosts and GPUs normally uses CUDA functions, but MPI communication can be made faster by using the GDRCopy library (*).
Please refer to Chapter 3 for details about compiling, linking, and execution, and for notes on the use of VEO and CUDA together with NEC MPI.
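As a minimal sketch of the CUDA case, and assuming that the program has been compiled and linked for use with CUDA as described in Chapter 3, a buffer allocated in GPU device memory with cudaMalloc can be passed directly to MPI communication procedures; the message size and rank numbering below are arbitrary.

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Sketch: rank 0 sends a GPU-resident buffer directly to rank 1.
     * The device pointer is passed to MPI_Send/MPI_Recv as the data buffer. */
    int main(int argc, char **argv)
    {
        const int n = 1024;
        int rank;
        double *dbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&dbuf, n * sizeof(double));  /* GPU device memory */

        if (rank == 0) {
            cudaMemset(dbuf, 0, n * sizeof(double));
            MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }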
If VEO is used with NEC MPI, VEOS Version 2.7.6 or later is required, and if CUDA is used together with NEC MPI, NVIDIA CUDA Toolkit 11.1 or 11.8 is required.
Please also refer to "The Tutorial and API Reference of Alternative VE Offloading" for more details about the VEO features.
Support for configuring both VE and GPU devices within a single node, and for communication between those devices, is planned for a future release.
(*) Please refer to https://github.com/NVIDIA/gdrcopy for how to build and install the RPM package on the VHs or scalar hosts where MPI processes run.
The AVEO UserDMA feature enables direct memory transfer between the memories of VEs on the same VH when VEO is used together with NEC MPI. This feature requires VEOS Version 2.11.0 or later and is enabled by default. In some environments or applications, performance may degrade when this feature is enabled; please refer to 3.2.4 Environment Variables for how to disable it. When this feature is disabled, or when NEC MPI 2.20.0 or earlier is used, the data is transferred via VH memory. Please refer to 3.11 Miscellaneous for notes on this feature.
The GPUDirect RDMA feature enables direct memory transfer between GPU memories on different hosts when CUDA is used together with NEC MPI. This feature requires NVIDIA CUDA Toolkit 11.8 and the NVIDIA Peer Memory Client; please refer to https://github.com/Mellanox/nv_peer_memory for how to build and install the RPM package on the GPU hosts where MPI processes run. For the usage of the GPUDirect RDMA feature, please refer to 3.1 Compiling and Linking MPI Programs, 3.2.4 Environment Variables, and 3.2.10 Execution Examples.