===============================================================================
Version 2.27.0 (for MLNX_OFED 4.x) / Version 3.6.0 (for MLNX_OFED 5.x or later)
===============================================================================
* Release date: February 29, 2024
* Corresponding Manuals
  - The 27th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. Improved fault detection of the InfiniBand network and ensured that MPI
   execution terminates when a fault is detected.
2. Improved the MPI compile commands to issue a warning when the unrecommended
   -static option is used.

[Fixed Problems]
1. Fixed the problem that MPI reduction procedures with user-defined functions
   and non-contiguous datatypes may produce incorrect results.
2. Fixed the problem that "Average CPU Cores Use" in the MPI runtime
   performance information (NMPI_PROGINF) may be larger than "Available CPU
   Cores" for very short MPI programs.
3. Fixed the problem that the mpirun and mpiexec commands accept a -ve option
   that lacks its required argument and interpret it as selecting VE node
   number 0, even in the NQSV environment.
4. Fixed the problem that multi-node non-blocking MPI-IO for files on NFS may
   produce incorrect results.
5. Fixed the problem that incorrect results may occur in InfiniBand
   communication between processes that use the same IB-HCA on machines other
   than models A412-8, B401-8, and C401-8.
6. Fixed the problem that abnormal termination may occur when a value that is
   not a multiple of 8 is specified for NMPI_IB_VH_MEMCPY_THRESHOLD.
7. Fixed the problem that the following message may be displayed in
   MPI_Finalize() when VH-VE hybrid execution is used and InfiniBand is not
   available.
   "finalize: free units after init = xxxxxx now = xxxxxx"
8. Fixed the problem that incorrect results may occur when MPI_Put() is called
   with a size that is not a multiple of 8 and VH-VE hybrid execution is used.
9. Fixed the problem that abnormal termination or incorrect results may occur
   when collective MPI-IO is used and all of the following conditions are met.
   - The file offsets are not in ascending order by rank number.
   - The access range of a rank is adjacent to or overlaps with two or more
     access ranges of other ranks.
===============================================================================
Version 2.26.0 (for MLNX_OFED 4.x) / Version 3.5.0 (for MLNX_OFED 5.x or later)
===============================================================================
* Release date: September 28, 2023
* Corresponding Manuals
  - The 26th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. Added a feature to enable NMPI_IB_CONNECT_IN_INIT automatically.

[Fixed Problems]
1. Fixed the problem that MPI execution on three or more nodes may fail to
   start.
2. Fixed the problem that inter-host communication between CUDA and non-CUDA
   processes may hang when the transfer size is 64 MiB or more.
3. Fixed the problem that intra-host communication between VE and VH may hang
   when InfiniBand is not available.
4. Fixed the problem that setting NMPI_IB_CONNECT_IN_INIT=ON causes abnormal
   termination when InfiniBand is not available.
5. Fixed the problem that MPI_File_write_all may stall.
===============================================================================
Version 2.25.0 (for MLNX_OFED 4.x) / Version 3.4.0 (for MLNX_OFED 5.x)
===============================================================================
* Release date: June 29, 2023
* Corresponding Manuals
  - The 25th revision of NEC MPI User's Guide (G2AM01E)

[Fixed Problems]
1. Fixed the problem that abnormal termination may occur when inter-host
   communication is issued from CUDA to VE and VH memory copy is used on the
   VE side.
2. Fixed the problem that incorrect results may occur when small-size
   communication is issued on CUDA and the gdrcopy feature is enabled.
3. Fixed the problem that "Power Throttling (sec)" in the MPI runtime
   performance information (NMPI_PROGINF) may be counted incorrectly on VE30.
4. Fixed the problem that the NEC MPI compile commands, such as mpinfort,
   cannot correctly handle some special characters (";", "&", "|", and "`")
   in command-line options.
5. Fixed the problem that abnormal termination with "Unable to grow stack"
   may occur in a VE3 program executing long double division that was linked
   with -static and an explicit -lm option using NEC MPI Version 3.3.0.
===============================================================================
Version 2.24.0 (for MLNX_OFED 4.x) / Version 3.3.0 (for MLNX_OFED 5.x)
===============================================================================
* Release date: March 30, 2023
* Corresponding Manuals
  - The 24th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. Supported VE30.
   * Supported MPI predefined datatypes for the half-precision floating-point
     numbers of VE30.
   * MPI runtime performance information (NMPI_PROGINF) supports the following
     performance items on VE30.
     - LD L3 Hit Element Ratio (%)
       The ratio of elements loaded from the L3 cache to the total elements
       loaded by load instructions
     - Actual Load B/F
       B/F calculated from the bytes of actual memory access by load
       instructions
2. Supported the GPUDirect RDMA feature.

[Fixed Problems]
1. Fixed the problem that a memory leak occurs when VEO is used and inter-host
   communication is issued.
2. Fixed the problem that abnormal termination may occur when MPI_Win_free is
   called.
3. Fixed the problem that the output of the program may be corrupted when
   MPI_Abort is called.
4. Fixed the problem that the NEC MPI compile commands, such as mpinfort,
   cannot correctly handle some special characters ("(", ")", "<", and ">")
   in command-line options.
5. Fixed the problem that NMPI_IP_USAGE does not work properly.
===============================================================================
Version 2.23.0 (for MLNX_OFED 4.x) / Version 3.2.0 (for MLNX_OFED 5.x)
===============================================================================
* Release date: January 6, 2023
* Corresponding Manuals
  - The 23rd revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. Enhanced support for PPS (Partial Process Swapping): when PPS is used, the
   Non-Swappable Memory region of MPI processes running on VE is now further
   reduced.
===============================================================================
Version 2.22.0 (for MLNX_OFED 4.x) / Version 3.1.0 (for MLNX_OFED 5.x)
===============================================================================
* Release date: November 30, 2022
* Corresponding Manuals
  - The 22nd revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. Added a feature to free up CPU resources while VH processes wait for
   communication completion.
2. Added a feature to control the timing of establishing the InfiniBand
   connection. The performance of the first collective communication may be
   improved.
3. Improved performance when InfiniBand Adaptive Routing is used.
4. Improved the operation that reclaims hardware resources of VE processes.
   The frequency of program aborts due to a lack of hardware resources may be
   reduced.
5. Added SHARP support in the MLNX_OFED 5.6 environment.
6. Improved the messages shown when the error "failed to attach iolocks
   shared memory: Identifier removed" is detected during MPI initialization
   (MPI_INIT) in MPI processes running on VH or scalar hosts.
7. Added a guard function for the case where the MPI library version is
   higher than that of the MPI daemon.
8. Improved GPU-to-GPU and GPU-to-host latency within a node.
9. VEDA is supported by the AVEO UserDMA feature.
10. Supported NVIDIA HPC Compilers.

[Fixed Problems]
1. Fixed the problem that the program may hang when many processes issue
   MPI_Put() concurrently.
2. Fixed the problem that abnormal termination occurs when
   MPI_Get_accumulate() is called with a non-contiguous datatype and the
   transfer size is not small.
3. Fixed the problem that abnormal termination occurs when
   MPI_Scatterv/MPI_Iscatterv/MPI_Gatherv/MPI_Igatherv/MPI_Iscan is called
   and AVEO/CUDA is used.
4. Fixed the problem that the MPI_TAG_UB attribute value is not set in a
   newly created communicator.
5. Fixed a backward-compatibility issue with the MPI daemon library.
6. Fixed the problem that a link error sometimes occurs when linking NEC MPI
   libraries for VH in the RHEL 7 environment.
7. Fixed the problem that abnormal termination may occur when MPI_Put is
   called and VH-VE hybrid execution is used.
8. Fixed the problem of unexpected wildcard expansion in the compile shell
   scripts such as mpincc.
===============================================================================
Version 2.21.0 (for MLNX_OFED 4.x) / Version 3.0.0 (for MLNX_OFED 5.x)
===============================================================================
* Release date: March 31, 2022
* Corresponding Manuals
  - The 21st revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. MLNX_OFED 5.x is supported in NEC MPI Version 3.0.0. (See the note below.)
2. Supported the AVEO UserDMA feature. Transfers between VE memory regions on
   the same VH are improved when VEO is used together with NEC MPI.
3. GNU Compiler Collection Version 8.5.0 is supported.
4. Enhanced the MPI execution commands to search for the specified executable
   file in the directories given by the PATH environment variable.
   (When the environment variable NMPI_USE_COMMAND_SEARCH_PATH is enabled)
5. When the NEC MPI process manager in the queue setting is mpd during NQSV
   batch-job execution, execution time information can be output for each MPI
   execution command that specifies -v in the job script.
   (When the environment variable NMPI_OUTPUT_RUNTIMEINFO is enabled)
6. Improved the cleanup of shared memory segments that were unexpectedly not
   removed (for MPI processes running on VHs).

[Fixed Problems]
1. Fixed an issue where the exit status of the MPI execution command may not
   match the exit status of the first abnormally terminated process when the
   MPI program execution terminates abnormally.
2. Fixed the problem that an invalid error number is output in the error
   message when the function gethostbyname called inside the MPI library
   fails.

[Note]
Some communication features are incompatible between MLNX_OFED 4.x and
MLNX_OFED 5.x. Therefore, note the following.
* MPI executables linked with the NEC MPI 2.x.x library for MLNX_OFED 4.x
  work only in the MLNX_OFED 4.x environment.
* MPI executables linked with the NEC MPI 3.x.x library for MLNX_OFED 5.x
  work only in the MLNX_OFED 5.x environment.
Since Aurora SW supports MLNX_OFED 5.x on RHEL 8.5, MPI executable files
linked with the NEC MPI 2.x.x library cannot run in that environment. To run
in the RHEL 8.5 / MLNX_OFED 5.x environment, it is necessary to relink
against the NEC MPI 3.x.x library. Please be aware of this when updating to
RHEL 8.5.
===============================================================================
Version 2.20.0
===============================================================================
* Release date: December 24, 2021
* Corresponding Manuals
  - The 20th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. Enhanced support for PPS (Partial Process Swapping). This enhancement can
   reduce the Non-Swappable Memory region of MPI processes running on VE if
   "swap_on_hold 1" is described in the MPI configuration file necmpi.conf or
   the environment variable NMPI_SWAP_ON_HOLD=YES is set at runtime. Note
   that it is only effective on the models A412-8 and B401-8.
2. Supported VE AIO as an asynchronous I/O method used by non-blocking MPI-IO
   procedures in VE MPI programs.
3. Supported the use of the mallinfo2 function in MPI programs.
4. Supported GNU Compiler Collection Version 8.4.0 and 8.4.1.
5. Enhanced the guard against execution under NQSV if the runtime options
   NMPI_EXEC_LNODE, NMPI_LNODEON, or MPILNODEON for interactive execution are
   used.

[Fixed Problems]
1. Fixed compile and link failures when using the Fortran bindings.
   * The PMPI_File_iwrite_at_all procedure fails to link.
   * When using gfortran-4.8.5, the following use statements fail to compile.
     - use :: mpi, only : MPI_Ibarrier
     - use :: mpi, only : PMPI_Ibarrier
   * When using the mpi_f08 module, a program using the following may fail to
     compile.
     - MPI_Grequest_complete
     - PMPI_Grequest_complete
     - MPI_Grequest_free_function
     - MPI_User_function
2. Fixed the problem that abnormal termination may occur when MPI_Recv_init
   is used and MPI_Start/MPI_Startall are called many times.
===============================================================================
Version 2.19.0
===============================================================================
* Release date: October 29, 2021
* Corresponding Manuals
  - The 19th revision of NEC MPI User's Guide (G2AM01E)

[Fixed Problems]
1. Fixed the problem that execution may terminate abnormally when MPI-IO is
   used with a derived datatype generated by MPI_Type_dup.
2. Fixed the problem that the execution of an MPI program may, very
   occasionally, terminate abnormally with the error "MPID_iolocks_shmget:
   shmget: File exists" if the execution includes scalar processes, such as
   in VH-VE hybrid execution.
3. Fixed the problem that the -MT/-MQ options cannot be specified for
   mpincc/mpinc++.

[Performance Improvement]
1. MPI communication is improved when the number of processes is more than
   4096.
===============================================================================
Version 2.18.0
===============================================================================
* Release date: July 29, 2021
* Corresponding Manuals
  - The 19th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. HYDRA is supported as the process manager for batch-job execution under
   NQSV. The following functions are provided when the HYDRA process manager
   is selected.
   * Multiple versions of the MPI execution commands are available.
   * The output of the MPI program is set as the standard output and standard
     error output of the MPI execution command.
     (if the environment variable NMPI_OUTPUT_COLLECT is ON)

[Fixed Problems]
1. Fixed the SHARP feature possibly being unavailable in large-scale
   executions.
2. Fixed the abnormal termination that may happen if MPI_Type_get_contents is
   executed with an argument of a derived datatype created by MPI_Type_dup.
3. Fixed the problem that a "cannot find fifo file" error or a stalled
   execution may occur in MPI program execution when partial process swapping
   is used. (HYDRA must be selected as the NEC MPI process manager.)
===============================================================================
Version 2.17.0
===============================================================================
* Release date: May 31, 2021
* Corresponding Manuals
  - The 18th revision of NEC MPI User's Guide (G2AM01E)

[Fixed Problems]
1. Fixed the incorrect result of MPI_Bcast when more than 256 VHs are used
   and the specified root rank is not 0.
2. Fixed a bug in point-to-point and collective communication between a VE
   node and a GPU node.
3. Fixed the problem that interfaces of MPI procedures having choice dummy
   arguments are not included in the MPI module for the Intel Fortran
   Compiler.
4. Fixed a possible abnormal termination of InfiniBand communication when the
   value specified by the environment variable NMPI_IB_VH_MEMCPY_THRESHOLD is
   smaller than the value specified by the environment variable
   NMPI_IB_VH_MEMCPY_SPLIT_THRESHOLD.
===============================================================================
Version 2.16.0
===============================================================================
* Release date: May 10, 2021
* Corresponding Manuals
  - The 17th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. VE-GPU internode communication support.
2. Performance improvement of VH-VE hybrid MPI for IB-less systems.
3. PBS Professional support.

[Fixed Problems]
1. Fixed a possible abnormal termination of memory allocation, such as
   malloc, in multi-threaded MPI program execution with the following error
   message.
   pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion
   `mutex->__data.__owner == 0' failed.
2. Fixed performance degradation in some MPI communications when an MPI
   program linked with the NEC MPI library v2.11.0, 2.12.0, 2.13.0, or
   2.14.0 is executed with the NEC MPI runtime v2.15.0.
3. Fixed a possible abnormal termination when different values are set for
   max_mtu and active_mtu in an InfiniBand HCA.
4. Fixed abnormal termination or a program stall when an MPI program using
   MPI_Alltoall runs on multiple nodes of Model A412-8 or B401-8.
===============================================================================
Version 2.15.0
===============================================================================
* Release date: March 31, 2021
* Corresponding Manuals
  - The 16th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. Enhanced cooperation with VEO.
2. Enhanced traceback information.
3. Enhanced MPI rank information.

[Fixed Problems]
1. Fixed a problem that the MPI compile commands may erroneously treat a
   character string containing $ as a shell variable.
===============================================================================
Version 2.14.0
===============================================================================
* Release date: February 24, 2021
* Corresponding Manuals
  - The 15th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. NEC MPI's configuration file necmpi.conf supports an "ib_ar" option to
   control the usage of InfiniBand Adaptive Routing.

[Fixed Problems]
1. Fixed a possible abnormal termination in MPI_FINALIZE for multi-threaded
   applications.
2. Fixed possibly wrong messages about the SHARP usage in VH/VE hybrid
   execution.
===============================================================================
Version 2.13.0
===============================================================================
* Release date: December 25, 2020
* Corresponding Manuals
  - The 15th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. The license check for the MPI runtime is disabled.
2. Improved the MPI runtime resource check so that execution is terminated if
   the InfiniBand HCA is not available or enough Huge Pages cannot be
   allocated.

[Fixed Problems]
1. Fixed the problem that execution may, rarely, stall in MPI_Init.
===============================================================================
Version 2.12.0
===============================================================================
* Release date: November 30, 2020
* Corresponding Manuals
  - The 14th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. MPI communication using Ethernet is supported.
2. MPI execution on Singularity is supported.

[Fixed Problems]
1. Fixed the problem that MPI_Get in NEC MPI-2.11.0 rarely causes a DMA
   exception.
2. The incompatibility of MPI_CHARACTER introduced in NEC MPI-2.11.0 is
   reverted.
3. Fixed the problem that abnormal termination may occur in
   MPI_Type_create_struct when the count argument is 0 or the
   array_of_blocklengths argument is all zeros.

[Note]
If you compiled object files and applications with NEC MPI-2.11.0, please
recompile them with NEC MPI-2.12.0 or later.
===============================================================================
Version 2.11.0
===============================================================================
* Release date: October 30, 2020
* Corresponding Manuals
  - The 13th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. MPI runtime performance information (NMPI_PROGINF) is improved as follows.
   - The following performance item is added.
     * Non Swappable Memory Size Used
       The peak usage of memory that cannot be swapped out by the Partial
       Process Swapping function
   - Output of the following performance items aggregated per VE card is
     added.
     * Memory Size Used
     * Non Swappable Memory Size Used
   - The location information of the VE card where each MPI process is
     executed is added.
2. Improved the overlap function for Isend/Irecv between VEs within one node.
   - When all of the following conditions are met, computation and
     communication can be overlapped in point-to-point non-blocking
     communication between VEs within one node.
     * The environment variable NMPI_DMA_RNDV_OVERLAP is set to ON.
     * The transfer is of contiguous data using a basic datatype.
     * The transfer length is 200 KB or more.
     * Isend is executed before Irecv in a pair of Isend and Irecv.
   Whether computation and communication actually overlap, and how much
   execution performance improves, depends on the application program. If
   this function is enabled, data transfer performance may decrease due to
   the processing characteristics of asynchronous data transfer.

[Fixed Problems]
1. Fixed the problem that MPI_Request_get_status ignores the requests of the
   following communications.
   - non-blocking collectives
   - non-blocking collective IO
   - request-based RMA
2. Fixed the problem that MPI runtime performance information (NMPI_PROGINF)
   and MPI communication information (NMPI_COMMINF) stall when MPI_Comm_spawn
   or MPI_Comm_spawn_multiple is used.
3. Fixed the problem that abnormal termination may occur in MPI communication
   when a derived datatype created with MPI_Type_create_resized is used.
4. Fixed the problem that the MPI daemon may terminate abnormally if several
   mpirun commands are executed in a batch job script for NQSV.
5. Fixed the problem that MPI processes may remain as defunct processes if
   several mpirun commands are executed in a batch job script for NQSV.
6. Fixed the problem that, when multiple NQSV requests are assigned to a
   single VH at the same time, MPI communication between VEs within the VH
   fails very rarely.
===============================================================================
Version 2.10.0
===============================================================================
* Release date: August 31, 2020
* Corresponding Manuals
  - The 12th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. SHARP-2.0.0 supported.
2. The following compilers of Intel Parallel Studio XE 2020 Update 2 are
   supported.
   - Intel C++ Compiler 19.1.2.254
   - Intel Fortran Compiler 19.1.2.254

[Fixed Problems]
1. Fixed a possibly incorrect VE node number returned by
   MPI_Get_processor_name.
===============================================================================
Version 2.9.0
===============================================================================
* Release date: July 31, 2020
* Corresponding Manuals
  - The 11th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. Regarding the automatic selection of the communication type between
   logical nodes in execution under NQSV, the communication type can now be
   selected automatically based on the physical node via the environment
   variable NMPI_COMM_PNODE.
===============================================================================
Version 2.8.0
===============================================================================
* Release date: June 30, 2020
* Corresponding Manuals
  - The 10th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. For the Model A412-8, the binding of the InfiniBand HCA used for MPI
   communication can be defined for each VE in the MPI configuration file.

[Fixed Problems]
1. Fixed possibly incorrect results in MPI communication information
   (NMPI_COMMINF). The number of MPI calls and the amount of MPI
   communication provide the correct information in this release for the
   following items.
   - Incorrect number of MPI calls fixed.
     * Put count
     * Get count
     * Accumulate count
   - Incorrect amount of MPI communication fixed.
     * Number of bytes put
     * Number of bytes got
     * Number of bytes accum
===============================================================================
Version 2.7.0
===============================================================================
* Release date: May 29, 2020
* Corresponding Manuals
  - The 9th revision of NEC MPI User's Guide (G2AM01E)

[New Feature]
1. InfiniBand Adaptive Routing is supported.
2. Scalar-vector hybrid execution on a VH without InfiniBand is supported.
3. GNU Compiler Collection Version 8.3.0 and 8.3.1 are supported.
4. The datarep argument of MPI_File_set_view can accept the uppercase
   'NATIVE' string.

[Fixed Problems]
1. Fixed the problem that MPI_Allgatherv rarely aborts with a Segmentation
   Violation in MPI processes on a vector host or scalar host during
   scalar-vector hybrid execution.
2. Fixed the problem that floating-point exceptions of "Invalid Operation" or
   "Division by Zero" occur in MPI_Finalize when the MPI execution contains
   MPI processes on a vector host or scalar host and NMPI_PROGINF is used.
3. Fixed the problem that aligned_alloc returns NULL when the size argument
   is not an integer multiple of the alignment argument.
4. Fixed the problem that the permissions of a file created by an MPI process
   differ from the permissions the user set with the umask command.
5. Fixed the problem that MPI C/C++ programs cannot use MPI_Aint_add and
   MPI_Aint_diff.

[Performance Improvement]
1. MPI communication is optimized for Model "A412-8".
===============================================================================
Version 2.5.0
===============================================================================
* Release date: January 31, 2020
* Corresponding Manuals
  - The 8th revision of NEC MPI User's Guide (G2AM01E)

[Performance Improvement]
1. MPI_Put is enhanced for Model "A412-8".
===============================================================================