Alternative VE Offloading
3.0.2
VE Offloading framework (VEO) is a framework to provide accelerator-style programming on Vector Engine (VE).
SX-Aurora TSUBASA provides the VE Offloading framework (VEO) for the accelerator programming model. In this model, parallelized and/or vectorized numeric code such as matrix operations runs on accelerators, while the main code that controls the accelerators and performs I/O runs on a host.
The Alternative VE Offloading framework (AVEO) is a faster, much lower-latency replacement for the previous VEO implementation. It brings multi-VE support, simultaneous debugging of the VE and VH sides, and API extensions.
You can migrate to AVEO from the previous VEO implementation by installing the AVEO packages and re-linking your program with AVEO; no modification of makefiles is required.
To run programs, please install veoffload-aveo, veoffload-aveorun, and the runtime packages of the compiler (2.2.2 or later).
To install the packages to run programs on the VE10, VE10E or VE20 by yum, execute the following command as root:
To install the packages to run programs on the VE30 by yum, execute the following command as root:
You do not need to uninstall veoffload and veoffload-veorun, the packages of the previous VEO implementation, so you can still execute programs linked with the previous VEO implementation.
To develop programs, veoffload-aveo-devel, veoffload-aveorun-veN-devel, and the development packages of the compiler (2.2.2 or later) are also required.
To install the development packages for programs that run on the VE10, VE10E or VE20 by yum, execute the following command as root:
To install the development packages for programs that run on the VE30 by yum, execute the following command as root:
veoffload-devel and veoffload-veorun-devel will be uninstalled automatically, because they conflict with veoffload-aveo-devel and veoffload-aveorun-devel.
Then, you can link your program with AVEO. If you want to link your program with the previous VEO implementation, please install its packages on another machine.
First, let's try a "Hello, World!" program on VE.
VEO requires huge pages for data transfer. The number of huge pages required per VE context depends on the VE_OMP_NUM_THREADS environment variable. If VE_OMP_NUM_THREADS is not specified, VEO requires 16 pages per VE context, which is the default. If VE_OMP_NUM_THREADS is specified, VEO requires VE_OMP_NUM_THREADS * 4 pages per VE context.
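The huge page rule above can be written down as a small calculation. The following is a minimal sketch in plain C. The function names veo_hugepages_per_context() and veo_hugepages_total() are hypothetical helpers introduced here for illustration and are not part of the VEO API. Treating an empty or non-numeric value as "not specified" is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the VEO API.
 * Returns the number of huge pages one VE context needs, following the
 * rule in the text. omp_threads is the value of VE_OMP_NUM_THREADS,
 * or NULL when the variable is not specified. */
long veo_hugepages_per_context(const char *omp_threads)
{
    if (omp_threads == NULL)
        return 16;                    /* default: variable not specified */

    char *end = NULL;
    long n = strtol(omp_threads, &end, 10);

    /* Assumption of this sketch: an empty, non-numeric or non-positive
     * value is handled like an unspecified variable. */
    if (end == omp_threads || *end != '\0' || n <= 0)
        return 16;

    return n * 4;                     /* VE_OMP_NUM_THREADS * 4 pages */
}

/* Hypothetical helper: huge pages needed for a number of VE contexts. */
long veo_hugepages_total(const char *omp_threads, long contexts)
{
    assert(contexts >= 0);
    return veo_hugepages_per_context(omp_threads) * contexts;
}
```

On the VH side, such a helper could be called as veo_hugepages_per_context(getenv("VE_OMP_NUM_THREADS")) to estimate how many huge pages should be available before contexts are opened.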
Code to run on VE is shown below. Standard C functions are available, so you can use printf(3).
Save the above code as libvehello.c.
A function on VE called via VEO needs to return a 64-bit unsigned integer. It can take arguments, as described later.
VEO supports calling a function in an executable or in a shared library.
To execute a function on VE using VEO, compile and link a source file into a binary for VE.
To build an executable with the functions statically linked, execute as follows:
mk_veorun_static is a wrapper that creates a statically linked VE binary containing the functions called from the VH main program using VEO.
To build a shared library with the functions for dynamic loading, execute as follows:
The main routine on the VH side that runs the VE program is shown here.
A program using VEO needs to include "ve_offload.h". In the header, the prototypes of VEO functions and constants for VEO API are defined.
The example VH program to call a VE function in a statically linked executable:
Save the above code as hello_static.c.
To call a VE function in a statically linked executable:
The example VH program to call a VE function in a dynamic library with VEO:
Save the above code as hello.c.
To call a VE function in a dynamic library with VEO:
Compile source code on VH side as shown below.
The headers for VEO are installed in /opt/nec/ve/veos/include; libveo, the shared library of VEO, is in /opt/nec/ve/veos/lib64.
Execute the compiled VEO program.
VE code is executed on VE node 0, specified by veo_proc_create_static() or veo_proc_create().
You can pass one or more arguments to a function on VE. Arguments are specified through a VEO arguments object, which is created by veo_args_alloc(). A newly created VEO arguments object is empty and holds no arguments. Even if a VE function has no arguments, a VEO arguments object is still necessary.
VEO provides functions to set arguments of various types.
To pass an integer value, the following functions are used.
You can also pass a floating-point argument.
For instance: suppose that proc is a VEO process handle and func(int, double) is defined in a VE library whose handle is handle.
In this case, func(1, 2.0) is called on VE.
Arguments of non-basic types and arguments passed by reference are placed on a stack. VEO supports such stack arguments.
To set a stack argument in a VEO arguments object, call veo_args_set_stack().
The third argument specifies whether the argument is used for input, output, or both.
You can create a VEO context which has a specified attribute.
The only available attribute is the stack size of the VEO context.
For instance: suppose that proc is a VEO process handle.
In this case, a VEO context with a 256 MB stack is created.
Code written in Fortran to run on VE is shown below.
Save the above code as libvefortran.f90.
To build an executable with the functions statically linked, execute as follows:
To build a shared library with the functions for dynamic loading, execute as follows:
The main routine on the VH side that runs the VE program written in Fortran is shown here.
The example VH program to call a VE Fortran function in a statically linked executable:
Save the above code as fortran.c.
If you want to pass arguments to a VE Fortran function, please use veo_args_set_stack() to pass them as stack arguments. However, for dummy arguments that have the VALUE attribute in the Fortran function, please pass the arguments by value, in the same way as for a VE C function.
When you call a VE Fortran function by name with veo_call_async_by_name(), please change the name of the Fortran function to lowercase and add "_" at the end of the function name.
Taking libvefortran.f90 and fortran.c as an example, pass "sub1_" as an argument to veo_call_async_by_name() in fortran.c when calling the Fortran function named "SUB1" in libvefortran.f90.
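The naming rule above (lowercase, then a trailing "_") is mechanical, so it can be applied by a small helper before the name is handed to veo_call_async_by_name(). The following is a minimal sketch in plain C. fortran_symbol_name() is a hypothetical helper introduced for illustration and is not part of the VEO API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the VEO API.
 * Converts a Fortran procedure name such as "SUB1" into the symbol name
 * described in the text: all lowercase with "_" appended ("sub1_").
 * Returns 0 on success, or -1 if the output buffer is too small. */
int fortran_symbol_name(const char *name, char *out, size_t out_size)
{
    assert(name != NULL && out != NULL);

    size_t len = strlen(name);
    if (out_size < len + 2)           /* name + '_' + terminating '\0' */
        return -1;

    for (size_t i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)name[i]);

    out[len] = '_';
    out[len + 1] = '\0';
    return 0;
}
```

With a buffer such as char sym[64], calling fortran_symbol_name("SUB1", sym, sizeof sym) leaves "sub1_" in sym, which is the string the text says to pass for the function SUB1.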
The method of compiling and running the VH main program is the same as for a C program.
Compile source code on VH side as shown below. This is the same as the compilation method described above.
Execute the compiled VEO program. This is also the same as the execution method described above.
The following is an example of VE code using OpenMP written in C.
Save the above code in libomphello.c.
The following shows the example written in Fortran.
Save the above code in libompfortran.f90.
To use OpenMP parallelization, specify -fopenmp at compilation and linking.
The following is an example of building VE code written in C.
To build a statically linked binary, execute as follows:
To build a shared library, execute as follows:
The following is an example of building VE code written in Fortran.
To build a statically linked binary, execute as follows:
To build a shared library, execute as follows:
Examples of VH main code that calls VE code provided above are saved as omphello.c, omphello_static.c, ompfortran.c and ompfortran_static.c.
To compile the program, please refer to the "Compile VH main program" subsection of the "Hello World" section in Getting Started with VEO.
To execute the program, please refer to the "Run a program with VEO" subsection of the "Hello World" section in Getting Started with VEO.
Compile an MPI program which calls the VEO API using mpincc with the '-lveo' option.
Execute an MPI program which calls the VEO API using mpiexec or mpirun with the '-veo' option.
Please refer to the NEC MPI User's Guide for details of mpiexec, mpirun, mpincc and the MPI API.
To transfer data in VE memory using the MPI API, it is necessary to allocate the VE memory using veo_alloc_hmem().
An example MPI program which calls the VEO API:
You can compile it with the following commands.
You can run it with the following commands.
To generate ftrace.out, specify the "-ftrace" option when compiling and linking the VE code. ftrace.out is generated when the VH main program invokes veo_proc_destroy().
Please refer to PROGINF/FTRACE User's Guide for more details of FTRACE.
Here is an example of building VE code written in C, with or without OpenMP.
To generate ftrace.out with a different file name for each VE process, please install the nec-veperf packages (2.3.0 or later) and veoffload-aveo and veoffload-aveorun (2.13.0 or later).
AVEO (2.13.0 or later) sets the default performance information file name "ftrace.out" and the suffix for AVEO "veo" in the environment variable VE_FTRACE_OUT_NAME.
When a VE process linked with the ftrace library is executed using AVEO, the performance information file is named as follows.
${XXX} represents the value of the XXX environment variable. For example, ${MPIRANK} represents the value of the MPIRANK environment variable. $$ represents the PID of the VE process to which the ftrace library is linked.
To build a statically linked binary without OpenMP for ftrace, execute as follows:
To build a shared library for ftrace, execute as follows:
To build code written in Fortran, change the compiler to nfort.
To build a statically linked binary with OpenMP for ftrace, execute as follows:
To build a shared library for ftrace, execute as follows:
To build code written in Fortran, change the compiler to nfort.
veorun is the helper program required for the dynamic loading of shared libraries. In order to take advantage of new features or bug fixes provided by updated compilers for VE, you will need to relink veorun for those compilers.
To link a veorun which can load shared libraries that use OpenMP and are written in Fortran, execute as follows.
If you need to generate an ftrace.out file, please add the "-ftrace" option to mk_veorun_static.
To use the newly created veorun, set the environment variable VEORUN_BIN.
And execute a VEO program.
Set the environment variable VEO_LOG_DEBUG to some value and execute a VEO program. The log is written to standard output.
You can use traceback information for a VE program.
To build an executable with the functions statically linked, execute as follows:
To build a shared library with the functions for dynamic loading, execute as follows:
Execute the compiled VEO program. Set the environment variable VE_TRACEBACK=VERBOSE to output this information at run time. In the example, a "Divide by zero" exception occurred at line 5 of libvehello.c.
In the example, 'hello', which is executed with VE_TRACEBACK set, is the VH main program that runs 'vehello' or 'libvehello.so'. Examples of the VH main program are hello.c and hello_static.c. For more details of the VH main program, please refer to "Hello World" in "Getting Started with VEO".
AVEO (2.8.2 or later) supports an environment variable for debugging VE processes.
When you want to debug a VH process, execute the following command.
When you want to debug a single VE process, set the environment variable as below.
Then, execute your program. Whenever a VE process starts, the VE debugger gdb is started in your console. Type "run" to continue, or set breakpoints, etc.
This works under the premise that you don't need to interact with the host side program (e.g. it requires no input).
If you spawn multiple VE processes with the environment variable VEO_DEBUG=console set, it leads to undefined behavior.
When X Window System is available and you want to debug multiple VE processes, set the environment variable as below.
Then, execute your program. Whenever multiple VE processes start, xterm windows will open up and show VE debugger prompts. Type "run" to continue in each of these windows.
This works under the premise that the environment variable DISPLAY is set up properly.
AVEO (2.7.5 or later) supports environment variables to optimize the performance of data transfer.
Environment variables | Brief | Default value |
---|---|---|
VEO_SENDFRAG | The maximum size of fragments of data to write (send). | 4MiB |
VEO_SENDCUT | The threshold to cut data into fragments to write (send). | 2MiB |
VEO_RECVFRAG | The maximum size of fragments of data to read (receive). | 4MiB |
VEO_RECVCUT | The threshold to cut data into fragments to read (receive). | 2MiB |
AVEO cuts data to write (send) into fragments as described in the table below:
Data size[MiB] | Fragment size[MiB] |
---|---|
>= VEO_SENDFRAG * 5 | VEO_SENDFRAG |
> VEO_SENDCUT | Data size / 4 |
> VEO_SENDCUT / 2 | Data size / 3 |
> VEO_SENDCUT / 4 | Data size / 2 |
> 0 | Data size |
AVEO cuts data to read (receive) into fragments as described in the table below:
Data size[MiB] | Fragment size[MiB] |
---|---|
>= VEO_RECVFRAG * 5 | VEO_RECVFRAG |
> VEO_RECVCUT | Data size / 5 |
> VEO_RECVCUT / 2 | Data size / 4 |
> VEO_RECVCUT / 4 | Data size / 2 |
> 0 | Data size |
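The two fragmentation tables above can be read as a chain of threshold checks. The following is a minimal sketch in plain C of that reading; it is not AVEO's actual implementation. The function names are hypothetical, sizes are taken in bytes, the rows are assumed to be checked from top to bottom with the first match winning, and integer division is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MIB ((size_t)1024 * 1024)

/* Hypothetical helper mirroring the rows of both tables.
 * frag is VEO_SENDFRAG or VEO_RECVFRAG, cut is VEO_SENDCUT or VEO_RECVCUT.
 * div_cut and div_half are the divisors of the second and third rows:
 * 4 and 3 for write (send), 5 and 4 for read (receive). */
static size_t fragment_size(size_t size, size_t frag, size_t cut,
                            size_t div_cut, size_t div_half)
{
    assert(div_cut > 0 && div_half > 0);

    if (size >= frag * 5) return frag;            /* row 1 */
    if (size > cut)       return size / div_cut;  /* row 2 */
    if (size > cut / 2)   return size / div_half; /* row 3 */
    if (size > cut / 4)   return size / 2;        /* row 4 */
    return size;                                  /* row 5: not split */
}

/* Fragment size for data to write (send). */
size_t send_fragment_size(size_t size, size_t sendfrag, size_t sendcut)
{
    return fragment_size(size, sendfrag, sendcut, 4, 3);
}

/* Fragment size for data to read (receive). */
size_t recv_fragment_size(size_t size, size_t recvfrag, size_t recvcut)
{
    return fragment_size(size, recvfrag, recvcut, 5, 4);
}
```

With the default values of 4 MiB and 2 MiB, send_fragment_size(8 * MIB, 4 * MIB, 2 * MIB) falls into the second row and gives 2 MiB fragments, while recv_fragment_size(10 * MIB, 4 * MIB, 2 * MIB) also gives 2 MiB fragments because the read table divides by 5 in that row.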