VE Offloading framework (VEO) is a framework to provide accelerator-style programming on Vector Engine (VE).

Introduction

SX-Aurora TSUBASA provides VE Offloading framework (VEO) for the accelerator programming model. The accelerator programming model executes parallelized and/or vectorized numeric code such as matrix operations on accelerators and a main code controlling accelerators and performing I/O on a host.

The Alternative VE Offloading framework (AVEO) is a faster and much lower latency replacement to the previous VEO implementation which brings multi-VE support, simultaneous debugging of VE and VH side, API extensions.

You can migrate to AVEO from the previous VEO implementation by installing the AVEO's packages and re-linking your program with AVEO without modification of makefiles.

Installation

Installing runtime package

To run programs, please install veoffload-aveo and veoffload-aveorun and the runtime packages of the compiler (2.2.2 or later).

To install the packages to run programs on the VE10, VE10E or VE20 by yum, execute the following command as root:

# yum install veoffload-aveo veoffload-aveorun-ve1

To install the packages to run programs on the VE30 by yum, execute the following command as root:

# yum install veoffload-aveo veoffload-aveorun-ve3

You need not uninstall veoffload and veoffload-veorun which are the packages of the previous VEO implementation. So, you can execute your program linked with the previous VEO implementation.

Installing development package

To run develop programs, veoffload-aveo-devel and veoffload-aveorun-veN-devel and the development packages of the compiler (2.2.2 or later) are also required.

To install the develop packages to run programs on the VE10, VE10E or VE20 by yum, execute the following command as root:

# yum install veoffload-aveo-devel veoffload-aveorun-ve1-devel

To install the develop packages to run programs on the VE30 by yum, execute the following command as root:

# yum install veoffload-aveo-devel veoffload-aveorun-ve3-devel

veoffload-devel and veoffload-veorun-devel will be uninstalled automatically, because they conflict with veoffload-aveo-devel and veoffload-aveorun-devel.

Then, you can link your program with AVEO. If you want to link your program with the previous VEO implementation, please install its packages into another machine.

Hello World

First, let's try a "Hello, World!" program on VE.

The required number of huge pages for VEO

VEO requires huge pages for data transfer. The required number of huge pages is 16 per VEO thread context by default. The number of huge pages required depends on the value of the VE_OMP_NUM_THREADS environment variable. If the environment variable VE_OMP_NUM_THREADS is not specified, VEO requires 16 pages per VE context. If the environment variable VE_OMP_NUM_THREADS is specified, VEO requires the value of VE_OMP_NUM_THREADS * 4 pages per VE context.

VE code

Code to run on VE is shown below. Standard C functions are available, hence, you can use printf(3).

#include <stdio.h>
#include <stdint.h>
uint64_t hello()
{
  printf("Hello, world\n");
  return 0;
}

Save the above code as libvehello.c.

A function on VE called via VEO needs to return a 64-bit unsigned integer. A function on VE called via VEO can have arguments as mentioned later.

Compile VE code

VEO supports a function in an executable or in a shared library.

To execute a function on VE using VEO, compile and link a source file into a binary for VE.

To build an executable with the functions statically linked, execute as follows:

$ /opt/nec/ve/bin/ncc -c -o libvehello.o libvehello.c

$ /opt/nec/ve/bin/mk_veorun_static -o vehello libvehello.o

mk_veorun_static is wrapper to create VE binary with static linking for functions called from VH main program using VEO.

To build a shared library with the functions for dynamic loading, execute as follows:

$ /opt/nec/ve/bin/ncc -fpic -shared -o libvehello.so libvehello.c

VH main program

Main routine on VH side to run VE program is shown here.

A program using VEO needs to include "ve_offload.h". In the header, the prototypes of VEO functions and constants for VEO API are defined.

The example VH program to call a VE function in a statically linked executable:

#include <ve_offload.h>
int main()
{
  /* Load "vehello" on VE node 0 */
  struct veo_proc_handle *proc = veo_proc_create_static(0, "./vehello");
  uint64_t handle = 0UL;/* find a function in the executable */
  struct veo_thr_ctxt *ctx = veo_context_open(proc);
  struct veo_args *argp = veo_args_alloc();
  uint64_t id = veo_call_async_by_name(ctx, handle, "hello", argp);
  uint64_t retval;
  veo_call_wait_result(ctx, id, &retval);
  veo_args_free(argp);
  veo_context_close(ctx);
  veo_proc_destroy(proc);
  return 0;
}

Save the above code as hello.c

To call a VE function in a statically linked executable:

Create a process on a VE node by veo_proc_create_static(). Specify VE node number and an executable to run on the VE. A VEO process handle is returned.
Create a VEO context, a thread in a VE process specified by a VEO process handle to execute a VE function, by veo_context_open().
Create a VEO arguments object by veo_args_alloc() and set arguments. See the next chapter "Various Arguments for a VE function" in detail.
Call a VE function by veo_call_async_by_name() with a symbol of a function or a variale and a VEO arguments object. A request ID is returned.
Wait for the completion and get the return value by veo_call_wait_result().

The example VH program to call a VE function in a dynamic library with VEO:

#include <ve_offload.h>
int main()
{
  struct veo_proc_handle *proc = veo_proc_create(0);
  uint64_t handle = veo_load_library(proc, "./libvehello.so");
  struct veo_thr_ctxt *ctx = veo_context_open(proc);
  struct veo_args *argp = veo_args_alloc();
  uint64_t id = veo_call_async_by_name(ctx, handle, "hello", argp);
  uint64_t retval;
  veo_call_wait_result(ctx, id, &retval);
  veo_args_free(argp);
  veo_context_close(ctx);
  veo_proc_destroy(proc);
  return 0;
}

Save the above code as hello.c

To call a VE function in a dynamic library with VEO:

Create a process on a VE node by veo_proc_create(). Specify VE node number to create a VE process. A VEO process handle is returned.
Load a VE library and find an address of a function to call. veo_load_library() loads a VE shared library on the VE process.
Create a VEO context, a thread in a VE process specified by a VEO process handle to execute a VE function, by veo_context_open().
Create a VEO arguments object by veo_args_alloc() and set arguments. See the next chapter "Various Arguments for a VE function" in detail.
Call a VE function by veo_call_async_by_name() with the name of a function and a VEO arguments object. A request ID is returned.
Wait for the completion and get the return value by veo_call_wait_result().

Compile VH main program

Compile source code on VH side as shown below.

$ gcc -o hello hello.c -I/opt/nec/ve/veos/include -L/opt/nec/ve/veos/lib64 \

-Wl,-rpath=/opt/nec/ve/veos/lib64 -lveo

The headers for VEO are installed in /opt/nec/ve/veos/include; libveo, the shared library of VEO, is in /opt/nec/ve/veos/lib64.

Run a program with VEO

Execute the compiled VEO program.

$ ./hello

Hello, world

VE code is executed on VE node 0, specified by veo_proc_create_static() or veo_proc_create().

Various Arguments for a VE function

You can pass one or more arguments to a function on VE. To specify arguments, VEO arguments object is used. A VEO argument object is created by veo_args_alloc(). When a VEO argument object is created, the VEO argument object is empty, without any arguments passed. Even if a VE function has no arguments, a VEO arguments object is still necessary.

VEO provides functions to set an argument in various types.

Basic types

To pass an integer value, the following functions are used.

int veo_args_set_i64(struct veo_args *args, int argnum, int64_t val);
int veo_args_set_u64(struct veo_args *args, int argnum, uint64_t val);
int veo_args_set_i32(struct veo_args *args, int argnum, int32_t val);
int veo_args_set_u32(struct veo_args *args, int argnum, uint32_t val);
int veo_args_set_i16(struct veo_args *args, int argnum, int16_t val);
int veo_args_set_u16(struct veo_args *args, int argnum, uint16_t val);
int veo_args_set_i8(struct veo_args *args, int argnum, int8_t val);
int veo_args_set_u8(struct veo_args *args, int argnum, uint8_t val);

You can pass also a floating point number argument.

int veo_args_set_double(struct veo_args *args, int argnum, double val);

int veo_args_set_float(struct veo_args *args, int argnum, float val);

For instance: suppose that proc is a VEO process handle and func(int, double) is defined in a VE library whose handle is handle.

struct veo_args *args = veo_args_alloc();
veo_args_set_i32(args, 0, 1);
veo_args_set_double(args, 1, 2.0);
uint64_t id = veo_call_async_by_name(ctx, handle, "func", args);

In this case, func(1, 2.0) is called on VE.

Stack arguments

Non basic typed arguments and arguments by reference are put on a stack. VEO supports an argument on a stack.

To set a stack argument to a VEO arguments object, call veo_args_set_stack().

int veo_args_set_stack(struct veo_args *args, int argnum, veo_args_intent inout,

int argnum, char *buff, size_t len);

The third argument specifies the argument is for input and/or output.

VEO_INTENT_IN: the argument is for input; data is copied into a VE stack on call.
VEO_INTENT_OUT: the argument is for output; a VE stack area is allocated without copy-in and data is copied out to VH memory on completion.
VEO_INTENT_INOUT: the argument is for both input and output; data is copied into and out from a VE stack area.

Attributes for VEO Context

You can create a VEO context which has a specified attribute.

Available attribute is 'stack size of VEO context' only.

For instance: suppose that proc is a VEO process handle.

#define STACK_SZ (256 * 1024 * 1024)
struct veo_thr_ctxt_attr *attr = veo_alloc_thr_ctxt_attr();
veo_set_thr_ctxt_stacksize(attr, STACK_SZ);
struct veo_thr_ctxt *ctx = veo_context_open_with_attr(proc, attr);

In this case, VEO context which has a 256MB stack is created.

Fortran Program

VE code

Code written by Fortran to run on VE is shown below.

SUBROUTINE SUB1(x, ret)
  implicit none
  INTEGER, INTENT(IN) :: x
  INTEGER, INTENT(OUT) :: ret
  ret = x + 1
END SUBROUTINE SUB1
INTEGER FUNCTION FUNC1(x, y)
  implicit none
  INTEGER, VALUE :: x, y
  FUNC1 = x + y
END FUNCTION FUNC1

Save the above code as libvefortran.f90.

Compile VE code

To build an executable with the functions statically linked, execute as follows:

$ /opt/nec/ve/bin/nfort -c -o libvefortran.o libvefortran.f90

$ /opt/nec/ve/bin/mk_veorun_static -o vefortran libvefortran.o

To build a shared library with the functions for dynamic loading, execute as follows:

$ /opt/nec/ve/bin/nfort -shared -fpic -o libvefortran.so libvefortran.f90

VH main program

Main routine on VH side to run VE program written by Fortran is shown here.

The example VH program to call a VE Fortran function in a statically linked executable:

#include <ve_offload.h>
#include <stdio.h>
int main()
{
  /* Load "vefortran" on VE node 0 */
  struct veo_proc_handle *proc = veo_proc_create_static(0, "./vefortran");
  uint64_t handle = 0UL;/* find a function in the executable */
  struct veo_thr_ctxt *ctx = veo_context_open(proc);
  struct veo_args *argp = veo_args_alloc();
  long x = 42;
  long y;
  veo_args_set_stack(argp, VEO_INTENT_IN, 0, (char *)&x, sizeof(x));
  veo_args_set_stack(argp, VEO_INTENT_OUT, 1, (char *)&y, sizeof(y));
  uint64_t id = veo_call_async_by_name(ctx, handle, "sub1_", argp);
  uint64_t retval;
  veo_call_wait_result(ctx, id, &retval);
  printf("SUB1 outputs %lu\n", y);
  veo_args_clear(argp);
  veo_args_set_i64(argp, 0, 1);
  veo_args_set_i64(argp, 1, 2);
  id = veo_call_async_by_name(ctx, handle, "func1_", argp);
  veo_call_wait_result(ctx, id, &retval);
  printf("FUNC1 return %lu\n", retval);
  veo_args_free(argp);
  veo_context_close(ctx);
  veo_proc_destroy(proc);
  return 0;
}

Save the above code as fortran.c.

If you want to pass arguments to VE Fortran function, please use veo_args_set_stack() to pass arguments as stack arguments. However if you want to pass arguments to arguments with VALUE attribute in Fortran function, please pass arguments by value in the same way as VE C function.

When you want to call VE Fortran function by veo_call_async_by_name() with the name of a Fortran function, please change the name of the Fortran function to lowercase, and add "_" at the end of the function name.

Taking libvefortran.f90 and fortran.c as an example, pass "sub1_" as a argument to veo_call_async_by_name() in fortran.c when calling the Fortran function named "SUB1" in libvefortran.f90.

The method of compiling and running VH main program are same as C program.

Compile VH main program

Compile source code on VH side as shown below. This is the same as the compilation method described above.

$ gcc -o fortran fortran.c -I/opt/nec/ve/veos/include -L/opt/nec/ve/veos/lib64 \

-Wl,-rpath=/opt/nec/ve/veos/lib64 -lveo

Run a program with VEO

Execute the compiled VEO program. This is also the same as the execution method described above.

$ ./fortran
SUB1 return 43
FUNC1 return 3

Parallelized Program using OpenMP

VE code using OpenMP

The following is an example of VE code using OpenMP written in C.

#include <stdio.h>
int omp_hello(void)
{
  int tid, nthreads = 0;
#pragma omp parallel private(nthreads, tid)
  {
    tid = omp_get_thread_num();
    printf("Hello, World! from thread = %d\n", tid);
    if (tid == 0)
    {
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }
  }  /* All threads join master thread and disband */
  fflush(stdout);
  return 0;
}

Save the above code in libomphello.c.

The following shows the example written in Fortran.

INTEGER FUNCTION OMP_HELLO()
  INTEGER :: TID = 0
  INTEGER :: NTHREADS = 0
!$OMP PARALLEL PRIVATE(TID, NTHREADS)
  TID = omp_get_thread_num()
  WRITE(*,*) "Hello, World! from thread = ", TID
  IF ( TID == 0 ) THEN
    NTHREADS = omp_get_num_threads()
    OMP_HELLO = NTHREADS
    WRITE(*,*) "Number of threads = ", NTHREADS
  END IF
!$OMP END PARALLEL
END FUNCTION OMP_HELLO

Save the above code in libompfortran.f90.

How to build VE code

To use OpenMP parallelization, specify -fopenmp at compilation and linking.

Following is an example of building VE code written in C.
To build a static-linked binary, execute as follows:

/opt/nec/ve/bin/ncc -c -o libomphello.o libomphello.c -fopenmp

/opt/nec/ve/bin/mk_veorun_static -o veorun_omphello libomphello.o -fopenmp

To build a shared library, execute as follows:

/opt/nec/ve/bin/ncc -fpic -shared -o libomphello.so libomphello.c -fopenmp

Following is an example of building VE code written in Fortran.
To build a static-linked binary, execute as follows:

/opt/nec/ve/bin/nfort -c -o libompfortran.o libompfortran.f90 -fopenmp

/opt/nec/ve/bin/mk_veorun_static -o veorun_ompfortran libompfortran.o -fopenmp

To build a shared library, execute as follows:

/opt/nec/ve/bin/nfort -fpic -shared -o libompfortran.so libompfortran.f90 -fopenmp

Examples of VH main code that calls VE code provided above are saved as omphello.c, omphello_static.c, ompfortran.c and ompfortran_static.c.

To compile or execute the program, Please refer to "Compile VH main program" subsection of "Hello World" section in Getting Started with VE.
To execute the program, please refer to "Run a program with VEO" subsection of "Hello World" sction in the Getting Started with VEO.

MPI program with VEO

To compile a MPI program witch calls VEO API using mpincc with '-lveo' option.
To execute a MPI program which calls VEO API using mpiexec or mpirun with '-veo' option.

Please refer to MPI document NEC MPI User's Guide for mpiexec, mpirun, mpincc and MPI API details.

In order to transfer data on VE memory using MPI API, it is necessary to allocate VE memory using veo_alloc_hmem().

Examples

The example MPI program which calls VEO API.

You can compile it with following commands.

$ source /opt/nec/ve/mpi/2.15.0/bin/necmpivars.sh
$ mpincc -vh mpi-veo.c -o mpi-veo -I/opt/nec/ve/veos/include -L/opt/nec/ve/veos/lib64 \
  -Wl,-rpath=/opt/nec/ve/veos/lib64 -lveo
$ mpincc -fpic -shared libmpive.c -lveohmem -o libmpive.so

You can run it with following commands.

$ source /opt/nec/ve/mpi/2.15.0/bin/necmpivars.sh

$ VE_OMP_NUM_THREADS=1 mpirun -veo -np 4 ./mpi-veo

FTRACE Performance Information File

To generate a ftrace.out, specify "-ftrace" option at compilation and linking VE code. A ftrace.out is generated on invocation of veo_proc_destroy() from VH main program.

Please refer to PROGINF/FTRACE User's Guide for more details of FTRACE.

Here is an example of building VE code wiritten in C using or not using OpenMP.

How to get ftrace.out with a different file name for each VE process

To generate ftrace.out with a different file name for each VE process, please install nec-veperf packages (later 2.3.0) and veoffload-aveo and veoffload-aveorun (2.13.0 or later).

AVEO (2.13.0 or later) sets the default performance information file name "ftrace.out" and the suffix for AVEO "veo" in the environment variable VE_FTRACE_OUT_NAME.

When the VE process linked with the ftrace library is executed using AVEO, performance information file is as follows.

without MPI
ftrace.out.veo.${VE_NODE_NUMBER}.$$
with MPI
ftrace.out.${MPIUNIVERSE}.${MPIRANK}.veo.${VE_NODE_NUMBER}.$$

${XXX} represents the value of the XXX environment variable. For example, ${MPIRANK} represents the value of MPIRANK environment variable. $$ represents VE process PID to which the ftrace library linked.

How to build VE code not using OpenMP

To build a static-linked binary without OpenMP for ftrace, execute as follows:

$ /opt/nec/ve/bin/ncc -ftrace -c -o libvehello.o libvehello.c

$ /opt/nec/ve/bin/mk_veorun_static -ftrace -o veorun_static libvehello.o

To build a shared library for ftrace, execute as follows:

$ /opt/nec/ve/bin/ncc -ftrace -shared -fpic -o libvehello.so libvehello.c -lveftrace_t

To build code written in Fortran, change the compiler to nfort.

How to build VE code using OpenMP

To build a static-linked binary with OpenMP for ftrace, execute as follows:

$ /opt/nec/ve/bin/ncc -ftrace -c -o libomphello.o libomphello.c -fopenmp

$ /opt/nec/ve/bin/mk_veorun_static -ftrace -o veorun_omphello libomphello.o -fopenmp

To build a shared library for ftrace, execute as follows:

$ /opt/nec/ve/bin/ncc -ftrace -shared -fpic -o libomphello.so libomphello.c -lveftrace_p -fopenmp

To build code written in Fortran, change the compiler to nfort.

Relinking veorun for Newer Compilers (To do when you updated the compiler)

veorun is the helper program required for the dynamic loading of shared libraries. In order to take advantage of new features or bug fixes provided by updated compilers for VE, you will need to relink veorun for those compilers.

To link veorun which can loads shared library using OpenMP written in Fortran, execute as follows.

$ touch dummy.c
$ /opt/nec/ve/bin/ncc -o dummy.o -c dummy.c
$ /opt/nec/ve/bin/mk_veorun_static veorun dummy.o -fopenmp

If you need to generate ftrace.out file, please add "-ftrace" option to mk_veorun_static.

To use the newly created veorun, set the environment variable VEORUN_BIN.

export VEORUN_BIN=$(pwd)/veorun

And execute a VEO program.

Debugging Log

Set the environment variable VEO_LOG_DEBUG to some value and execute a VEO program. The log is output as standard output.

export VEO_LOG_DEBUG=1

Traceback Information for VEO

You can use traceback information on VE program.

Compile VE Code with -traceback option

To build an executable with the functions statically linked, execute as follows:

$ /opt/nec/ve/bin/ncc -c -o libvehello.o libvehello.c -traceback=verbose

$ /opt/nec/ve/bin/mk_veorun_static -o vehello libvehello.o -traceback=verbose

To build a shared library with the functions for dynamic loading, execute as follows:

$ /opt/nec/ve/bin/ncc -fpic -shared -o libvehello.so libvehello.c -traceback=verbose

-traceback[=verbose] : Specifies to generate extra information in the object file and to link run-time library due to provide traceback information when a fatal error occurs and the environment variable VE_TRACEBACK is set at run-time. When verbose is specified, generates filename and line number information in addition to the above due to provide these information in traceback output.

Run a program with VEO with the environment variable

Execute the compiled VEO program. Set the environment variable VE_TRACEBACK=VERBOSE to output these information at run-time. In example, the exception of "Divide by zero" occurred in line 5 of libvehello.c.

'hello' that is executed with VE_TRACEBACK in example is VH main program to run 'vehello' or 'libvehello.so'. The example of VH main program are hello.c and hello_static.c. For more details of VH main program, please refer to "Hello World" in "Getting Started with VEO".

$ VE_TRACEBACK=VERBOSE VE_FPE_ENABLE=DIV ./hello
Hello, world
[VE] ERROR: sigactionHandler() Interrupt signal 8 received
Runtime Error: Divide by zero at 0x600008001160
[ 0] 0x600008001160 hello            libvehello.c:5
[ 1] 0x600fffdffff8 ?                ?:?
[ 2] 0x600000036718 ?                ?:?
[ 3] 0x60000002a710 ?                ?:?
[ 4] 0x600000029c48 ?                ?:?
[ 5] 0x600c046407a8 ?                ?:?
[ 6] 0x600000006000 ?                ?:?
[VH] [TID 3599] ERROR: unpack_call_result() VE exception 8
[VH] [TID 3599] ERROR: _progress_nolock() Internal error on executing a command(-4)

VE_TRACEBACK : This variable is used to control to output traceback information when a fatal error occurs at runtime. When running the program which is compiled and linked with -traceback=verbose and the value of this variable is "VERBOSE", filename and line number is output in traceback information.

Debugging VEO Programs

AVEO(2.8.2 or later) supports the environment variable to debug VE processes.

VH side debugging

When you want to debug a VH process, execute the following command.

gdb --args ./veo_program arg1 arg2

VE side debugging

When you want to debug a single VE process, set the environment variable as below.

export VEO_DEBUG=console

Then, execute your program. Whenever a VE process starts, a VE debugger gdb is shown in your console. Type "run" to continue, or set breakpoints, etc.

This works under the premise that you don't need to interact with the host side program (e.g. it requires no input).

If you spawn multiple VE processes with the environment variable VEO_DEBUG=console set, it leads to undefined behaivior.

When X Window System is available and you want to debug multiple VE processes, set the environment variable as below.

export VEO_DEBUG=xterm

Then, execute your program. Whenever multiple VE processes start, xterm windows will open up and show VE debugger prompts. Type "run" to continue in each of these windows.

This works under the premise that the environmet DISPLAY is set up properly.

Environment Variables for VEO

AVEO(2.7.5 or later) supports the environment variables to optimize the performance of data transfer.

Environment variables	Brief	Default value
VEO_SENDFRAG	The maximum size of fragments of data to write (send).	4MiB
VEO_SENDCUT	The threshold to cut data into fragments to write (send).	2MiB
VEO_RECVFRAG	The maximum size of fragments of data to read (receive).	4MiB
VEO_RECVCUT	The threshold to cut data into fragments to read (receive).	2MiB

AVEO cuts data into the fragments to write (send) as described in the below table:

Data size[MiB]	Fragment size[MiB]
>= VEO_SENDFRAG * 5	VEO_SENDFRAG
> VEO_SENDCUT	Data size / 4
> VEO_SENDCUT / 2	Data size / 3
> VEO_SENDCUT / 4	Data size / 2
> 0	Data size

AVEO cuts data into the fragments to read (receive) as described in the below table:

Data size[MiB]	Fragment size[MiB]
>= VEO_RECVFRAG * 5	VEO_RECVFRAG
> VEO_RECVCUT	Data size / 5
> VEO_RECVCUT / 2	Data size / 4
> VEO_RECVCUT / 4	Data size / 2
> 0	Data size