Forum - Code sample of VH Call (vhcall)


Part of a program can be offloaded from the VE to the VH with VH Call. This is effective for operations that cannot be vectorized and therefore run poorly on the VE. A typical example is formatted file I/O.

The attached FMTIOwithVHCALL.tgz is a sample program that performs formatted file output with vhcall.

For details of VH Call, which is a feature of "libsysve", see the following.

  https://sxauroratsubasa.sakura.ne.jp/documents/veos/en/libsysve/md_doc_VHCall.html

"FMTIOwithVHCALL.tgz" contains the following.

  ./FMTIOwithVHCALL/
  |-- Makefile: Makefile of the main program.
  |             vhcall is utilized if "CPPFLAGS=-DUSEVHCALL" is specified.
  |-- ./lib/: Build environment of the shared library "libvhcall.so", which runs on VH.
  |   |-- Makefile: Makefile of "libvhcall.so". gcc is used here.
  |   `-- libvhcall.c: Source of the function "vhfprintf", which runs on VH.
  |-- run.sh: Execution script.
  |-- vhcalltest.c: Source of the main program.
  |-- vhcalltest_before.c: Source of the original main program without vhcall.
  `-- vhcalltest_simple.c: Source of a simplified main program. Return values are not checked, so it is easier to read.

You can make/run it as follows.

  $ tar -zxvf FMTIOwithVHCALL.tgz
  $ cd ./FMTIOwithVHCALL/lib/
  $ make # ".so" file is made with gcc.
  $ cd ../
  $ make # Main program is made with ncc.
  $ ./run.sh

This program writes 2,097,152 double-precision values (ARRAYSIZE = 524288*4) to the file "out" as formatted output. The "proginf" output without and with vhcall is as follows.

                                         Without       With
                                         vhcall        vhcall
  Real Time (sec)                      :    14.606438      0.970042
  User Time (sec)                      :    14.561287      0.000285
  Vector Time (sec)                    :     0.000053      0.000053
  Inst. Count                          :  13769140634        127989
  V. Inst. Count                       :        16384         16384
  V. Element Count                     :      4194304       4194304
  V. Load Element Count                :            0             0
  FLOP Count                           :            0             0
  MOPS                                 :   962.268177  25886.079776
  MOPS (Real)                          :   942.962328      4.438896
  MFLOPS                               :     0.000000      0.000000
  MFLOPS (Real)                        :     0.000000      0.000000
  A. V. Length                         :   256.000000    256.000000
  V. Op. Ratio (%)                     :     0.030452     97.408097
  L1 Cache Miss (sec)                  :     2.279239      0.000069
  CPU Port Conf. (sec)                 :     0.000000      0.000000
  V. Arith. Exec. (sec)                :     0.000052      0.000052
  V. Load Exec. (sec)                  :     0.000000      0.000000
  VLD LLC Hit Element Ratio (%)        :     0.000000      0.000000
  FMA Element Count                    :            0             0
  Power Throttling (sec)               :     0.000000      0.000000
  Thermal Throttling (sec)             :     0.000000      0.000000
  Memory Size Used (MB)                :   298.000000    298.000000
  Non Swappable Memory Size Used (MB)  :    86.000000     86.000000

The real time drops from 14.6 seconds to 0.97 seconds, which is roughly 15 times faster.
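
To put the table in perspective, the following is a small sketch that derives a few figures from the "Real Time" and "Inst. Count" rows. The helper names (ratio, per_element, print_summary) are made up for this illustration, and the per-element figures are only upper bounds because the totals cover the whole program, not just the fprintf loop.

```c
#include <stdio.h>

#define ARRAYSIZE (524288*4)   /* same element count as the sample program */

/* Ratio of a "without vhcall" measurement to the "with vhcall" one. */
double ratio(double without_vhcall, double with_vhcall)
{
    return without_vhcall / with_vhcall;
}

/* Average cost per output element for a total measured over the whole run. */
double per_element(double total)
{
    return total / (double)ARRAYSIZE;
}

/* Print derived figures; the inputs are copied from the proginf table. */
void print_summary(void)
{
    printf("real-time speedup       : %.1fx\n", ratio(14.606438, 0.970042));
    printf("VE instructions/element : %.0f\n", per_element(13769140634.0));
    printf("VE microseconds/element : %.2f\n", per_element(14.606438) * 1e6);
}
```

Without vhcall, each output line costs several thousand scalar instructions on the VE, which is why moving the formatting to the VH pays off.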

The original program (without vhcall) is as follows (source code "vhcalltest_before.c").

 1 #include <stdio.h>
 2 #include <stdlib.h>
 3
 4 #define ARRAYSIZE (524288*4)
 5
 6 int main(void){
 7    double a[ARRAYSIZE];
 8    int i;
 9    FILE *fp;
10
11    for(i=0; i<ARRAYSIZE; i++) a[i] = 999.0;
12
13    fp = fopen("out", "w");
14    for(i=0; i<ARRAYSIZE; i++) fprintf(fp, "%d %le\n", i, a[i]);
15    fclose(fp);
16
17    return EXIT_SUCCESS;
18 }
19

The operations on lines 13 to 15 are offloaded.

The source code added to use vhcall is the part of "vhcalltest.c" that is enabled when "USEVHCALL" is defined.

- Load the VH Call library "libvhcall.so" with "vhcall_install".

- Get a "handle" (symbol ID) for the function that runs on VH with "vhcall_find".

- Allocate the structure for the arguments of the VH Call function with "vhcall_args_alloc".

- Set the arguments with "vhcall_args_set_pointer", "vhcall_args_set_i32", and so on. Here, argument 0 is the file name, argument 1 is the number of elements, and argument 2 is the target array "a".

- Then execute the offloaded function with "vhcall_invoke_with_args".

- Clean up with "vhcall_args_free" and "vhcall_uninstall".
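
The steps above can be sketched roughly as follows. This is not the code from the attached "vhcalltest.c". It is a minimal illustration that assumes the library is found at "./lib/libvhcall.so" and that "vhfprintf" returns 0 on success. The wrapper name write_formatted is made up for this example. Without USEVHCALL it falls back to plain fprintf, so the same source builds on either side.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#ifdef USEVHCALL
#include <libvhcall.h>   /* VH Call API of libsysve (VE side only) */
#endif

/* Hypothetical wrapper: write n lines of "index value" to filename.
 * Returns 0 on success, -1 on failure. */
int write_formatted(const char *filename, double *a, int32_t n)
{
#ifdef USEVHCALL
    int rc = -1;
    uint64_t retval = 0;
    vhcall_args *ca = NULL;

    /* Load the VH-side shared library (the path is an assumption). */
    vhcall_handle handle = vhcall_install("./lib/libvhcall.so");
    if (handle == (vhcall_handle)-1) return -1;

    /* Look up the VH function by its symbol name. */
    int64_t symid = vhcall_find(handle, "vhfprintf");
    if (symid == -1) goto out;

    ca = vhcall_args_alloc();
    if (ca == NULL) goto out;

    /* Argument 0: file name, 1: number of elements, 2: array "a".
     * VHCALL_INTENT_IN copies each buffer from VE to VH before the call. */
    if (vhcall_args_set_pointer(ca, VHCALL_INTENT_IN, 0,
                                (void *)filename, strlen(filename) + 1) != 0 ||
        vhcall_args_set_i32(ca, 1, n) != 0 ||
        vhcall_args_set_pointer(ca, VHCALL_INTENT_IN, 2,
                                a, sizeof(double) * (size_t)n) != 0)
        goto out;

    /* Run the function on VH; retval == 0 meaning success is an assumption. */
    if (vhcall_invoke_with_args(symid, ca, &retval) == 0 && retval == 0)
        rc = 0;
out:
    if (ca != NULL) vhcall_args_free(ca);
    vhcall_uninstall(handle);
    return rc;
#else
    /* Fallback without vhcall: same loop as vhcalltest_before.c. */
    FILE *fp = fopen(filename, "w");
    if (fp == NULL) return -1;
    for (int32_t i = 0; i < n; i++) {
        if (fprintf(fp, "%d %le\n", i, a[i]) < 0) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
#endif
}
```

Calling write_formatted("out", a, ARRAYSIZE) from main would then replace lines 13 to 15 of the original program. Checking every return value makes the sketch longer, which is probably why the archive also ships the shorter "vhcalltest_simple.c".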


"./lib/" is the build environment of the shared library "libvhcall.so". To use VH Call, enable "-DUSEVHCALL" in the Makefile of the main program.
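
For reference, a VH-side function of this kind might look like the sketch below. It is not the "libvhcall.c" from the archive. It assumes that the three arguments arrive as ordinary C arguments in the order given above (file name, number of elements, array) and that the function reports its status through a uint64_t return value, with 0 meaning success.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of a VH-side "vhfprintf": runs on the x86 host, so the ordinary
 * glibc fprintf is used. The signature and return convention are assumptions. */
uint64_t vhfprintf(const char *filename, int32_t n, const double *a)
{
    FILE *fp = fopen(filename, "w");
    if (fp == NULL) return 1;

    /* Same output format as the original VE loop. */
    for (int32_t i = 0; i < n; i++) {
        if (fprintf(fp, "%d %le\n", i, a[i]) < 0) {
            fclose(fp);
            return 1;
        }
    }
    return fclose(fp) == 0 ? 0 : 1;
}
```

Because this file is compiled with gcc into a shared library, it needs no VE-specific headers. Only the VE-side caller includes the VH Call API.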

Posted by Tkato on 26 September 2022 at 09:08.
Edited by Tkato on 26 September 2022 at 09:20.