Part of a program can be offloaded from the VH to the VE with VE Offloading (VEO). This is offloading in the opposite direction to VH Call. It is useful when the whole program is difficult to port to the VE. For example, NLCPy (https://pypi.org/project/nlcpy/) and Frovedis (https://github.com/frovedis/frovedis), among others, are implemented with VE Offload functions.
The attached "veo_sample.tgz" is a sample program: a primitive implementation of DGEMM with VEO. Here, a 2,048 x 2,048 DGEMM is performed. The array size can be specified on line 12 of "main.c".
For details of VE Offloading, see the following.
https://sxauroratsubasa.sakura.ne.jp/documents/veos/en/aveo/index.html
"veo_sample.tgz" contains the following files.
./veo_sample/
|-- Makefile: Makefile for the main program and the shared library "libve.so", which runs on the VE.
| The main program is compiled with gcc and "libve.so" is compiled with ncc.
|-- main.c: Source program of the main program.
`-- libve.c: Source program of "libve.so". Note that this is a normal function that includes no special operations for VEO.
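As a rough illustration of what such a "normal function" might look like, the sketch below is a naive triple-loop matrix multiply. The signature of "dgemm_ve" (an int32_t size followed by three pointers, returning uint64_t), the row-major layout and the choice of computing C = A x B without scaling factors are all assumptions made for this example; the actual "libve.c" in the archive may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive C = A * B for n x n row-major matrices of doubles.
 * Assumed signature; nothing here is specific to VEO. */
uint64_t dgemm_ve(int32_t n, const double *a, const double *b, double *c)
{
    for (int32_t i = 0; i < n; i++) {
        /* Clear row i of C before accumulating into it. */
        for (int32_t j = 0; j < n; j++)
            c[(size_t)i * n + j] = 0.0;

        /* The innermost loop runs over j with unit stride,
         * which is the loop a compiler would try to vectorize. */
        for (int32_t k = 0; k < n; k++) {
            double aik = a[(size_t)i * n + k];
            for (int32_t j = 0; j < n; j++)
                c[(size_t)i * n + j] += aik * b[(size_t)k * n + j];
        }
    }
    return 0; /* 0 is used here as a success code (assumption) */
}
```

When a function like this is called through VEO, the 64-bit arguments that carry VE addresses simply arrive as ordinary pointers into VE memory.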
You can build and run it as follows.
$ make
$ ./a.out
on VH: 32.00 [s]
VH->VE copy: 0.01 [s]
on VE: 0.57 [s]
VE->VH copy: 0.01 [s]
Max. relative error: 4.356579e-15
Here, the calculation on the VE takes only 0.57 seconds, compared with 32.00 seconds on the VH. Note that each data copy between VH and VE adds about 0.01 seconds of overhead.
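The "Max. relative error" line compares the result computed on the VE with the result computed on the VH. One common way to compute such a figure is sketched below. The function name "max_relative_error" and the handling of zero reference values are assumptions for this example; the sample program may use a different definition.

```c
#include <stddef.h>

/* Absolute value without depending on the math library. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Largest |val[i] - ref[i]| / |ref[i]| over all elements.
 * Elements whose reference value is exactly zero are skipped
 * to avoid dividing by zero (a choice made for this sketch). */
double max_relative_error(const double *ref, const double *val, size_t len)
{
    double max_err = 0.0;
    for (size_t i = 0; i < len; i++) {
        double denom = abs_d(ref[i]);
        if (denom == 0.0)
            continue;
        double err = abs_d(val[i] - ref[i]) / denom;
        if (err > max_err)
            max_err = err;
    }
    return max_err;
}
```

Here "ref" would be the array computed on the VH and "val" the array copied back from the VE. A value around 1e-15, as in the output above, is close to double-precision rounding error.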
In "main.c":
- A process that runs on the VE is created with "veo_proc_create" (line 38).
- The VEO library "libve.so" is loaded with "veo_load_library" (line 41).
- A "context" of the process running on the VE is opened with "veo_context_open" (line 44).
- Memory for the arrays "a", "b" and "c" is allocated on the VE with "veo_alloc_mem" (lines 53, 56 and 59). Note that the size of the arguments passed to functions running on the VE is limited, so large data arrays should be shared in this manner. For the restrictions, see https://sxauroratsubasa.sakura.ne.jp/documents/veos/en/aveo/md_Restriction.html.
- The data of arrays "a", "b" and "c" is copied from VH to VE with "veo_write_mem" (lines 63, 66 and 69).
- The structure for the arguments of the VE Offload function is allocated with "veo_args_alloc" (line 74).
- The arguments are set with "veo_args_set_i32", "veo_args_set_u64", etc. (lines 75-78). Note that pointers to the VE memory areas of arrays "a", "b" and "c" are passed as arguments.
- The "symbol" of the function "dgemm_ve", which runs on the VE, is obtained with "veo_get_sym" (line 79).
- The offloaded function is then executed with "veo_call_async" (line 82), and its completion is awaited with "veo_call_wait_result" (line 83).
- The structure of arguments is freed with "veo_args_free" (line 86).
- The result data of array "c" is copied from VE to VH with "veo_read_mem" (line 89).
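The steps above can be sketched in one function as follows. This is not the "main.c" from the archive, only a minimal outline of the same call sequence. The helper names "matrix_bytes" and "offload_dgemm", the use of VE node 0, the library path "./libve.so" and the argument order of "dgemm_ve" are assumptions. The cleanup calls "veo_free_mem", "veo_context_close" and "veo_proc_destroy" are not mentioned in the list above but are part of the VEO API. The guard around the header is only there so that the file still compiles on a machine without VEO installed.

```c
#include <stddef.h>
#include <stdint.h>

/* Include the VEO header only when it is installed, so that this file
 * still compiles (without offload_dgemm) on a machine without VEO. */
#if defined(__has_include)
#  if __has_include(<ve_offload.h>)
#    include <ve_offload.h>
#    define HAVE_VEO 1
#  endif
#endif

/* Size in bytes of one n x n matrix of doubles. */
size_t matrix_bytes(int32_t n)
{
    return (size_t)n * (size_t)n * sizeof(double);
}

#ifdef HAVE_VEO
/* Hypothetical helper: runs "dgemm_ve" from "./libve.so" on VE node 0.
 * Returns 0 on success and -1 on failure. */
int offload_dgemm(int32_t n, const double *a, const double *b, double *c)
{
    size_t bytes = matrix_bytes(n);
    uint64_t a_ve = 0, b_ve = 0, c_ve = 0;   /* addresses in VE memory */
    uint64_t lib, sym, req, retval = 0;
    struct veo_thr_ctxt *ctx = NULL;
    struct veo_args *args = NULL;
    int call_rc = -1, rc = -1;

    /* Create the VE process, load the library and open a context. */
    struct veo_proc_handle *proc = veo_proc_create(0);
    if (proc == NULL)
        return -1;
    lib = veo_load_library(proc, "./libve.so");
    ctx = veo_context_open(proc);
    if (lib == 0 || ctx == NULL)
        goto cleanup;

    /* Allocate VE memory and copy the arrays from VH to VE. */
    if (veo_alloc_mem(proc, &a_ve, bytes) != 0 ||
        veo_alloc_mem(proc, &b_ve, bytes) != 0 ||
        veo_alloc_mem(proc, &c_ve, bytes) != 0)
        goto cleanup;
    if (veo_write_mem(proc, a_ve, a, bytes) != 0 ||
        veo_write_mem(proc, b_ve, b, bytes) != 0 ||
        veo_write_mem(proc, c_ve, c, bytes) != 0)
        goto cleanup;

    /* Pass the size by value and the arrays as VE addresses. */
    args = veo_args_alloc();
    sym = veo_get_sym(proc, lib, "dgemm_ve");
    if (args == NULL || sym == 0)
        goto cleanup;
    veo_args_set_i32(args, 0, n);
    veo_args_set_u64(args, 1, a_ve);
    veo_args_set_u64(args, 2, b_ve);
    veo_args_set_u64(args, 3, c_ve);

    /* Start the call asynchronously and wait until it has finished. */
    req = veo_call_async(ctx, sym, args);
    if (req != VEO_REQUEST_ID_INVALID)
        call_rc = veo_call_wait_result(ctx, req, &retval);

    /* Copy the result from VE back to VH. */
    if (call_rc == VEO_COMMAND_OK &&
        veo_read_mem(proc, c, c_ve, bytes) == 0)
        rc = 0;

cleanup:
    if (args != NULL) veo_args_free(args);
    if (a_ve != 0) veo_free_mem(proc, a_ve);
    if (b_ve != 0) veo_free_mem(proc, b_ve);
    if (c_ve != 0) veo_free_mem(proc, c_ve);
    if (ctx != NULL) veo_context_close(ctx);
    veo_proc_destroy(proc);
    return rc;
}
#endif
```

For the 2,048 x 2,048 case, "matrix_bytes" gives 33,554,432 bytes (32 MiB) per array, which is why the arrays are placed in VE memory and only their addresses are passed as arguments. The Makefile in the archive shows the actual compile and link options.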
Posted by Tkato on 27 September 2022 at 07:53. Edited by Tkato on 27 September 2022 at 07:54.