Advanced Topics
This section describes advanced usage of the JIT compilation functionality.
Compiling from C++ Source
Definition of the source
The entry routine should be defined with extern "C".
>>> cpp_src = r'''
... extern "C" {
... int ve_add(double *px, double *py, double *pz, int n) {
...     #pragma omp parallel for
...     for (int i = 0; i < n; i++) pz[i] = px[i] + py[i];
...     return 0;
... }
... }
... '''
Compilation
Pass 'nc++' or '/opt/nec/ve/bin/nc++' to the compiler argument.
>>> ve_lib = nlcpy.jit.CustomVELibrary(code=cpp_src, compiler='nc++')
Getting the function symbol
The procedure is the same as when compiling from C source.
>>> ve_add = ve_lib.get_function(
...     've_add',
...     args_type=(ve_types.uint64, ve_types.uint64, ve_types.uint64, ve_types.int32),
...     ret_type=ve_types.int32
... )
Compiling from Fortran Source
Definition of the source
>>> f_src = r"""
... integer(kind=4) function ve_add(px, py, pz, n)
...     integer(kind=4), value :: n
...     double precision :: px(n), py(n), pz(n)
...     !$omp parallel do
...     do i=1, n
...         pz(i) = px(i) + py(i)
...     end do
...     ve_add = 0
... end
... """
Compilation
Pass 'nfort' or '/opt/nec/ve/bin/nfort' to the compiler argument.
>>> ve_lib = nlcpy.jit.CustomVELibrary(code=f_src, compiler='nfort')
Getting the subroutine or function symbol
You should add _ to the end of the function name.
>>> ve_add = ve_lib.get_function(
...     've_add_',
...     args_type=(ve_types.uint64, ve_types.uint64, ve_types.uint64, ve_types.int32),
...     ret_type=ve_types.int32
... )
Note
If you want to pass a scalar value as a parameter to a VE Fortran function or subroutine,
please define the Fortran parameter with the VALUE attribute.
Alternatively, you can use nlcpy.veo.OnStack to pass the parameter on the stack.
In this case, you should define the Fortran parameter without the VALUE attribute.
Note
The Fortran source code is generated internally with the suffix .f03.
To preprocess the source before compilation, please specify the '-fpp' option in cflags.
To allow the source code to be written in fixed form, please specify the '-ffixed-form' option in cflags.
For details, please refer to the Fortran Compiler User's Guide.
Transferring Ndarray Attributes to VE
NLCPy provides an easy way to access nlcpy.ndarray attributes from the VE side.
Restriction
Accessing nlcpy.ndarray attributes from Fortran is not supported yet.
Definition of the source
You should include nlcpy.h in the C/C++ source and use the ve_array structure.
Here is an example of 2-D element-wise addition.
>>> c_src = r'''
... #include <nlcpy.h>
...
... void ve_add(ve_array *x, ve_array *y, ve_array *z) {
...     /* get a pointer */
...     double *px = (double *)x->ve_adr;
...     double *py = (double *)y->ve_adr;
...     double *pz = (double *)z->ve_adr;
...     /* get the stride of each array index, in elements */
...     uint64_t ix0 = x->strides[x->ndim-1] / x->itemsize;
...     uint64_t ix1 = x->strides[x->ndim-2] / x->itemsize;
...     uint64_t iy0 = y->strides[y->ndim-1] / y->itemsize;
...     uint64_t iy1 = y->strides[y->ndim-2] / y->itemsize;
...     uint64_t iz0 = z->strides[z->ndim-1] / z->itemsize;
...     uint64_t iz1 = z->strides[z->ndim-2] / z->itemsize;
...     /* execute element-wise addition */
...     #pragma omp parallel for
...     for (int i = 0; i < z->shape[z->ndim-2]; i++) {
...         for (int j = 0; j < z->shape[z->ndim-1]; j++) {
...             pz[i*iz1 + j*iz0] = px[i*ix1 + j*ix0] + py[i*iy1 + j*iy0];
...         }
...     }
... }
... '''
For details of the C-structure, please refer to the C Interfaces.
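The index arithmetic used in ve_add can be checked without a VE. The following is a minimal pure-Python sketch, not NLCPy code: add_2d_strided and the flat-list buffers are hypothetical names introduced only for illustration. Strides are given in bytes, as in ve_array, and divided by the item size to obtain element offsets.

```python
def add_2d_strided(px, py, pz, shape, x_strides, y_strides, z_strides, itemsize=8):
    """Emulate the ve_add loop on flat Python lists.

    Strides are in bytes, so each one is divided by itemsize to get an
    element offset, mirroring strides[...] / itemsize in the C source.
    """
    rows, cols = shape
    ix1, ix0 = (s // itemsize for s in x_strides)
    iy1, iy0 = (s // itemsize for s in y_strides)
    iz1, iz0 = (s // itemsize for s in z_strides)
    for i in range(rows):
        for j in range(cols):
            pz[i * iz1 + j * iz0] = px[i * ix1 + j * ix0] + py[i * iy1 + j * iy0]
    return pz


# x and z are stored in C order; y holds the same 2x3 values in Fortran order.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # byte strides (24, 8)
y = [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]  # byte strides (8, 16)
z = [0.0] * 6                       # byte strides (24, 8)
result = add_2d_strided(x, y, z, (2, 3), (24, 8), (8, 16), (24, 8))
print(result)
```

Because every access goes through the strides, the same loop handles operands with different memory layouts, which is why the C example reads strides from each ve_array instead of assuming a contiguous layout.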
Compilation
By default, you only need to pass the source code to the code argument.
>>> ve_lib = nlcpy.jit.CustomVELibrary(code=c_src)
If you specify the cflags argument, you must include the header file path, which can be retrieved from nlcpy.get_include().
Getting the function symbol
Pass 'void *' or ve_types.void_p for each element of args_type that corresponds to a ve_array structure in the VE-side arguments.
>>> ve_add = ve_lib.get_function(
...     've_add',
...     args_type=(ve_types.void_p, ve_types.void_p, ve_types.void_p),
...     ret_type=ve_types.void
... )
Execution
Pass an nlcpy.ndarray object as an argument of CustomVEKernel.__call__().
>>> x = nlcpy.arange(20, dtype='f8').reshape((4, 5))
>>> y = nlcpy.arange(20, dtype='f8').reshape((4, 5))
>>> z = nlcpy.empty((4, 5), dtype='f8')
>>> ve_add(x, y, z)
>>> print(z)
[[ 0.  2.  4.  6.  8.]
 [10. 12. 14. 16. 18.]
 [20. 22. 24. 26. 28.]
 [30. 32. 34. 36. 38.]]
C Interfaces
-
struct ve_array
The ve_array C-structure contains the required information for an nlcpy.ndarray. All instances of an nlcpy.ndarray will have this structure.
The members of ve_array are as follows:
-
uint64_t ve_adr
The address pointing to the first element of the array.
-
uint64_t ndim
The number of dimensions in the array.
-
uint64_t size
The total size of the array.
-
uint64_t shape[NLCPY_MAXNDIM]
The shape of the array. An array of integers providing the extent of each dimension.
Given an nlcpy.ndarray from nlcpy.empty((3, 4, 5), dtype='f8'), the shape in the C-structure is:
ve_array.shape[0] : 3
ve_array.shape[1] : 4
ve_array.shape[2] : 5
ve_array.shape[3] : undefined
...
ve_array.shape[NLCPY_MAXNDIM-1] : undefined
-
uint64_t strides[NLCPY_MAXNDIM]
The strides of the array. An array of integers providing the number of bytes that must be skipped to get to the next element in that dimension.
Given an nlcpy.ndarray from nlcpy.empty((3, 4, 5), dtype='f8'), the strides in the C-structure are:
ve_array.strides[0] : 160
ve_array.strides[1] : 40
ve_array.strides[2] : 8
ve_array.strides[3] : undefined
...
ve_array.strides[NLCPY_MAXNDIM-1] : undefined
-
uint64_t dtype
The data type of the array. The correspondence values is below:
enum ve_dtype { ve_bool = 0, ve_i8 = 1, ve_u8 = 2, ve_i16 = 3, ve_u16 = 4, ve_i32 = 5, ve_u32 = 6, ve_i64 = 7, ve_u64 = 8, ve_f16 = 23, ve_f32 = 11, ve_f64 = 12, ve_c64 = 14, ve_c128 = 15, };
This enum data can be defined by
nlcpy.h.
-
uint64_t itemsize
The number of bytes for one element of the array.
-
uint64_t is_c_contiguous
Whether the array is in C-style contiguous order.
1 means yes, 0 means no.
-
uint64_t is_f_contiguous
Whether the array is in Fortran-style contiguous order.
1 means yes, 0 means no.
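The strides and dtype values described for ve_array can be reproduced on the host for a quick sanity check. The sketch below is an illustration only and not part of NLCPy: VeDtype and c_contiguous_strides are hypothetical helpers, and the stride calculation assumes a C-contiguous array.

```python
from enum import IntEnum


class VeDtype(IntEnum):
    """Host-side mirror of the ve_dtype values listed for nlcpy.h (hypothetical helper)."""
    ve_bool = 0
    ve_i8 = 1
    ve_u8 = 2
    ve_i16 = 3
    ve_u16 = 4
    ve_i32 = 5
    ve_u32 = 6
    ve_i64 = 7
    ve_u64 = 8
    ve_f32 = 11
    ve_f64 = 12
    ve_c64 = 14
    ve_c128 = 15
    ve_f16 = 23


def c_contiguous_strides(shape, itemsize):
    """Return byte strides for a C-contiguous array of the given shape."""
    strides = []
    step = itemsize
    # The last dimension varies fastest, so walk the shape from the end.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


strides = c_contiguous_strides((3, 4, 5), 8)
print(strides)
print(VeDtype(12).name)
```

For the (3, 4, 5) array of 8-byte elements used in the examples, this yields the same three stride values shown for ve_array.strides[0] through ve_array.strides[2].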
Transferring Buffer Data to VE
Python objects that support the buffer interface can be passed as
VE function arguments by using nlcpy.veo.OnStack.
>>> from nlcpy import veo
>>> import numpy
>>>
>>> src = r'''
... #include <stdint.h>
... void onstack_test(int32_t *a, float *b) {
... b[0] = (float)(a[0] + a[1]);
... }
... '''
>>> ve_lib = nlcpy.jit.CustomVELibrary(code=src)
>>> test = ve_lib.get_function(
... 'onstack_test',
... args_type=(ve_types.void_p, ve_types.void_p),
... ret_type=ve_types.void
... )
>>>
>>> a = numpy.array([1, 2], dtype='i4')
>>> b = numpy.empty(1, dtype='f4')
>>> test(
... veo.OnStack(a, inout=veo.INTENT_IN),
... veo.OnStack(b, inout=veo.INTENT_OUT),
... sync=True
... )
>>> b
array([3.], dtype=float32)
See also
For details of OnStack, please refer to the py-veo project.
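To see whether a given Python object supports the buffer interface at all, it can be wrapped in a memoryview on the host. This is a minimal standard-library sketch: buffer_info is a hypothetical helper, and whether OnStack accepts every object that passes this check is an assumption that is not confirmed here.

```python
import array


def buffer_info(obj):
    """Return (format, itemsize, nbytes) if obj exposes the buffer protocol, else None."""
    try:
        view = memoryview(obj)
    except TypeError:
        # Objects such as list or int do not export a buffer.
        return None
    with view:
        return view.format, view.itemsize, view.nbytes


a = array.array('i', [1, 2])
info_array = buffer_info(a)
info_bytes = buffer_info(bytearray(4))
info_list = buffer_info([1, 2])
print(info_array, info_bytes, info_list)
```

A plain list returns None here, which is why the example above converts its inputs to numpy arrays before wrapping them in OnStack.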
Customizing Compiler Options
The cflags and ldflags arguments each take a tuple of strings, so compiler and linker options can be customized.
>>> ve_lib = nlcpy.jit.CustomVELibrary(
... code=something,
... cflags=nlcpy.jit.get_default_cflags(openmp=False, opt_level=3) + ('-mvector-packed', '-ffast-math'),
... ldflags=nlcpy.jit.get_default_ldflags(openmp=False) + ('-L', 'your/library/path', '-lsomething'),
... )
FTRACE can be enabled with the ftrace argument.
>>> ve_lib = nlcpy.jit.CustomVELibrary(
... code=something,
... ftrace=True,
... )
You can also use NLC routines just by enabling the use_nlc argument.
>>> ve_lib = nlcpy.jit.CustomVELibrary(
... code=r'''
... #include <asl.h>
... asl_int_t call_dbgmsm(double *ab, asl_int_t *ipvt, asl_int_t lna,
... asl_int_t n, int64_t m) {
... return ASL_dbgmsm(ab, lna, n, m, ipvt);
... }
... ''',
... use_nlc=True,
... )
Note
When the use_nlc flag is enabled, the following libraries are linked internally:
libasl_openmp_i64
libaslfftw3_i64
liblapack_i64
libblas_openmp_i64
libsca_openmp_i64
libheterosolver_openmp_i64
libsblas_openmp_i64
libcblas_i64
Note
Only the OpenMP, 64-bit integer version of NLC can be used.
Note
If you enable the use_nlc flag with Fortran source, you should add the
option '-fdefault-integer=8' to the cflags.
Note
If you use the ASL Unified Interface, you should not call the following functions, because they are called internally at the beginning and end of the NLCPy process.
asl_library_initialize()
asl_library_finalize()
See also
For notes on compiler options, please refer to the aveo documentation.
Callback Setting
The Python function passed to the callback argument
is executed when the result of the VE function is retrieved.
The callback function should take one argument, which corresponds to the return value of the
VE function.
>>> def callback(err):
...     # do something
...     return
Here, we show a simple example that uses a callback function.
The following code will be used for the example:
>>> import string
>>> err = {
... 'ERR_OK': 0,
... 'ERR_MEMORY': 1,
... 'ERR_NDIM': 2,
... 'ERR_DTYPE': 3,
... 'ERR_CONTIGUOUS': 4,
... }
>>>
>>> temp = string.Template(r'''
... #include <nlcpy.h>
... #include <stdlib.h>
...
... uint64_t callback_test(ve_array *x) {
... double *px = (double *)x->ve_adr;
... if (px == NULL) return ${ERR_MEMORY};
... if (x->ndim != 1) return ${ERR_NDIM};
... if (x->dtype != ve_f64) return ${ERR_DTYPE};
... if (! (x->is_c_contiguous & x->is_f_contiguous)) return ${ERR_CONTIGUOUS};
... /* do something here */
... return ${ERR_OK};
... }
... ''')
>>> src = temp.substitute(err)
>>> print(src)
#include <nlcpy.h>
#include <stdlib.h>
uint64_t callback_test(ve_array *x) {
double *px = (double *)x->ve_adr;
if (px == NULL) return 1;
if (x->ndim != 1) return 2;
if (x->dtype != ve_f64) return 3;
if (! (x->is_c_contiguous & x->is_f_contiguous)) return 4;
/* do something here */
return 0;
}
Prepare the executable object:
>>> ve_lib = nlcpy.jit.CustomVELibrary(code=src)
>>> callback_test = ve_lib.get_function(
... 'callback_test',
... args_type=(ve_types.void_p,),
... ret_type=ve_types.uint64
... )
Define the callback function:
>>> def err_print(retval):
...     # reverse lookup
...     for k, v in err.items():
...         if retval == v:
...             print(k)
...             return
...     raise Exception
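The reverse lookup in err_print scans the dictionary on every call. As an alternative sketch, the mapping can be inverted once so that each callback invocation is a single dictionary access. The names err_names and err_name are hypothetical and introduced only for this illustration; the err table is copied from the example above.

```python
err = {
    'ERR_OK': 0,
    'ERR_MEMORY': 1,
    'ERR_NDIM': 2,
    'ERR_DTYPE': 3,
    'ERR_CONTIGUOUS': 4,
}

# Invert the table once: return code -> symbolic name.
err_names = {code: name for name, code in err.items()}


def err_name(retval):
    """Translate a VE return code into its symbolic name."""
    try:
        return err_names[retval]
    except KeyError:
        # Report the unexpected code instead of raising a bare Exception.
        raise ValueError(f'unknown VE return code: {retval}') from None


print(err_name(2))
```

A callback built on this helper would only need to print or log err_name(retval). Raising ValueError with the offending code makes an unexpected return value easier to diagnose than a bare Exception.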
Execute some patterns with the callback function:
>>> x = nlcpy.arange(9, dtype='f8')
>>> callback_test(x, callback=err_print)
>>> nlcpy.request.flush()
ERR_OK
>>>
>>> x = nlcpy.arange(9, dtype='f8').reshape(3,3)
>>> callback_test(x, callback=err_print)
>>> nlcpy.request.flush()
ERR_NDIM
>>>
>>> x = nlcpy.arange(9, dtype='f4')
>>> callback_test(x, callback=err_print)
>>> nlcpy.request.flush()
ERR_DTYPE
>>>
>>> x = nlcpy.arange(9, dtype='f8')[::2]
>>> callback_test(x, callback=err_print)
>>> nlcpy.request.flush()
ERR_CONTIGUOUS
Note
When you enable the sync flag, the return value of the VE function can be
retrieved from the return value of CustomVEKernel.__call__().
>>> x = nlcpy.arange(9, dtype='f8')
>>> callback_test(x, sync=True)
0
Logging Compiler Output
Logging to standard output
>>> import sys
>>> ve_lib = nlcpy.jit.CustomVELibrary(code=src, log_stream=sys.stdout)
Logging to file stream
>>> with open('./compiler.log', 'w') as fs:
...     ve_lib = nlcpy.jit.CustomVELibrary(code=src, log_stream=fs)