.. _Llvm-vec-label:
   
llvm-vec - NEC LLVM-IR Vectorizer
=================================

Synopsis
--------

:program:`llvm-vec` [*options*] *filename*

Example

.. code-block:: console

   $ /opt/nec/ve/llvm-vec/<version>/bin/llvm-vec -o a.s -x ir a.ll

Description
-----------

The :program:`llvm-vec` command compiles LLVM source inputs into Vector Engine
(VE) assembly language. The assembly language output can then be passed through
a VE assembler (:program:`nas`) and linker (:program:`nld`) to generate a VE
native executable.

*filename* specifies a single LLVM source file.

Options
-------

Many options have names beginning with -f, -m or -W. Most of them have positive and negative forms. The negative forms begin with -fno-, -mno- or -Wno-.

.. option:: --help

   Displays usage of the compiler.

.. option:: --version

   Displays the version number and copyrights of NEC LLVM-IR Vectorizer.

.. option:: -O<n>

   Specifies the optimization level n. (default: -O2)

   **-O4**
     Enables aggressive optimization that violates the language standard.
   **-O3**
     Enables optimizations that cause side effects, as well as nested loop optimization.
   **-O2**
     Enables optimizations that cause side effects.
   **-O1**
     Enables optimizations that do not cause any side effects.
   **-O0**
     Disables all optimization, automatic vectorization, parallelization, and inlining.

.. option:: -faggressive-associative-math

   | Allows aggressive re-association of operands in series during optimization. This optimization causes side effects.
   | (default: -fno-aggressive-associative-math)

.. option:: -fargument-alias

   | Prevents the compiler from assuming that arguments do not alias each other or non-local objects in any optimization.
   | (default)

.. option:: -fargument-noalias

   | Allows the compiler to assume that arguments do not alias each other or non-local objects in all optimizations.
   | (default: -fargument-alias)
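
   As a rough illustration of why the distinction matters, the following C sketch gives a different result when its two arguments alias. The function name ``add_twice`` is hypothetical and not part of llvm-vec. Under -fargument-noalias the compiler would be allowed to assume that the aliasing call never happens.

   .. code-block:: c

      #include <assert.h>

      /* Hypothetical helper: adds *src to *dst twice and returns *dst. */
      int add_twice(int *dst, const int *src)
      {
          assert(dst != 0 && src != 0);
          *dst += *src;  /* if dst == src, this store also changes *src */
          *dst += *src;  /* so this read may observe the updated value */
          return *dst;
      }

   With distinct objects holding 1, the result is 3. When both arguments point to the same object holding 1, the result is 4.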

.. option:: -fassociative-math

   | Allows re-association of operands in series during optimization and loop transformation.
   | (default)
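
   Re-association can change floating-point results because floating-point addition is not associative. The following C sketch shows the effect on an ordinary host. The function names are illustrative only, and ``volatile`` is used here just to keep the host compiler from reordering the additions.

   .. code-block:: c

      #include <assert.h>

      /* Sums three values in the written order: (a + b) + c. */
      double sum_left(double a, double b, double c)
      {
          volatile double t = a + b;
          return t + c;
      }

      /* Sums the same values re-associated: a + (b + c). */
      double sum_right(double a, double b, double c)
      {
          volatile double t = b + c;
          return a + t;
      }

      /* With a = 1e20, b = -1e20, c = 1.0 the two orders disagree,
         because -1e20 + 1.0 rounds back to -1e20 in double precision. */
      void demo_reassociation(void)
      {
          assert(sum_left(1e20, -1e20, 1.0) == 1.0);
          assert(sum_right(1e20, -1e20, 1.0) == 0.0);
      }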

.. option:: -fcse-after-vectorization

   | Re-applies common subexpression elimination after vectorization.
   | (default: -fno-cse-after-vectorization)

.. option:: -fdiag-vector=<n>

   | Specifies the vector diagnostics level n (0: no output, 1: information, 2: detail). Vector diagnostics are written to standard error.
   | (default: -fdiag-vector=1)

.. option:: -ffast-math

   | Uses fast scalar versions of math functions outside of vectorized loops.
   | (default)

.. option:: -finstrument-functions

   Inserts calls to instrumentation functions at the entry and exit of each function. The inserted calls are to the following functions:

   .. code-block:: c

      void __cyg_profile_func_enter(void *this_fn, void *call_site);
      void __cyg_profile_func_exit(void *this_fn, void *call_site);
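
   A minimal sketch of what user-provided hooks might look like, assuming the instrumented program is linked against them. The counters ``enter_count`` and ``exit_count`` are hypothetical names. The ``no_instrument_function`` attribute is the GCC/Clang way to keep the hooks themselves from being instrumented; whether it is needed in an llvm-vec workflow is an assumption.

   .. code-block:: c

      #include <assert.h>

      /* Hypothetical counters updated by the hooks. */
      unsigned long enter_count = 0;
      unsigned long exit_count = 0;

      __attribute__((no_instrument_function))
      void __cyg_profile_func_enter(void *this_fn, void *call_site)
      {
          (void)this_fn;    /* address of the function being entered */
          (void)call_site;  /* address of the call site */
          enter_count++;
      }

      __attribute__((no_instrument_function))
      void __cyg_profile_func_exit(void *this_fn, void *call_site)
      {
          (void)this_fn;
          (void)call_site;
          exit_count++;
          assert(exit_count <= enter_count);  /* an exit follows an entry */
      }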

.. option:: -fivdep

   | Inserts an ivdep directive before every loop.
   | (default: -fno-ivdep)

.. option:: -floop-collapse

   | Allows loop collapsing. -On (n=2,3,4) must be in effect.
   | (default: -fno-loop-collapse)
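
   Conceptually, loop collapsing turns a loop nest over contiguous data into one longer loop, which can lengthen the innermost loop. The sketch below is hand-written C that shows the idea only. It is not output of llvm-vec, and the function names are illustrative.

   .. code-block:: c

      #include <assert.h>

      /* Original form: a two-level loop nest over a rows x cols array. */
      void scale_nested(double *a, int rows, int cols, double s)
      {
          for (int i = 0; i < rows; i++)
              for (int j = 0; j < cols; j++)
                  a[i * cols + j] *= s;
      }

      /* Collapsed form: one loop over all rows * cols elements. */
      void scale_collapsed(double *a, int rows, int cols, double s)
      {
          assert(rows >= 0 && cols >= 0);
          for (int k = 0; k < rows * cols; k++)
              a[k] *= s;
      }

   Both functions visit the same elements in the same order, so they produce identical results.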

.. option:: -floop-count=<n>

   | Specifies the iteration count n that the compiler assumes for loops whose iteration count cannot be determined at compile time.
   | (default: -floop-count=5000)

.. option:: -fmove-loop-invariants

   | Enables loop-invariant motion under an if-condition.
   | (default)

.. option:: -fmove-loop-invariants-unsafe

   | Also moves unsafe code that may cause side effects.
   | (default: -fno-move-loop-invariants-unsafe)

   Examples of unsafe code are:

   - divide
   - memory reference to a 1-byte or 2-byte area

   -fmove-loop-invariants must be in effect when you specify this option.
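
   The following C sketch illustrates why moving a division is considered unsafe. The division ``num / den`` is loop invariant but is executed only when some ``flag[i]`` is nonzero. Evaluating it ahead of the loop would perform the division even when no iteration needs it, which is a problem when ``den`` is zero. The function name and parameters are hypothetical.

   .. code-block:: c

      #include <assert.h>

      /* Adds num / den once for every nonzero flag[i]. */
      int sum_guarded_quotient(const int *flag, int n, int num, int den)
      {
          int sum = 0;
          assert(n >= 0);
          for (int i = 0; i < n; i++) {
              if (flag[i])
                  sum += num / den;  /* loop invariant, but guarded */
          }
          return sum;
      }

   As written, calling the function with all flags zero and ``den`` equal to zero is harmless, because the division is never reached.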

.. option:: -fmove-nested-loop-invariants-outer

   | Allows the compiler to move loop-invariant expressions to an outer loop. When -fno-move-nested-loop-invariants-outer is specified, they are moved only to just before the current loop.
   | (default)

.. option:: -fnamed-alias

   | Prevents the compiler from assuming that the objects pointed to by named pointers do not alias during vectorization.
   | (default)

.. option:: -fnamed-noalias

   | Allows the compiler to assume that the objects pointed to by named pointers do not alias during vectorization.
   | (default: -fnamed-alias)

.. option:: -fpic, -fPIC

   Generates position-independent code.

.. option:: -fprecise-math

   | Applies a high-resolution algorithm to the vector version of the power operation when the exponent is an integer value. The result is more exact, but the calculation is slower.
   | (default: -fno-precise-math)

.. option:: -freplace-loop-equation

   | Replaces the "!=" and "==" operators with "<=" or ">=" at the loop back edge.
   | (default: -fno-replace-loop-equation)

.. option:: -fstrict-aliasing

   Allows the compiler to assume the ANSI aliasing rules in all optimizations. The compiler assumes that a stored value is accessed only through one of the following types.

   - A type compatible with the effective type of the object
   - A qualified version of a type compatible with the effective type of the object
   - A type that is the signed or unsigned type corresponding to the effective type of the object
   - A type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object
   - An aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union)
   - A character type

   (default: -fno-strict-aliasing)
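
   As an illustration of these rules, reading a ``float`` object through a ``uint32_t`` lvalue is not one of the listed accesses, whereas copying its bytes with ``memcpy`` does not depend on them. A minimal sketch follows; ``float_bits`` is a hypothetical helper, and a 32-bit IEEE 754 ``float`` is assumed.

   .. code-block:: c

      #include <assert.h>
      #include <stdint.h>
      #include <string.h>

      /* Returns the bit pattern of f without violating the aliasing rules. */
      uint32_t float_bits(float f)
      {
          uint32_t bits;
          assert(sizeof bits == sizeof f);
          /* Not covered by the rules above: bits = *(uint32_t *)&f; */
          memcpy(&bits, &f, sizeof bits);
          return bits;
      }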

.. option:: -fthis-pointer-alias

   | Prevents the compiler from assuming that the this pointer has no aliases in any optimization.
   | (default: -fthis-pointer-noalias)

.. option:: -fthis-pointer-noalias

   | Allows the compiler to assume that the this pointer has no aliases in all optimizations.
   | (default)

.. option:: -ftrace

   Creates an object file and an executable file for the ftrace function.

.. option:: -minit-stack=<value>

   Initializes the stack area with the specified value at run time. The following values are available:

   **zero**
     Initializes with zeroes.
   **nan**
     Initializes with quiet NaN in double type (0x7fffffff7fffffff).
   **nanf**
     Initializes with quiet NaN in float type (0x7fffffff).
   **snan**
     Initializes with signaling NaN in double type (0x7ff4000000000000).
   **snanf**
     Initializes with signaling NaN in float type (0x7fa00000).
   **0xXXXX**
     Initializes with the value specified in a hexadecimal format up to 16 digits. When the specified value has more than 8 hexadecimal digits, the initialization is done on an 8-byte cycle. Otherwise it is done on a 4-byte cycle.
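
   The NaN bit patterns listed above can be checked with integer arithmetic alone. The helpers below are hypothetical and unrelated to llvm-vec itself. They assume the IEEE 754 binary32 and binary64 layouts and the common convention that a set most significant fraction bit marks a quiet NaN.

   .. code-block:: c

      #include <assert.h>
      #include <stdint.h>

      /* NaN in binary64: exponent bits all ones and a nonzero fraction. */
      int is_nan64(uint64_t bits)
      {
          uint64_t exponent = (bits >> 52) & 0x7ffu;
          uint64_t fraction = bits & 0x000fffffffffffffULL;
          return exponent == 0x7ffu && fraction != 0;
      }

      /* Quiet NaN in binary64: a NaN with fraction bit 51 set. */
      int is_quiet64(uint64_t bits)
      {
          return is_nan64(bits) && ((bits >> 51) & 1u) != 0;
      }

      /* NaN in binary32: exponent bits all ones and a nonzero fraction. */
      int is_nan32(uint32_t bits)
      {
          uint32_t exponent = (bits >> 23) & 0xffu;
          uint32_t fraction = bits & 0x007fffffu;
          return exponent == 0xffu && fraction != 0;
      }

      /* Quiet NaN in binary32: a NaN with fraction bit 22 set. */
      int is_quiet32(uint32_t bits)
      {
          return is_nan32(bits) && ((bits >> 22) & 1u) != 0;
      }

      /* Checks the four patterns documented for nan, nanf, snan, snanf. */
      void check_documented_patterns(void)
      {
          assert(is_quiet64(0x7fffffff7fffffffULL));
          assert(is_quiet32(0x7fffffffu));
          assert(is_nan64(0x7ff4000000000000ULL));
          assert(!is_quiet64(0x7ff4000000000000ULL));
          assert(is_nan32(0x7fa00000u));
          assert(!is_quiet32(0x7fa00000u));
      }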

.. option:: -mlist-vector

   | Allows the vectorization of the statement in a loop when an array element with a vector subscript expression appears on both the left and right sides of an assignment operator. 
   | (default: -mno-list-vector)
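
   A loop of the kind this option targets might look like the following C sketch, in which ``a[idx[i]]`` appears on both sides of the assignment. The names are illustrative. When ``idx`` contains repeated values, the updates to the same element must be applied one after another, which is one reason such loops need care when vectorized.

   .. code-block:: c

      #include <assert.h>

      /* Accumulates b[i] into the element of a selected by idx[i]. */
      void scatter_add(double *a, const int *idx, const double *b, int n)
      {
          assert(n >= 0);
          for (int i = 0; i < n; i++)
              a[idx[i]] = a[idx[i]] + b[i];
      }

   For example, with ``idx = {0, 0, 1}`` and ``b = {1, 2, 3}``, a zero-initialized ``a`` ends up as ``{3, 3}`` in the sequential version, because both of the first two iterations update ``a[0]``.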

.. option:: -mretain-<keyword>

   | Sets the priority for retaining data in the LLC (last-level cache) according to keyword. The following keywords can be specified.
   | (default: -mretain-all)

   **all**
     Sets a higher priority for retaining vector load/store/gather/scatter results in the LLC. (default)
   **list-vector**
     Sets a higher priority for retaining vector gather/scatter results in the LLC.
   **none**
     Does not set a higher priority for retaining vector load/store/gather/scatter results in the LLC.

.. option:: -msched-<keyword>

   | Specifies the kind of instruction scheduling by keyword.
   | (default: -msched-insns)

   **none**
     Does not perform instruction scheduling.
   **insns**
     Performs instruction scheduling within a basic block. (default)
   **block**
     Performs instruction scheduling within a basic block, but over a wider range than -msched-insns does, in order to schedule instructions aggressively.
   **interblock**
     Performs instruction scheduling across two or more basic blocks, in order to schedule instructions aggressively. The compiler may require more time and memory at compilation.

.. option:: -mvector

   | Enables automatic vectorization.
   | (default)

.. option:: -mvector-advance-gather

   | Moves vector gather operations so that they can start as early as possible.
   | (default)

.. option:: -mvector-advance-gather-limit=<n>

   | Limits the number of vector gather operations moved by -mvector-advance-gather to at most n.
   | (default: -mvector-advance-gather-limit=56)

.. option:: -mvector-floating-divide-instruction

   | Uses the vector floating divide instruction for division. By default, an approximate instruction sequence is used.
   | (default: -mno-vector-floating-divide-instruction)

.. option:: -mvector-fma

   | Allows the use of vector fused multiply-add instructions.
   | (default)

.. option:: -mvector-intrinsic-check

   | Checks the value ranges of arguments passed to the vectorized versions of mathematical functions.
   | (default: -mno-vector-intrinsic-check)

   The target mathematical functions of this option are as follows.
     acos, acosh, asin, atan, atan2, atanh, cos, cosh, cotan, exp, exp10, exp2, expm1, log10, log2, log, pow, sin, sinh, sqrt, tan, tanh

.. option:: -mvector-iteration

   | Allows the use of vector iteration instructions in the vectorization.
   | (default)

.. option:: -mvector-iteration-unsafe

   | Allows the use of vector iteration instructions in the vectorization even when they may give an incorrect result.
   | (default: -mno-vector-iteration-unsafe)

.. option:: -mvector-low-precise-divide-function

   | Uses a low-precision divide function for vector floating-point division.
   | (default: -mno-vector-low-precise-divide-function)

.. option:: -mvector-merge-conditional

   | Allows vector loads and stores in THEN, ELSE IF, and ELSE blocks to be merged.
   | (default: -mno-vector-merge-conditional)

.. option:: -mvector-packed

   | Allows the use of packed vector instructions in the vectorization.
   | (default: -mno-vector-packed)

.. option:: -mvector-power-to-explog

   | Allows pow(R1,R2) and/or the ** operator in a vectorized loop to be replaced with exp(R2*log(R1)). The replacement shortens execution time, but numerical errors can occur in rare cases.
   | (default: -mno-vector-power-to-explog)

.. option:: -mvector-power-to-sqrt

   | Allows pow(R1,R2) in a vectorized loop to be replaced with an expression that uses sqrt(3C) or cbrt(3C) when R2 is a special value such as 0.5 or 1.0/3.0. The replacement makes execution faster, but numerical errors can occur in rare cases.
   | (default)

.. option:: -mvector-reduction

   | Allows the use of vector reduction instructions in the vectorization.
   | (default)

.. option:: -mvector-sqrt-instruction

   | Uses the vector sqrt instruction for SQRT. By default, an approximate instruction sequence is used.
   | (default: -mno-vector-sqrt-instruction)

.. option:: -mvector-threshold=<n>

   | Specifies the minimum iteration count n that a loop must have to be vectorized.
   | (default: -mvector-threshold=5)

.. option:: -o <filename>

   Specifies the name of the file to which output is written. The output is an assembler source file.

.. option:: -p, -pg

   Creates an executable file that outputs profiler information (ngprof).

.. option:: -proginf

   | Creates an executable file for the PROGINF function.
   | (default: -proginf)

.. option:: -traceback[=verbose]

   | Generates extra information in the object file and links the run-time library in order to provide traceback information when a fatal error occurs and the environment variable VE_TRACEBACK is set at run time.
   | When verbose is specified, file name and line number information is also generated so that it can be included in the traceback output. Set the environment variable VE_TRACEBACK=VERBOSE to output this information at run time.

.. option:: -v

   Displays the invoked commands at each stage of compilation.

.. _Metadata-label:

Metadata
--------

The following Metadata are used to control NEC LLVM-IR Vectorizer optimization and correspond to :ref:`Compiler-directives`.
There are two kinds of Metadata: one has two operands and the other has one operand.

In the former kind, the second operand is a 1-bit value (i1).
`'llvm.loop.nec.vector.enable'` Metadata is representative of the former.
It corresponds to the `'vector'` and `'novector'` :ref:`Compiler-directives`. Its notation is as follows.

.. code-block:: console

 !{!"llvm.loop.nec.vector.enable", i1 true}
 !{!"llvm.loop.nec.vector.enable", i1 false}

The latter has no second operand.
`'llvm.loop.nec.gather_reorder'` Metadata is representative of the latter.
It corresponds to the `'gather_reorder'` :ref:`Compiler-directives`. Its notation is as follows.

.. code-block:: console

 !{!"llvm.loop.nec.gather_reorder"}

Although there is a difference in the presence or absence of a second operand, both follow the rules of 'llvm.loop'.
For more information on 'llvm.loop', please see https://llvm.org/docs/LangRef.html#llvm-loop.

New Metadata list
^^^^^^^^^^^^^^^^^^^

+--------------------------------------+---------------------+----------------+
| Metadata                             | Compiler directives | Second Operand |
+======================================+=====================+================+
| llvm.loop.nec.advance_gather.enable  | advance_gather      | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.advance_gather.enable  | noadvance_gather    | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.assume.enable          | assume              | true           | 
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.assume.enable          | noassume            | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.gather_reorder         | gather_reorder      |                |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.ivdep                  | ivdep               |                | 
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.list_vector.enable     | list_vector         | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.list_vector.enable     | nolist_vector       | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.lstval.enable          | lstval              | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.lstval.enable          | nolstval            | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.move.enable            | move                | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.move.enable            | nomove              | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.move_unsafe            | move_unsafe         |                |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.nofma                  | nofma               |                |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.packed_vector.enable   | packed_vector       | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.packed_vector.enable   | nopacked_vector     | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.shortloop              | shortloop           |                |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.sparse.enable          | sparse              | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.sparse.enable          | nosparse            | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.vector.enable          | vector              | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.vector.enable          | novector            | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.verror_check.enable    | verror_check        | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.verror_check.enable    | noverror_check      | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.vob.enable             | vob                 | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.vob.enable             | novob               | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.vovertake.enable       | vovertake           | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.vovertake.enable       | novovertake         | false          |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.vwork.enable           | vwork               | true           |
+--------------------------------------+---------------------+----------------+
| llvm.loop.nec.vwork.enable           | novwork             | false          |
+--------------------------------------+---------------------+----------------+


Remarks
-------

LLVM source inputs must be legal input that the LLVM :program:`llc` command
can compile.

:program:`llvm-vec` generates assembly language output that is not compatible
with the output of the **NEC C/C++ Compiler** (:program:`ncc/nc++`). Mixing
assembly language output from the two compilers in one executable is not
supported.

:program:`llvm-vec` uses the 'SjLj' (setjmp/longjmp) exception handling mechanism.