This text file describes the advanced features of SLURM for VE. It currently supports the following advanced features:
 - VEs Health Check
 - VE Accounting

(1) VEs Health Check

(1-1) Overview
You can run a health check to detect VE failures on compute nodes where VEs are installed. If a failure is detected, the compute node and the job are handled according to the settings. The following two types of health check are supported:
 1) Health Check at Job Start and End
 2) Periodic Health Checks

(1-2) Health Check at Job Start and End
If VEs are requested as a gres, Prolog and Epilog check the allocated VEs for failures at job start (job allocation) and at job end. If a failure is detected on one or more VEs, one of the following two action modes takes effect. Even if a failure is detected on one VE, the health check is not interrupted; all allocated VEs are checked. Jobs that do not request VEs do not trigger a VE health check.

Action Mode 1 suits operations where you want to keep using the healthy VEs and CPU cores of a compute node even if some VEs fail. Action Mode 2 suits operations where the cause of a failure is investigated and repaired immediately when it occurs. Select the action mode according to your operational policy. The default is Action Mode 1.

 - Action Mode 1 (Compute Node Operation Continuation Mode)
   Operation continues without taking any special action on the compute node. If a failure is detected by the health check at job start, the job that triggered the health check is requeued. In the health check at job end, however, the job that triggered the health check has already ended, so it is not requeued even if a failure is detected. If the option to notify the user when a job is requeued (--mail-type) was specified at job submission, a notification for the requeued job is sent according to the user notification settings.

 - Action Mode 2 (Compute Node Operation Stop Mode)
   The entire compute node is treated as a failed node, set to the DOWN state, and removed from operation. All jobs running on that node are requeued. User notification is the same as in Action Mode 1.

(1-2-1) Health check settings
The VEs health check is performed by a shell script. The sample script is installed under /etc/slurm/ when the slurm-for_ve-22.05.2-1.el7.x86_64.rpm package is installed. The sample script name is ve_healthchk_for_job.sh.example. Follow the steps below with root privileges:
 1) Rename /etc/slurm/ve_healthchk_for_job.sh.example to /etc/slurm/ve_healthchk_for_job.sh.
 2) Change the file permissions to 755 with the chmod command.
    Example: chmod 755 ve_healthchk_for_job.sh
 3) Set the full path of the health check script in the Prolog and Epilog options in /etc/slurm/slurm.conf.
    Example: Prolog=/etc/slurm/ve_healthchk_for_job.sh
             Epilog=/etc/slurm/ve_healthchk_for_job.sh
 4) Set Alloc in the PrologFlags option in /etc/slurm/slurm.conf so that the health check script runs when the job is allocated.
 5) Open the health check script with an editor such as vi and set the action mode that matches your operational policy. The default is Action Mode 1.
    For Action Mode 1: TROUBLESHOOTING_MODE=1
    For Action Mode 2: TROUBLESHOOTING_MODE=2
 6) Run the "scontrol reconfigure" command to apply the settings.
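Taken together, steps 3) and 4) correspond to the following lines in /etc/slurm/slurm.conf (a minimal sketch; if PrologFlags already lists other flags on your system, append Alloc to that comma-separated list instead of replacing it):
---
Prolog=/etc/slurm/ve_healthchk_for_job.sh
Epilog=/etc/slurm/ve_healthchk_for_job.sh
PrologFlags=Alloc
---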
(1-2-2) Return value of health check shell script
The health check shell script always exits with exit 0, regardless of whether a VE has failed. If a VE failure is detected, the job is requeued and the state of the compute node is updated inside the health check script according to the action mode setting, but Prolog and Epilog themselves always succeed.

(1-2-3) Actions for compute nodes
 - Action Mode 1
   Operation continues without taking any action on the failed compute node. If left unattended, however, a faulty VE may be assigned to a job. Therefore, when a failure is detected, delete the VE from /etc/slurm/slurm.conf and /etc/slurm/gres.conf.
   Example: when 8 VEs are installed on compute node vhost1 and VE0 fails
   Reduce the number of VEs in /etc/slurm/slurm.conf (Gres=ve:10b:8 -> Gres=ve:10b:7):
    Before: NodeName=vhost1 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128828 Gres=ve:10b:8,hca:2 State=UNKNOWN
    After:  NodeName=vhost1 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128828 Gres=ve:10b:7,hca:2 State=UNKNOWN
   Remove the VE from /etc/slurm/gres.conf (File=/dev/veslot[0-7] -> File=/dev/veslot[1-7]):
    Before: NodeName=vhost1 Name=ve Type=10b File=/dev/veslot[0-7]
    After:  NodeName=vhost1 Name=ve Type=10b File=/dev/veslot[1-7]
 - Action Mode 2
   The failed compute node is changed to the DOWN state. You can check the node state with the "scontrol show node" command. The following message is displayed in the "Reason" column:
    "VEs health check fail for job at <timing>"
    Description: <timing> is prolog_slurmd or epilog_slurmd.

(1-2-4) Actions for jobs
 - Action Mode 1
   For a health check at job start (Prolog), the job that triggered the health check is requeued. The job is assigned to another healthy compute node.
 - Action Mode 2
   All jobs running on the compute node are requeued. The jobs are assigned to other healthy compute nodes.

(1-2-5) User notification
If you want users to be notified when their jobs are requeued by a health check after a VE failure is detected, have them set REQUEUE in --mail-type when submitting jobs.
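For example, a job can be submitted as follows (the job script name and the requested VE count are placeholders for illustration):
---
$ sbatch --gres=ve:10b:2 --mail-type=REQUEUE job.sh
---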
(1-2-6) Log output
The health check script outputs the following messages to /var/log/messages. The tag name of the messages is "ve_healthchk_for_job.sh".
 1) Message when a VE failure is detected
    The VE that failed and needs repair can be identified from this message.
    "[JobId=<jobid>,exit_status=<exit_status>,device=VE<ve_number>] Failed to check node health at <timing>."
    Description: <jobid> is the ID of the job that triggered the health check. <exit_status> is a detailed code representing the cause of the failure; the values are as follows:
     2  A temporary file for failure detection could not be created at the job start check.
     3  The VEOS status was not ONLINE.
     4  The temporary file for failure detection was not found at the job end check.
    <ve_number> is the VE number of the failed VE. <timing> is the timing of the health check; the value is prolog_slurmd or epilog_slurmd.
 2) Message indicating the action mode
    "[JobId=<jobid>,NodeName=<nodename>,ActionMode=<mode>]Take action due to VEs health check failure."
    Description: <jobid> is the ID of the job that triggered the health check. <nodename> is the name of the compute node where the VE failure occurred. <mode> is the value of the action mode. It can be 0 or 1.
 3) Message that a job that triggered a health check failed to be requeued
    "[JobId=<jobid>]Failed to requeue job due to VEs health check failure."
    Description: <jobid> is the ID of the job that triggered the health check.
 4) Message that the failed compute node failed to be updated to the DOWN state
    "[NodeName=<nodename> JobId=<jobid>]Failed to update the state of the compute node to DOWN due to VEs health check failure."
    Description: <nodename> is the name of the compute node where the VE failure occurred. <jobid> is the ID of the job that triggered the health check.
 5) Message that the health check was skipped because the command to check the VEOS status could not be executed
    "[JobId=<jobid>,Timing=<timing>]The command for VE health check does not exist or it does not have execute permission. Skip VEs health check."
    Description: <jobid> is the ID of the job that triggered the health check. <timing> is the timing of the health check; the value is prolog_slurmd or epilog_slurmd.
 6) Message when the health check timing is neither Prolog nor Epilog
    "The value of check timing is abnormal. (SLURM_SCRIPT_CONTEXT=<value>)"
    Description: <value> is the value of the environment variable SLURM_SCRIPT_CONTEXT passed from SLURM. The value will be something other than prolog_slurmd or epilog_slurmd.

(1-2-7) Notes
 1) In Action Mode 1, if nothing is done about the compute node when a failure is detected, the same job will repeatedly fail the health check on the same compute node. To resolve this condition, the system administrator should take one of the following actions (see the scontrol example after these notes):
    - Remove the settings of the failed VE from the configuration of the compute node.
    - Update the compute node to the DOWN state and remove it from operation.
 2) Make sure to set the health check script in both Prolog and Epilog in /etc/slurm/slurm.conf. If only one is set, failures may not be detected.
 3) In Action Mode 2, when a VE failure is detected at job start, the compute node is updated to the DOWN state, but the job that ran on that node may not execute correctly even if it is requeued. Therefore, delete the job with scancel and submit it again.
 4) The exit code of the health check script is always 0 even if a VE failure is detected. It is not possible to determine from the exit code whether a failure was detected.
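For the second action in note 1), the node can be taken out of operation manually with scontrol (the node name and reason text are examples):
---
# scontrol update NodeName=vhost1 State=DOWN Reason="VE failure detected by health check"
---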
(1-3) Periodic Health Checks
The VEs configured on compute nodes that install VEs are checked for failures periodically. If a failure is detected on a VE, the check stops there (the other configured VEs are not checked) and one of the following two action modes takes effect.

Action Mode 1 suits operations where you want to keep using the healthy VEs and CPU cores of a compute node even if some VEs fail. Action Mode 2 suits operations where the cause of a failure is investigated and repaired immediately when it occurs. Select the action mode according to your operational policy. The default is Action Mode 1.

 - Action Mode 1 (Compute Node Operation Continuation Mode)
   Operation continues without taking any special action on the compute node. Nothing is done to jobs running on the compute node.
 - Action Mode 2 (Compute Node Operation Stop Mode)
   The entire compute node is treated as a failed node, set to the DOWN state, and removed from operation. All jobs running on that node are requeued. If the option to notify the user when a job is requeued (--mail-type) was specified at job submission, a notification for the requeued job is sent according to the user notification settings.

Only VEs configured in /etc/slurm/gres.conf are checked. If the state of a compute node is already DOWN or DRAIN before the health check, the health check is not performed on that node.

(1-3-1) Health check settings
The VEs health check is performed by a shell script. The sample script is installed under /etc/slurm/ when the slurm-for_ve-22.05.2-1.el7.x86_64.rpm package is installed. The sample script name is ve_healthchk_for_node.sh.example. Follow the steps below with root privileges:
 1) Rename /etc/slurm/ve_healthchk_for_node.sh.example to /etc/slurm/ve_healthchk_for_node.sh.
 2) Change the file permissions to 755 with the chmod command.
    Example: chmod 755 ve_healthchk_for_node.sh
 3) Set the full path of the health check script in the HealthCheckProgram option in /etc/slurm/slurm.conf.
    Example: HealthCheckProgram=/etc/slurm/ve_healthchk_for_node.sh
 4) Set the health check interval in seconds in the HealthCheckInterval option in /etc/slurm/slurm.conf.
    Example: HealthCheckInterval=300
 5) Set the HealthCheckNodeState option in /etc/slurm/slurm.conf to the states of the compute nodes that need to be checked. Normally, set ANY so that no node is omitted from the check.
    Example: HealthCheckNodeState=ANY
 6) Open the health check script with an editor such as vi and set the action mode that matches your operational policy. The default is Action Mode 1.
    For Action Mode 1: TROUBLESHOOTING_MODE=1
    For Action Mode 2: TROUBLESHOOTING_MODE=2
 7) Run the "scontrol reconfigure" command to apply the settings.
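Steps 3) to 5) correspond to the following lines in /etc/slurm/slurm.conf (a minimal sketch using the example values above; choose the interval and node state to match your site policy):
---
HealthCheckProgram=/etc/slurm/ve_healthchk_for_node.sh
HealthCheckInterval=300
HealthCheckNodeState=ANY
---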
(1-3-2) Return value of health check shell script
If no VE failure is detected, the script exits with exit 0. If a VE failure is detected, the script exits with exit 1.

(1-3-3) Actions for compute nodes
 - Action Mode 1
   Operation continues without taking any action on the failed compute node. If left unattended, however, a faulty VE may be assigned to a job. Therefore, when a failure is detected, delete the VE from /etc/slurm/slurm.conf and /etc/slurm/gres.conf.
   Example: when 8 VEs are installed on compute node vhost1 and VE0 fails
   Reduce the number of VEs in /etc/slurm/slurm.conf (Gres=ve:10b:8 -> Gres=ve:10b:7):
    Before: NodeName=vhost1 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128828 Gres=ve:10b:8,hca:2 State=UNKNOWN
    After:  NodeName=vhost1 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128828 Gres=ve:10b:7,hca:2 State=UNKNOWN
   Remove the VE from /etc/slurm/gres.conf (File=/dev/veslot[0-7] -> File=/dev/veslot[1-7]):
    Before: NodeName=vhost1 Name=ve Type=10b File=/dev/veslot[0-7]
    After:  NodeName=vhost1 Name=ve Type=10b File=/dev/veslot[1-7]
 - Action Mode 2
   The failed compute node is changed to the DOWN state. You can check the node state with the "scontrol show node" command. The following message is displayed in the "Reason" column:
    "Periodic VEs health check failure"

(1-3-4) Actions for jobs
 - Action Mode 1
   Nothing is done to jobs running on the compute node. The system administrator should compare the VE numbers assigned to each job with the VE number where the failure occurred, and requeue jobs with the scontrol command if necessary.
 - Action Mode 2
   All jobs running on the compute node are requeued. The jobs are assigned to other healthy compute nodes.

(1-3-5) User notification
If you want users to be notified when their jobs are requeued by a health check after a VE failure is detected, have them set REQUEUE in --mail-type when submitting jobs.

(1-3-6) Log output
The health check script outputs the following messages to /var/log/messages. The tag name of the messages is "ve_healthchk_for_node.sh".
 1) Message when a VE failure is detected
    The VE that failed and needs repair can be identified from this message.
    "[device=VE<ve_number>] Failed to check node health periodically.(reason:<reason>)"
    Description: <ve_number> is the VE number of the failed VE. <reason> is the detailed reason for the failure; it is one of the following:
     "Not found veslot file"
     "Not found os_state file"
     "VEOS state is not ONLINE(state=<state>)"
 2) Message when no VE is installed and the health check is skipped
    "Not found VEs. Skip VEs health check."
 3) Message indicating the action mode
    "[NodeName=<nodename>,ActionMode=<mode>]Take action due to VEs health check failure."
    Description: <nodename> is the name of the compute node where the VE failure occurred. <mode> is the value of the action mode. It can be 0 or 1.
 4) Message that the state of the failed compute node could not be obtained before the state change
    "[NodeName=<nodename>]Failed to get the state of the compute node due to VEs health check failure, skip the node state change."
    Description: <nodename> is the name of the compute node where the VE failure occurred.
 5) Message that the failed compute node failed to be changed to the DOWN state
    "[NodeName=<nodename>]Failed to update the state of the compute node to DOWN due to VEs health check failure."
    Description: <nodename> is the name of the compute node where the VE failure occurred.
 6) Message that the compute node state is DOWN/DRAIN and the health check or node state change is skipped
    "[NodeName=<nodename>]The state of the compute node is <state>, skip VEs health check."
    Description: <nodename> is the name of the compute node where the VE failure occurred. <state> is the value of the State column displayed by the "scontrol show node" command.
 7) Message that the state of the compute node could not be obtained before the health check and the health check was skipped
    "[NodeName=<nodename>]Failed to get the state of the compute node, skip VEs health check."
    Description: <nodename> is the name of the compute node to be checked.

(1-3-7) Notes
 1) If the health check takes longer than 60 seconds for some reason, SLURM forcibly terminates the health check script, so a VE failure may go undetected.
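The core of the periodic check is reading each configured VE's VEOS state. The following is a minimal sketch of that test only, assuming the usual VEOS sysfs layout /sys/class/ve/ve<N>/os_state; the shipped ve_healthchk_for_node.sh is more thorough (veslot files, node state handling, action modes):
---
#!/bin/bash
# Minimal sketch: exit 1 at the first VE whose VEOS state is not ONLINE.
for f in /sys/class/ve/ve[0-9]*/os_state; do
    if [ ! -r "$f" ]; then
        echo "Not found os_state file" >&2
        exit 1
    fi
    state=$(cat "$f")
    if [ "$state" != "ONLINE" ]; then
        echo "VEOS state is not ONLINE(state=$state)" >&2
        exit 1
    fi
done
exit 0
---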
(2) VE Accounting

(2-1) Overview
When a user executes a VE job, the following VE accounting information can be aggregated for each job on the compute node after the job execution completes:
 - CPU consumption time on VE nodes
 - Maximum memory consumption on VE nodes
 - Total memory consumption on VE nodes
 - Average memory consumption on VE nodes
 - List of used VE nodes
Run the aggregation command on the compute node to aggregate the VE accounting information. The aggregated information is saved in a file. The aggregated VE accounting information can be displayed with the same command.
Before using this feature, output of the VE process accounting must be enabled. For detailed configuration instructions, refer to section 4.14 Configuration for Process accounting in the SX-Aurora TSUBASA Installation Guide.
https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/InstallationGuide_E.pdf
In addition, since the cgroup-related information of the job is used when aggregating VE accounting information, the cgroup function of SLURM must be enabled.

(2-2) Aggregation command (ve_acct)
This command aggregates or displays VE accounting information. The execution format of the command is as follows:
 ve_acct [-h|--help] [-d|--acct-dir <dir>] [-l|--log-file <file>]
         [-i|--jobids <jobid>[,<jobid>...]] [-r|--aggregate-run]
         [-c|--aggregate-complete]
The meaning of each option is as follows.
 -h, --help
   Show the help message and exit.
 -d, --acct-dir <dir>
   Specify the directory in which the VE accounting information files (JSON format) to be aggregated or displayed are saved (hereafter referred to as the "account save directory"). You can specify either an absolute path or a relative path. If the specified directory does not exist, it is created. If this option is not specified, the default account save directory is /var/spool/slurm_for_ve/.
 -l, --log-file <file>
   Specify the file in which the log produced when aggregating or displaying VE accounting information is saved (hereafter referred to as the "log file"). You can specify either an absolute path or a relative path. If the specified file does not exist, it is created; if it already exists, the log is appended to it. If this option is not specified, the default log file is /var/log/veacct.log.
 -i, --jobids <jobid>[,<jobid>...]
   Specify the job IDs of the VE accounting information to be displayed. You can specify multiple job IDs by separating them with commas (e.g., -i 0,1,2). If this option is not specified, the VE accounting information of all jobs is displayed. This option cannot be specified together with -r, --aggregate-run or -c, --aggregate-complete.
 -r, --aggregate-run
   Collects the job IDs of running jobs and the session IDs of their VE processes (hereafter referred to as "base information"). This option cannot be specified together with -i, --jobids.
 -c, --aggregate-complete
   Aggregates the VE accounting information of completed jobs. VE accounting information is aggregated when a job completes, based on the collected base information, so base information must have been collected with -r, --aggregate-run before running the aggregation command with this option. If this option is specified together with -r, --aggregate-run, both the collection of base information for running jobs and the aggregation of VE accounting information for completed jobs are performed. This option cannot be specified together with -i, --jobids.
If the aggregation command is executed without specifying -r, --aggregate-run or -c, --aggregate-complete, the aggregated VE accounting information is displayed.
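For example, a single invocation can collect base information and aggregate completed jobs in one pass, using a non-default save directory and log file (both paths are examples):
---
# ve_acct -r -c -d /data/veacct -l /var/log/veacct.log
---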
(2-2-1) Command placement
This command can be installed on compute nodes with the following packages, built from slurm-22.05.2.tar.bz2:
 RHEL7: slurm-for_ve-22.05.02-1.el7.x86_64.rpm
 RHEL8: slurm-for_ve-22.05.02-1.el8.x86_64.rpm
The command is installed under /usr/bin/. Together with this command, the dump-veacct command for referencing the VE process accounting file is installed in the same directory.
Since the cgroup-related information of the job is used to aggregate VE accounting information, enable one of the following settings in /etc/slurm/slurm.conf:
 ProctrackType=proctrack/cgroup
 TaskPlugin=task/cgroup
 JobacctGatherType=jobacct_gather/cgroup
Python 3.6 or higher is required to execute this command.

(2-2-2) Aggregation settings
This command performs aggregation on the assumption that eight VEs are installed in each compute node. For other configurations, open /usr/bin/ve_acct and change the following global variable at the beginning of the file to the appropriate value.
 Variable name : VE_NODE_NUMBER_PER_VH
 Change example: VE_NODE_NUMBER_PER_VH = 16
Since this command is not a daemon process, each execution performs a single collection, aggregation, or display pass. By using the cron function of Linux, VE accounting information can be aggregated periodically. Whether or not you use cron, the command must be run with root privileges to collect or aggregate. The following are examples of cron configurations.
 - Collect base information of running jobs at 1-minute intervals:
   ---
   # crontab -e
   * * * * * /usr/bin/ve_acct -r
   ---
 - Aggregate VE accounting information for completed jobs at 5-minute intervals:
   ---
   # crontab -e
   */5 * * * * /usr/bin/ve_acct -c
   ---
 - Collect base information of running jobs and aggregate VE accounting information of completed jobs together at 3-minute intervals:
   ---
   # crontab -e
   */3 * * * * /usr/bin/ve_acct -r -c
   ---
To ensure that all base information for running jobs is collected, the collection interval should be shorter than the elapsed time of the shortest job.
This command refers to the VE process accounting files (/var/opt/nec/ve/account/pacct_<ve_number>) to aggregate the VE accounting information for a job. If a VE process accounting file is moved by rotation settings etc. while it still contains unaggregated VE accounts, aggregation becomes impossible. Configure /etc/logrotate.d/psacct-ve, the rotation setting of the VE process accounting files on the VEOS, accordingly.

(2-3) Aggregation of VE accounts
 - Summary items
   The VE resources aggregated by this feature and their units are as follows:
   ----------------------------------------------------------
   Item                                     Unit
   ----------------------------------------------------------
   CPU consumption time on VE nodes         tick (1 tick = 10 ms)
   Maximum memory consumption on VE nodes   Kbytes
   Total memory consumption on VE nodes     Kbytes*tick
   Average memory consumption on VE nodes   Kbytes
   List of used VE nodes                    -
   ----------------------------------------------------------
 - Calculation method
   CPU consumption time on VE nodes and total memory consumption on VE nodes are the sums of the values of each process of the job, retrieved from the VE process accounting files.
   Maximum memory consumption on VE nodes is the maximum among the total VE maximum memory usage of the processes executed on each VE node used by the job.
   Average memory consumption on VE nodes is calculated by dividing total memory consumption on VE nodes by CPU consumption time on VE nodes. If CPU consumption time on VE nodes is 0, average memory consumption on VE nodes is the same value as total memory consumption on VE nodes.
   List of used VE nodes is a list of the VE numbers used by the job. If a VE node is requested but not used, its VE number is not counted.
 - VE account file
   The file that saves the base information of a running job and the VE accounting information of a completed job is called the VE account file. The details of the file are as follows:
   Format: JSON
   Naming:
    File that saves only base information:          <jobid>..json
    File with aggregated VE accounting information: <jobid>.json
    * <jobid> is the value of the actual job ID.
   Save location:
    -d, --acct-dir specified:     <specified directory>/<VE account file>
    -d, --acct-dir not specified: /var/spool/slurm_for_ve/<VE account file>
   Contents: If only the base information of a running job has been collected, the file contains the job ID and the session ID list of the VE processes executed in the job. Once the VE accounting information has been aggregated, the VE accounting information is added to the base information.
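As a concrete cross-check of the calculation method, the values displayed for job 61682 in the display example of (2-4) below are mutually consistent. Undoing the display-time unit conversions described in (2-4) recovers the raw aggregated values, and their quotient reproduces the displayed average:
---
raw CPU time     = VECpuTime * 100     = 42.77 * 100        = 4277 ticks
raw total memory = VEKcoreMin * 6000   = 15476004.21 * 6000 = 92856025260 Kbytes*tick
average memory   = raw total / raw CPU = 92856025260 / 4277 = 21710550.68 Kbytes (= VEMeanMemory)
---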
(2-4) Display VE accounts
If the aggregation command is executed without -r, --aggregate-run and -c, --aggregate-complete, the VE accounting information of all jobs is displayed. If the -i, --jobids option is specified, the VE accounting information of the specified jobs is displayed. If -d, --acct-dir is specified, the VE accounts under the specified account save directory are displayed; otherwise, the VE accounts under the default account save directory /var/spool/slurm_for_ve/ are displayed. A display example is shown below.
---
$ ve_acct
JobID  VECpuTime(s)  VEMaxMemory(K)  VEMeanMemory(K)  VEKcoreMin(KMin)  VENodeList
-----  ------------  --------------  ---------------  ----------------  ---------------
61682  42.77         21801984        21710550.68      15476004.21       0,1
61683  42.51         21848064        21763542.39      15419469.79       0,1,2,3
61684  46.51         21858060        21763502.31      15419469.79       0,1,2,3,4,5,6,7
---
The columns are described below.
-------------------------------------------------------------------
JobID             Job ID
VECpuTime(s)      CPU consumption time on VE nodes in seconds. It is calculated by dividing the aggregated value by 100.
VEMaxMemory(K)    Maximum memory consumption on VE nodes in Kbytes.
VEMeanMemory(K)   Average memory consumption on VE nodes in Kbytes.
VEKcoreMin(KMin)  Total memory consumption on VE nodes in Kbytes*min. It is calculated by dividing the aggregated value by 6000 (100*60).
VENodeList        List of used VE nodes.
-------------------------------------------------------------------

(2-5) Notes
 1) VE accounts cannot be aggregated for jobs that had already ended before the aggregation command was executed.
 2) If the VE process accounting file is set to be rotated, the VE accounts may not be aggregated correctly depending on the job end timing and the rotation timing. Configure the rotation of the VE process accounting file appropriately so that it is rotated only while no jobs are running.
 3) If you change the account save directory during operation, the VE accounts may not be aggregated correctly. Change the account save directory only after confirming that all jobs have completed and their VE accounts have been output.
 4) The VE account file "<jobid>..json" is an unfinished file, so exclude it when performing generation management of VE account files.
 5) VE account files are named by job ID. An existing VE account may be overwritten if the job ID wraps around. In addition, the total size of the account save directory grows as accounts are aggregated, which may affect account display and aggregation. Therefore, manage the generations of the files under the account save directory appropriately (see the cleanup sketch after these notes).
 6) If renaming the VE account file fails after VE account aggregation, leave the file as it is and the aggregation for that job will be repeated. Check for the following log entry in the log file and rename the file of the corresponding job to <jobid>.json:
    ---
    error:write_aggregated_jobacct: Failed to rename account file.(jobid:<jobid> file:<filename>)
    ---
 7) Jobs that failed to run and have no start or end time are not aggregated in the VE accounts.
 8) If the log file is not accessible, the log is output to /var/log/messages when the VE accounts are displayed or aggregated.
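One simple way to implement the generation management mentioned in notes 4) and 5) is a periodic cleanup job. The sketch below is only an example: the retention period and directory are placeholders, and the extra -name test skips the unfinished "<jobid>..json" files from note 4):
---
# crontab -e
# Daily at 03:00: remove aggregated VE account files older than 30 days.
0 3 * * * find /var/spool/slurm_for_ve -name '*.json' ! -name '*..json' -mtime +30 -delete
---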