PCSTEIN(3)    ScaLAPACK routine of NEC Numeric Library Collection   PCSTEIN(3)



NAME
       PCSTEIN - compute the eigenvectors of a symmetric tridiagonal matrix in
       parallel, using inverse iteration

SYNOPSIS
       SUBROUTINE PCSTEIN( N, D, E, M, W, IBLOCK, ISPLIT, ORFAC,  Z,  IZ,  JZ,
                           DESCZ,  WORK, LWORK, IWORK, LIWORK, IFAIL, ICLUSTR,
                           GAP, INFO )

           INTEGER         INFO, IZ, JZ, LIWORK, LWORK, M, N

           REAL            ORFAC

           INTEGER         DESCZ( * ), IBLOCK( * ), ICLUSTR( * ), IFAIL( *  ),
                           ISPLIT( * ), IWORK( * )

           REAL            D( * ), E( * ), GAP( * ), W( * ), WORK( * )

           COMPLEX         Z( * )

PURPOSE
       PCSTEIN  computes the eigenvectors of a symmetric tridiagonal matrix in
       parallel, using inverse iteration. The eigenvectors found correspond to
       user specified eigenvalues. PCSTEIN does not orthogonalize vectors that
       are on different processes. The extent  of  orthogonalization  is  con-
       trolled  by  the  input  parameter  LWORK.  Eigenvectors that are to be
       orthogonalized are computed by the same process. PCSTEIN decides on the
       allocation of work among the processes and then calls SSTEIN2 (modified
       LAPACK routine) on each individual process. If  insufficient  workspace
       is allocated, the expected orthogonalization may not be done.

       Note : If the eigenvectors obtained are not orthogonal, increase
              LWORK and run the code again.


       Notes
       =====


       Each  global data object is described by an associated description vec-
       tor.  This vector stores the information required to establish the map-
       ping between an object element and its corresponding process and memory
       location.

       Let A be a generic term for any 2D block  cyclicly  distributed  array.
       Such a global array has an associated description vector DESCA.  In the
       following comments, the character _ should be read as  "of  the  global
       array".

       NOTATION        STORED IN      EXPLANATION
       --------------- -------------- --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row  of  the  array  A  is  distributed.
       CSRC_A (global) DESCA( CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let K be the number of rows or columns of  a  distributed  matrix,  and
       assume that its process grid has dimension r x c.
       LOCr(  K  )  denotes  the  number of elements of K that a process would
       receive if K were distributed over the r processes of its process  col-
       umn.
       Similarly, LOCc( K ) denotes the number of elements of K that a process
       would receive if K were distributed over the c processes of its process
       row.
       The  values  of  LOCr()  and LOCc() may be determined via a call to the
       ScaLAPACK tool function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).  An  upper
       bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A



ARGUMENTS
       P = NPROW * NPCOL is the total number of processes

       N       (global input) INTEGER
               The order of the tridiagonal matrix T.  N >= 0.

       D       (global input) REAL array, dimension (N)
               The n diagonal elements of the tridiagonal matrix T.

       E       (global input) REAL array, dimension (N-1)
               The (n-1) off-diagonal elements of the tridiagonal matrix T.

       M       (global input) INTEGER
               The total number of eigenvectors to be found. 0 <= M <= N.

       W       (global input/global output) REAL array, dim (M)
               On input, the first M elements of W contain all the eigenvalues
               for which eigenvectors are  to  be  computed.  The  eigenvalues
               should  be grouped by split-off block and ordered from smallest
               to largest within the block (The output array  W  from  PSSTEBZ
               with  ORDER='b'  is expected here). This array should be repli-
               cated on all processes.  On output, the first M  elements  con-
               tain the input eigenvalues in ascending order.

               Note  : To obtain orthogonal vectors, it is best if eigenvalues
               are computed to highest accuracy ( this can be done by  setting
               ABSTOL  to  the underflow threshold = SLAMCH('U') --- ABSTOL is
               an input parameter to PSSTEBZ )

       IBLOCK  (global input) INTEGER array, dimension (N)
               The submatrix indices associated with the corresponding  eigen-
               values  in W -- 1 for eigenvalues belonging to the first subma-
               trix from the top, 2 for those belonging to the  second  subma-
               trix,  etc.  (The  output array IBLOCK from PSSTEBZ is expected
               here).

       ISPLIT  (global input) INTEGER array, dimension (N)
               The splitting points, at which T breaks  up  into  submatrices.
               The  first  submatrix  consists of rows/columns 1 to ISPLIT(1),
               the second of rows/columns ISPLIT(1)+1 through ISPLIT(2), etc.,
               and  the  NSPLIT-th consists of rows/columns ISPLIT(NSPLIT-1)+1
               through ISPLIT(NSPLIT)=N (The output array ISPLIT from  PSSTEBZ
               is expected here.)

       ORFAC   (global input) REAL
               ORFAC  specifies  which  eigenvectors should be orthogonalized.
               Eigenvectors that correspond to eigenvalues  which  are  within
               ORFAC*||T||  of  each other are to be orthogonalized.  However,
               if the workspace is insufficient (see  LWORK),  this  tolerance
               may  be  decreased  until all eigenvectors to be orthogonalized
               can be stored in one process.   No  orthogonalization  will  be
               done if ORFAC equals zero.  A default value of 10^-3 is used if
               ORFAC is negative.  ORFAC should be identical on all processes.

       Z       (local output) COMPLEX array,
               dimension  (DESCZ(DLEN_), N/npcol + NB) Z contains the computed
               eigenvectors associated with  the  specified  eigenvalues.  Any
               vector  which  fails  to converge is set to its current iterate
               after MAXITS iterations ( See SSTEIN2 ).  On output, Z is  dis-
               tributed across the P processes in block cyclic format.

       IZ      (global input) INTEGER
               Z's global row index, which points to the beginning of the sub-
               matrix which is to be operated on.

       JZ      (global input) INTEGER
               Z's global column index, which points to the beginning  of  the
               submatrix which is to be operated on.

       DESCZ   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix Z.

       WORK    (local workspace/global output) REAL array,
               dimension  (  LWORK ) On output, WORK(1) gives a lower bound on
               the workspace (  LWORK  )  that  guarantees  the  user  desired
               orthogonalization (see ORFAC).  Note that this may overestimate
               the minimum workspace needed.

       LWORK   (local input) integer
               LWORK  controls the extent of orthogonalization  which  can  be
               done. The number of eigenvectors for which storage is allocated
               on each process is NVEC  =  floor((  LWORK-  max(5*N,NP00*MQ00)
               )/N).   Eigenvectors  corresponding  to  eigenvalue clusters of
               size NVEC - ceil(M/P) + 1 are guaranteed to be orthogonal ( the
               orthogonality  is similar to that obtained from CSTEIN2).  Note
               :  LWORK   must  be  no  smaller  than:  max(5*N,NP00*MQ00)   +
               ceil(M/P)*N,  and  should have the same input value on all pro-
               cesses.  It is the minimum value of LWORK  input  on  different
               processes that is significant.

               If LWORK = -1, then LWORK is global input and a workspace query
               is assumed; the routine only calculates the minimum and optimal
               size  for  all work arrays. Each of these values is returned in
               the first entry of the corresponding work array, and  no  error
               message is issued by PXERBLA.

       IWORK   (local workspace/global output) INTEGER array,
               dimension  (  3*N+P+1 ) On return, IWORK(1) contains the amount
               of integer workspace required.  On return, the IWORK(2) through
               IWORK(P+2)  indicate the eigenvectors computed by each process.
               Process I  computes  eigenvectors  indexed  IWORK(I+2)+1  thru'
               IWORK(I+3).

       LIWORK  (local input) INTEGER
               Size of array IWORK.  Must be >= 3*N + P + 1

               If  LIWORK  =  -1,  then LIWORK is global input and a workspace
               query is assumed; the routine only calculates the  minimum  and
               optimal  size  for  all  work  arrays.  Each of these values is
               returned in the first entry of the  corresponding  work  array,
               and no error message is issued by PXERBLA.

       IFAIL   (global output) integer array, dimension (M)
               On normal exit, all elements of IFAIL are zero.  If one or more
               eigenvectors fail to converge after MAXITS  iterations  (as  in
               CSTEIN),  then  INFO > 0 is returned.  If mod(INFO,M+1)>0, then
               for I=1 to mod(INFO,M+1), the eigenvector corresponding to  the
               eigenvalue  W(IFAIL(I))  failed  to  converge ( W refers to the
               array of eigenvalues on output ).

               ICLUSTR (global output) integer  array,  dimension  (2*P)  This
               output  array contains indices of eigenvectors corresponding to
               a cluster of eigenvalues that could not be  orthogonalized  due
               to  insufficient  workspace (see LWORK, ORFAC and INFO). Eigen-
               vectors  corresponding  to  clusters  of  eigenvalues   indexed
               ICLUSTR(2*I-1)  to ICLUSTR(2*I), I = 1 to INFO/(M+1), could not
               be orthogonalized due to lack of workspace. Hence the eigenvec-
               tors  corresponding  to  these  clusters may not be orthogonal.
               ICLUSTR is a zero  terminated  array  ---  (  ICLUSTR(2*K).NE.0
               .AND.  ICLUSTR(2*K+1).EQ.0  ) if and only if K is the number of
               clusters.

       GAP     (global output) REAL array, dimension (P)
               This output array contains the gap  between  eigenvalues  whose
               eigenvectors  could  not  be  orthogonalized. The INFO/M output
               values in this array  correspond  to  the  INFO/(M+1)  clusters
               indicated  by  the  array ICLUSTR. As a result, the dot product
               between eigenvectors corresponding to the I^th cluster  may  be
               as high as ( O(n)*macheps ) / GAP(I).

       INFO    (global output) INTEGER
               = 0:  successful exit
               <  0:   If the i-th argument is an array and the j-entry had an
               illegal value, then INFO = -(i*100+j), if the i-th argument  is
               a  scalar  and had an illegal value, then INFO = -i.  < 0 :  if
               INFO = -I, the I-th argument had an illegal value
               > 0 :  if mod(INFO,M+1) = I, then I eigenvectors failed to con-
               verge  in  MAXITS  iterations.  Their indices are stored in the
               array IFAIL.  if INFO/(M+1) = I, then eigenvectors  correspond-
               ing  to  I  clusters of eigenvalues could not be orthogonalized
               due to insufficient workspace. The indices of the clusters  are
               stored in the array ICLUSTR.



ScaLAPACK routine               31 October 2017                     PCSTEIN(3)