PDLACONSB(3)  ScaLAPACK routine of NEC Numeric Library Collection PDLACONSB(3)



NAME
       PDLACONSB - look for two consecutive small subdiagonal elements by
       examining the effect of starting a double-shift QR iteration given by
       H44, H33, and H43H34, and checking whether this would make a
       subdiagonal element negligible

SYNOPSIS
       SUBROUTINE PDLACONSB( A, DESCA, I, L, M, H44, H33, H43H34, BUF, LWORK )

           INTEGER            I, L, LWORK, M

           DOUBLE PRECISION   H33, H43H34, H44

           INTEGER            DESCA( * )

           DOUBLE PRECISION   A( * ), BUF( * )

PURPOSE
       PDLACONSB looks for two consecutive small subdiagonal elements by
       examining the effect of starting a double-shift QR iteration given by
       H44, H33, and H43H34, and checking whether this would make a
       subdiagonal element negligible.
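
       For orientation, the test can be sketched serially in C, following the
       classic formulation of this loop in LAPACK's DLAHQR (newer LAPACK
       releases use a revised test).  This is an illustrative sketch, not the
       PDLACONSB source: the function name find_double_shift_start and its
       argument order are hypothetical, H is assumed to be stored
       column-major with leading dimension ldh and addressed with 1-based
       (Fortran-style) indices, the submatrix is assumed unreduced (no zero
       H(m+1,m) is reached), and ULP is taken as DBL_EPSILON.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* 1-based, column-major access to H (Fortran convention). */
#define H(r, c) h[((size_t)(c) - 1) * (size_t)ldh + ((size_t)(r) - 1)]

/* Hypothetical serial sketch: scan m = i-2 down to l and return the first m
 * at which starting the double shift (h44, h33, h43h34) would make H(m,m-1)
 * negligible; returns l if no such m > l exists. Assumes i - l >= 2. */
int find_double_shift_start(const double *h, int ldh, int l, int i,
                            double h44, double h33, double h43h34)
{
    const double ulp = DBL_EPSILON; /* stands in for DLAMCH('Precision') */
    int m;
    for (m = i - 2; m >= l; --m) {
        double h11 = H(m, m), h22 = H(m + 1, m + 1);
        double h21 = H(m + 1, m), h12 = H(m, m + 1);
        double h44s = h44 - h11, h33s = h33 - h11;
        /* First column of (H - s1*I)(H - s2*I) at row m, divided by H(m+1,m). */
        double v1 = (h33s * h44s - h43h34) / h21 + h12;
        double v2 = h22 - h11 - h33s - h44s;
        double v3 = H(m + 2, m + 1);
        double s = fabs(v1) + fabs(v2) + fabs(v3);
        v1 /= s;
        v2 /= s;
        v3 /= s;
        if (m == l)
            break; /* reached the top of the active submatrix */
        double h00 = H(m - 1, m - 1), h10 = H(m, m - 1);
        double tst1 = fabs(v1) * (fabs(h00) + fabs(h11) + fabs(h22));
        /* H(m,m-1) is negligible if H(m,m-1)*(|v2|+|v3|) is tiny relative to tst1. */
        if (fabs(h10) * (fabs(v2) + fabs(v3)) <= ulp * tst1)
            break;
    }
    return m;
}

#undef H
```

       In PDLACONSB the same seven values must first be gathered from the
       processes that own them, which is what the buffering described under
       Logic is for.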

       Notes
       =====

       Each global data object is described by an associated description
       vector.  This vector stores the information required to establish the
       mapping between an object element and its corresponding process and
       memory location.

       Let A be a generic term for any 2D block-cyclically distributed array.
       Such a global array has an associated description vector DESCA.  In the
       following comments, the character _ should be read as "of the global
       array".

       NOTATION        STORED IN       EXPLANATION
       --------------- --------------- --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ ) The descriptor type.  In this case,
                                       DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ )  The BLACS context handle, indicating
                                       the BLACS process grid A is
                                       distributed over.  The context itself
                                       is global, but the handle (the integer
                                       value) may vary.
       M_A    (global) DESCA( M_ )     The number of rows in the global
                                       array A.
       N_A    (global) DESCA( N_ )     The number of columns in the global
                                       array A.
       MB_A   (global) DESCA( MB_ )    The blocking factor used to distribute
                                       the rows of the array.
       NB_A   (global) DESCA( NB_ )    The blocking factor used to distribute
                                       the columns of the array.
       RSRC_A (global) DESCA( RSRC_ )  The process row over which the first
                                       row of the array A is distributed.
       CSRC_A (global) DESCA( CSRC_ )  The process column over which the
                                       first column of the array A is
                                       distributed.
       LLD_A  (local)  DESCA( LLD_ )   The leading dimension of the local
                                       array.  LLD_A >= MAX(1,LOCr(M_A)).
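
       As an illustration of the table, the sketch below stores the nine
       descriptor entries in a C array.  It assumes the standard ScaLAPACK
       ordering for dense descriptors (DTYPE_=1, CTXT_=2, M_=3, N_=4, MB_=5,
       NB_=6, RSRC_=7, CSRC_=8, LLD_=9, so DLEN_=9), shifted to 0-based
       offsets for C.  The enum names and the helper set_dense_desc are
       hypothetical; in a real program the ScaLAPACK routine DESCINIT fills
       and validates the descriptor.

```c
/* 0-based C offsets of the Fortran entries DESCA( DTYPE_ ) ... DESCA( LLD_ ). */
enum {
    DTYPE_IDX = 0, CTXT_IDX, M_IDX, N_IDX, MB_IDX, NB_IDX,
    RSRC_IDX, CSRC_IDX, LLD_IDX,
    DLEN_IDX /* number of entries, 9 */
};

/* Hypothetical helper: fill a dense (DTYPE_A = 1) descriptor without any
 * validation. The caller must ensure LLD_A >= MAX(1,LOCr(M_A)). */
void set_dense_desc(int desc[DLEN_IDX], int m, int n, int mb, int nb,
                    int rsrc, int csrc, int ictxt, int lld)
{
    desc[DTYPE_IDX] = 1;     /* dense block-cyclic descriptor */
    desc[CTXT_IDX] = ictxt;  /* BLACS context handle */
    desc[M_IDX] = m;         /* global rows */
    desc[N_IDX] = n;         /* global columns */
    desc[MB_IDX] = mb;       /* row blocking factor */
    desc[NB_IDX] = nb;       /* column blocking factor */
    desc[RSRC_IDX] = rsrc;   /* process row holding the first row */
    desc[CSRC_IDX] = csrc;   /* process column holding the first column */
    desc[LLD_IDX] = lld;     /* local leading dimension */
}
```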

       Let K be the number of rows or columns of a distributed matrix, and
       assume that its process grid has dimension p x q.  LOCr( K ) denotes
       the number of elements of K that a process would receive if K were
       distributed over the p processes of its process column.  Similarly,
       LOCc( K ) denotes the number of elements of K that a process would
       receive if K were distributed over the q processes of its process row.
       The values of LOCr() and LOCc() may be determined via a call to the
       ScaLAPACK tool function NUMROC:

               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).

       An upper bound for these quantities may be computed by:

               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
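
       A minimal C transcription of the block-cyclic counting that NUMROC
       performs, together with the upper bound above, may help when sizing
       local arrays.  The names numroc_c and loc_upper_bound are hypothetical;
       the logic mirrors the reference NUMROC as we understand it, and n is
       assumed nonnegative with nb and nprocs positive.

```c
/* Number of the n indices (split into blocks of nb) owned by process iproc
 * along one grid dimension of nprocs processes, when the first block lives
 * on process isrcproc. */
int numroc_c(int n, int nb, int iproc, int isrcproc, int nprocs)
{
    int mydist = (nprocs + iproc - isrcproc) % nprocs; /* distance from source */
    int nblocks = n / nb;                               /* whole blocks */
    int count = (nblocks / nprocs) * nb;                /* whole blocks every process gets */
    int extrablks = nblocks % nprocs;                   /* leftover whole blocks */
    if (mydist < extrablks)
        count += nb;            /* this process gets one more whole block */
    else if (mydist == extrablks)
        count += n % nb;        /* this process gets the trailing partial block */
    return count;
}

/* Upper bound ceil( ceil(n/nb)/nprocs )*nb, in integer arithmetic. */
int loc_upper_bound(int n, int nb, int nprocs)
{
    int nblocks = (n + nb - 1) / nb;
    return ((nblocks + nprocs - 1) / nprocs) * nb;
}
```

       For example, with M = 10, MB_A = 3, RSRC_A = 0, and NPROW = 2, process
       row 0 holds 6 rows and process row 1 holds 4; the bound gives 6.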


ARGUMENTS
       A       (global input) DOUBLE PRECISION array, dimension
               (DESCA(LLD_),*).  On entry, the Hessenberg matrix whose
               tridiagonal part is being scanned.  Unchanged on exit.

       DESCA   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix A.

       I       (global input) INTEGER
               The global location of the bottom of the unreduced submatrix of
               A.  Unchanged on exit.

       L       (global input) INTEGER
               The global location of the top of the unreduced submatrix of A.
               Unchanged on exit.

       M       (global output) INTEGER
               On exit, the global starting location of the double-shift QR
               iteration.  It satisfies L <= M <= I-2.

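       H44     (global input) DOUBLE PRECISION

       H33     (global input) DOUBLE PRECISION

       H43H34  (global input) DOUBLE PRECISION
               These three values define the double shift for the QR
               iteration.  In DLAHQR's notation they are typically H(I,I),
               H(I-1,I-1), and the product H(I,I-1)*H(I-1,I).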

       BUF     (local output) DOUBLE PRECISION array of size LWORK.
               Work buffer holding the values exchanged between processes
               during the scan.

       LWORK   (global input) INTEGER
               On entry, the size of the work buffer BUF.  This must be at
               least 7*Ceil( Ceil( (I-L)/HBL ) / LCM(NPROW,NPCOL) ), where HBL
               is the block size used to distribute A (square blocks are
               assumed), LCM is the least common multiple, and NPROW x NPCOL
               is the logical process grid size.
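
       The workspace bound can be evaluated with integer arithmetic.  The
       sketch below is a hypothetical helper, not part of ScaLAPACK; it
       assumes I >= L and that hbl is the square distribution block size of A.

```c
/* Greatest common divisor, used to form LCM(NPROW, NPCOL). */
static int gcd_int(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Ceiling of a/b for a >= 0 and b > 0. */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Hypothetical helper: minimum LWORK = 7*Ceil( Ceil((I-L)/HBL) / LCM(NPROW,NPCOL) ). */
int pdlaconsb_min_lwork(int i, int l, int hbl, int nprow, int npcol)
{
    int lcm = nprow / gcd_int(nprow, npcol) * npcol;
    return 7 * ceil_div(ceil_div(i - l, hbl), lcm);
}
```

       For a 2 x 3 grid with HBL = 32 and I - L = 100: Ceil(100/32) = 4,
       LCM(2,3) = 6, Ceil(4/6) = 1, so LWORK must be at least 7.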

       Logic:
       ======

       Two consecutive small subdiagonal elements will stall convergence of a
       double shift if their product is relatively small, even if neither is
       very small by itself.  It is therefore necessary to scan the
       "tridiagonal portion" of the matrix.  In the LAPACK algorithm DLAHQR, a
       loop on M goes from I-2 down to L and examines H(m,m), H(m+1,m+1),
       H(m+1,m), H(m,m+1), H(m-1,m-1), H(m,m-1), and H(m+2,m+1).  Since these
       elements may reside on different processes, the first major loop
       (label 10 in the source) passes over the tridiagonal and has each node
       store whichever of the 7 values it owns that the node owning H(m,m)
       does not.  This occurs only at block borders and, assuming square
       blocks, at no more than 3 locations per block.  Each node stores these
       values in 5 buffers: one to send diagonally down and right, one to send
       up, one to send left, one to send diagonally up and left, and one to
       send right.  All five are held in the single array BUF, where
       BUF(ISTR1+1) starts the first buffer, BUF(ISTR2+1) starts the second,
       and so on.  After the values are stored, any values a node needs are
       sent and received.  The next major loop then passes over the data and
       searches for two consecutive small subdiagonal elements.
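
       To see why remote values arise only near block borders, the sketch
       below counts, for a given m, how many of the seven elements H(m,m),
       H(m+1,m+1), H(m+1,m), H(m,m+1), H(m-1,m-1), H(m,m-1), and H(m+2,m+1)
       are owned by a process other than the owner of H(m,m).  It is not the
       PDLACONSB buffering code; the names owner and remote_elements are
       hypothetical, ownership follows the INDXG2P formula
       MOD( ISRCPROC + (INDXGLOB-1)/NB, NPROCS ), and square HBL x HBL blocks
       with RSRC_A = CSRC_A = 0 and m >= 2 are assumed.

```c
/* Process coordinate owning 1-based global index g along one grid dimension. */
static int owner(int g, int nb, int src, int nprocs)
{
    return (src + (g - 1) / nb) % nprocs;
}

/* Hypothetical helper: number of the seven elements read at step m that are
 * owned by a process other than the owner of H(m,m). */
int remote_elements(int m, int hbl, int nprow, int npcol)
{
    static const int off[7][2] = {
        {0, 0}, {1, 1}, {1, 0}, {0, 1}, /* H(m,m) H(m+1,m+1) H(m+1,m) H(m,m+1) */
        {-1, -1}, {0, -1}, {2, 1}       /* H(m-1,m-1) H(m,m-1) H(m+2,m+1) */
    };
    int prow = owner(m, hbl, 0, nprow);
    int pcol = owner(m, hbl, 0, npcol);
    int count = 0;
    for (int k = 0; k < 7; ++k) {
        int r = owner(m + off[k][0], hbl, 0, nprow);
        int c = owner(m + off[k][1], hbl, 0, npcol);
        if (r != prow || c != pcol)
            ++count; /* this value must come from another process */
    }
    return count;
}
```

       With HBL = 4 on a 2 x 2 grid, the counts for m = 2, 3, 4, 5, 6 are
       0, 1, 4, 2, 0: only the three positions m = 3, 4, 5 around the first
       block border need remote values, in line with the limit of 3 locations
       per block.  On a 1 x 1 grid every count is 0.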

       Notes:
       ======

       This routine performs a global maximum and must therefore be called by
       all processes in the process grid.

       Implemented by:  G. Henry, November 17, 1996



ScaLAPACK routine               31 October 2017                   PDLACONSB(3)