PZLASMSUB(3)    ScaLAPACK routine of NEC Numeric Library Collection    PZLASMSUB(3)

NAME
       PZLASMSUB - look for a small subdiagonal element from the bottom of
       the matrix that it can safely set to zero

SYNOPSIS
       SUBROUTINE PZLASMSUB( A, DESCA, I, L, K, SMLNUM, BUF, LWORK )

           INTEGER          I, K, L, LWORK

           DOUBLE PRECISION SMLNUM

           INTEGER          DESCA( * )

           COMPLEX*16       A( * ), BUF( * )

PURPOSE
       PZLASMSUB looks for a small subdiagonal element from the bottom of the
       matrix that it can safely set to zero.

       Notes
       =====

       Each global data object is described by an associated description
       vector. This vector stores the information required to establish the
       mapping between an object element and its corresponding process and
       memory location.

       Let A be a generic term for any 2D block cyclically distributed array.
       Such a global array has an associated description vector DESCA. In the
       following comments, the character _ should be read as "of the global
       array".

       NOTATION        STORED IN       EXPLANATION
       --------------- --------------- --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ ) The descriptor type. In this case,
                                       DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ )  The BLACS context handle, indicating
                                       the BLACS process grid A is
                                       distributed over. The context itself
                                       is global, but the handle (the integer
                                       value) may vary.
       M_A    (global) DESCA( M_ )     The number of rows in the global
                                       array A.
       N_A    (global) DESCA( N_ )     The number of columns in the global
                                       array A.
       MB_A   (global) DESCA( MB_ )    The blocking factor used to distribute
                                       the rows of the array.
       NB_A   (global) DESCA( NB_ )    The blocking factor used to distribute
                                       the columns of the array.
       RSRC_A (global) DESCA( RSRC_ )  The process row over which the first
                                       row of the array A is distributed.
       CSRC_A (global) DESCA( CSRC_ )  The process column over which the
                                       first column of the array A is
                                       distributed.
       LLD_A  (local)  DESCA( LLD_ )   The leading dimension of the local
                                       array. LLD_A >= MAX(1,LOCr(M_A)).
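The bound on LLD_A depends on LOCr(M_A), the number of rows of A that land on the calling process. As a rough illustration, the Python sketch below transcribes the arithmetic of the ScaLAPACK tool function NUMROC, together with the ceiling-based upper bound on LOCr() and LOCc(). The Python names numroc and loc_upper_bound are chosen for this sketch only; they are not ScaLAPACK entry points, and process coordinates are assumed to be 0-based as in the BLACS.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of an n-long dimension owned by process iproc.

    n        : global number of rows or columns
    nb       : blocking factor (MB_A or NB_A)
    iproc    : coordinate of this process (MYROW or MYCOL), 0-based
    isrcproc : coordinate of the process holding the first block (RSRC_A or CSRC_A)
    nprocs   : number of processes along this grid dimension (NPROW or NPCOL)
    """
    # Distance of this process from the source process, going around the grid.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    # Number of complete blocks in the global dimension.
    nblocks = n // nb
    # Every process gets at least this many elements from complete rounds.
    count = (nblocks // nprocs) * nb
    # Complete blocks left over after the full rounds.
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        count += nb          # this process receives one extra complete block
    elif mydist == extrablks:
        count += n % nb      # this process receives the trailing partial block
    return count


def loc_upper_bound(n, nb, nprocs):
    """Upper bound ceil( ceil(n/nb)/nprocs )*nb on the local count."""
    blocks = -(-n // nb)                 # ceil(n / nb) with integers
    return -(-blocks // nprocs) * nb     # ceil(blocks / nprocs) * nb


if __name__ == "__main__":
    # 10 rows, blocking factor 3, two process rows, first block on row 0.
    for myrow in range(2):
        print(myrow, numroc(10, 3, myrow, 0, 2), loc_upper_bound(10, 3, 2))
```

Summed over all processes of one grid dimension, the local counts add up to the global dimension, which is a convenient sanity check when sizing local arrays.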
       Let K be the number of rows or columns of a distributed matrix, and
       assume that its process grid has dimension p x q.
       LOCr( K ) denotes the number of elements of K that a process would
       receive if K were distributed over the p processes of its process
       column.
       Similarly, LOCc( K ) denotes the number of elements of K that a
       process would receive if K were distributed over the q processes of
       its process row.
       The values of LOCr() and LOCc() may be determined via a call to the
       ScaLAPACK tool function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
       An upper bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A

ARGUMENTS
       A       (global input) COMPLEX*16 array, dimension (DESCA(LLD_),*)
               On entry, the Hessenberg matrix whose tridiagonal part is
               being scanned. Unchanged on exit.

       DESCA   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix A.

       I       (global input) INTEGER
               The global location of the bottom of the unreduced submatrix
               of A. Unchanged on exit.

       L       (global input) INTEGER
               The global location of the top of the unreduced submatrix
               of A. Unchanged on exit.

       K       (global output) INTEGER
               On exit, this yields the bottom portion of the unreduced
               submatrix. This will satisfy: L <= K <= I-1.

       SMLNUM  (global input) DOUBLE PRECISION
               On entry, a "small number" for the given matrix. Unchanged
               on exit.

       BUF     (local output) COMPLEX*16 array of size LWORK.

       LWORK   (global input) INTEGER
               On entry, LWORK is the size of the work buffer. This must be
               at least 2*Ceil( Ceil( (I-L)/HBL ) / LCM(NPROW,NPCOL) )
               Here LCM is least common multiple, and NPROWxNPCOL is the
               logical grid size.

       Notes:
       ======

       This routine performs a global maximum operation and must therefore
       be called by all processes.

       This code is basically a parallelization of the following snippet of
       LAPACK code from ZLAHQR:

       Look for a single small subdiagonal element.

             DO 20 K = I, L + 1, -1
                TST1 = CABS1( H( K-1, K-1 ) ) + CABS1( H( K, K ) )
                IF( TST1.EQ.ZERO )
            $      TST1 = ZLANHS( '1', I-L+1, H( L, L ), LDH, WORK )
                IF( CABS1( H( K, K-1 ) ).LE.MAX( ULP*TST1, SMLNUM ) )
            $      GO TO 30
          20 CONTINUE
          30 CONTINUE

FURTHER DETAILS
       Implemented by: M. Fahey, May 28, 1999

ScaLAPACK routine               31 October 2017                   PZLASMSUB(3)
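
For readers less familiar with fixed-form Fortran, the ZLAHQR loop quoted in the Notes can be sketched serially in Python as follows. This is only an illustration of the test being parallelized, not the PZLASMSUB implementation: it works on a full in-memory matrix rather than a distributed one, and the names cabs1, one_norm_hessenberg and find_small_subdiag are chosen for this sketch. Indices I, L and K are 1-based as in the man page; the matrix h is an ordinary 0-based list of lists.

```python
import sys


def cabs1(z):
    """|Re(z)| + |Im(z)|, the cheap modulus used by the LAPACK snippet."""
    z = complex(z)
    return abs(z.real) + abs(z.imag)


def one_norm_hessenberg(h, l, i):
    """One-norm (maximum column sum) of the Hessenberg block H(L:I, L:I).

    Stands in for the ZLANHS( '1', ... ) call; entries below the
    subdiagonal are zero by assumption and are skipped.
    """
    best = 0.0
    for j in range(l, i + 1):
        last_row = min(i, j + 1)
        col_sum = sum(abs(complex(h[r - 1][j - 1])) for r in range(l, last_row + 1))
        best = max(best, col_sum)
    return best


def find_small_subdiag(h, i, l, ulp, smlnum):
    """Scan upward from row I; return the first K whose H(K, K-1) is negligible.

    Returns L when no subdiagonal entry qualifies, which matches the value
    the Fortran loop variable holds after the DO loop runs to completion
    (assuming I >= L).
    """
    for k in range(i, l, -1):  # K = I, I-1, ..., L+1
        tst1 = cabs1(h[k - 2][k - 2]) + cabs1(h[k - 1][k - 1])
        if tst1 == 0.0:
            tst1 = one_norm_hessenberg(h, l, i)
        if cabs1(h[k - 1][k - 2]) <= max(ulp * tst1, smlnum):
            return k
    return l


if __name__ == "__main__":
    h = [
        [4.0, 1.0, 2.0, 3.0],
        [1.0, 3.0, 1.0, 2.0],
        [0.0, 1e-20, 2.0, 1.0],
        [0.0, 0.0, 1.0, 1.0],
    ]
    # H(3,2) is negligible relative to |H(2,2)| + |H(3,3)|.
    print(find_small_subdiag(h, 4, 1, sys.float_info.epsilon, 1e-290))
```

In the distributed routine the diagonal and subdiagonal entries needed for each test live on different processes, which is why the man page describes a work buffer and a global operation that every process must join.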