PCLATTRS(3) ScaLAPACK routine of NEC Numeric Library Collection PCLATTRS(3) NAME PCLATTRS - solve one of the triangular systems A * x = s*b, A**T * x = s*b, or A**H * x = s*b, SYNOPSIS SUBROUTINE PCLATTRS( UPLO, TRANS, DIAG, NORMIN, N, A, IA, JA, DESCA, X, IX, JX, DESCX, SCALE, CNORM, INFO ) CHARACTER DIAG, NORMIN, TRANS, UPLO INTEGER IA, INFO, IX, JA, JX, N REAL SCALE INTEGER DESCA( * ), DESCX( * ) REAL CNORM( * ) COMPLEX A( * ), X( * ) PURPOSE PCLATTRS solves one of the triangular systems A * x = s*b, A**T * x = s*b, or A**H * x = s*b, with scaling to prevent overflow. Here A is an upper or lower triangular matrix, A**T denotes the transpose of A, A**H denotes the conjugate transpose of A, x and b are n-element vectors, and s is a scaling factor, usually less than or equal to 1, chosen so that the components of x will be less than the overflow threshold. If the unscaled problem will not cause overflow, the Level 2 PBLAS routine PCTRSV is called. If the matrix A is singular (A(j,j) = 0 for some j) then s is set to 0 and a non-trivial solution to A*x = 0 is returned. This is very slow relative to PCTRSV. This should only be used when scaling is necessary to control overflow, or when it is modified to scale better. Notes ===== Each global data object is described by an associated description vec- tor. This vector stores the information required to establish the map- ping between an object element and its corresponding process and memory location. Let A be a generic term for any 2D block cyclicly distributed array. Such a global array has an associated description vector DESCA. In the following comments, the character _ should be read as "of the global array". NOTATION STORED IN EXPLANATION --------------- -------------- -------------------------------------- DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, DTYPE_A = 1. CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating the BLACS process grid A is distribu- ted over. The context itself is glo- bal, but the handle (the integer value) may vary. M_A (global) DESCA( M_ ) The number of rows in the global array A. N_A (global) DESCA( N_ ) The number of columns in the global array A. MB_A (global) DESCA( MB_ ) The blocking factor used to distribute the rows of the array. NB_A (global) DESCA( NB_ ) The blocking factor used to distribute the columns of the array. RSRC_A (global) DESCA( RSRC_ ) The process row over which the first row of the array A is distributed. CSRC_A (global) DESCA( CSRC_ ) The process column over which the first column of the array A is distributed. LLD_A (local) DESCA( LLD_ ) The leading dimension of the local array. LLD_A >= MAX(1,LOCr(M_A)). Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has dimension r x c. LOCr( K ) denotes the number of elements of K that a process would receive if K were distributed over the r processes of its process col- umn. Similarly, LOCc( K ) denotes the number of elements of K that a process would receive if K were distributed over the c processes of its process row. The values of LOCr() and LOCc() may be determined via a call to the ScaLAPACK tool function, NUMROC: LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). An upper bound for these quantities may be computed by: LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A ARGUMENTS UPLO (global input) CHARACTER*1 Specifies whether the matrix A is upper or lower triangular. = 'U': Upper triangular = 'L': Lower triangular TRANS (global input) CHARACTER*1 Specifies the operation applied to A. = 'N': Solve A * x = s*b (No transpose) = 'T': Solve A**T * x = s*b (Transpose) = 'C': Solve A**H * x = s*b (Conjugate transpose) DIAG (global input) CHARACTER*1 Specifies whether or not the matrix A is unit triangular. = 'N': Non-unit triangular = 'U': Unit triangular NORMIN (global input) CHARACTER*1 Specifies whether CNORM has been set or not. = 'Y': CNORM contains the column norms on entry = 'N': CNORM is not set on entry. On exit, the norms will be computed and stored in CNORM. N (global input) INTEGER The order of the matrix A. N >= 0. A (local input) COMPLEX array, dimension (DESCA(LLD_),*) The triangular matrix A. If UPLO = 'U', the leading n by n upper triangular part of the array A contains the upper trian- gular matrix, and the strictly lower triangular part of A is not referenced. If UPLO = 'L', the leading n by n lower trian- gular part of the array A contains the lower triangular matrix, and the strictly upper triangular part of A is not referenced. If DIAG = 'U', the diagonal elements of A are also not refer- enced and are assumed to be 1. IA (global input) pointer to INTEGER The global row index of the submatrix of the distributed matrix A to operate on. JA (global input) pointer to INTEGER The global column index of the submatrix of the distributed matrix A to operate on. DESCA (global and local input) INTEGER array of dimension DLEN_. The array descriptor for the distributed matrix A. X (local input/output) COMPLEX array, dimension (DESCX(LLD_),*) On entry, the right hand side b of the triangular system. On exit, X is overwritten by the solu- tion vector x. IX (global input) pointer to INTEGER The global row index of the submatrix of the distributed matrix X to operate on. JX (global input) pointer to INTEGER The global column index of the submatrix of the distributed matrix X to operate on. DESCX (global and local input) INTEGER array of dimension DLEN_. The array descriptor for the distributed matrix X. SCALE (global output) REAL The scaling factor s for the triangular system A * x = s*b, A**T * x = s*b, or A**H * x = s*b. If SCALE = 0, the matrix A is singular or badly scaled, and the vector x is an exact or approximate solution to A*x = 0. CNORM (global input or global output) REAL array, dimension (N) If NORMIN = 'Y', CNORM is an input argument and CNORM(j) contains the norm of the off-diagonal part of the j-th column of A. If TRANS = 'N', CNORM(j) must be greater than or equal to the infinity-norm, and if TRANS = 'T' or 'C', CNORM(j) must be greater than or equal to the 1-norm. If NORMIN = 'N', CNORM is an output argument and CNORM(j) returns the 1-norm of the offdiagonal part of the j-th column of A. INFO (global output) INTEGER = 0: successful exit < 0: if INFO = -k, the k-th argument had an illegal value FURTHER DETAILS A rough bound on x is computed; if that is less than overflow, PCTRSV is called, otherwise, specific code is used which checks for possible overflow or divide-by-zero at every operation. A columnwise scheme is used for solving A*x = b. The basic algorithm if A is lower triangular is x[1:n] := b[1:n] for j = 1, ..., n x(j) := x(j) / A(j,j) x[j+1:n] := x[j+1:n] - x(j) * A[j+1:n,j] end Define bounds on the components of x after j iterations of the loop: M(j) = bound on x[1:j] G(j) = bound on x[j+1:n] Initially, let M(0) = 0 and G(0) = max{x(i), i=1,...,n}. Then for iteration j+1 we have M(j+1) <= G(j) / | A(j+1,j+1) | G(j+1) <= G(j) + M(j+1) * | A[j+2:n,j+1] | <= G(j) ( 1 + CNORM(j+1) / | A(j+1,j+1) | ) where CNORM(j+1) is greater than or equal to the infinity-norm of col- umn j+1 of A, not counting the diagonal. Hence G(j) <= G(0) product ( 1 + CNORM(i) / | A(i,i) | ) 1<=i<=j and |x(j)| <= ( G(0) / |A(j,j)| ) product ( 1 + CNORM(i) / |A(i,i)| ) 1<=i< j Since |x(j)| <= M(j), we use the Level 2 PBLAS routine PCTRSV if the reciprocal of the largest M(j), j=1,..,n, is larger than max(underflow, 1/overflow). The bound on x(j) is also used to determine when a step in the column- wise method can be performed without fear of overflow. If the computed bound is greater than a large constant, x is scaled to prevent over- flow, but if the bound overflows, x is set to 0, x(j) to 1, and scale to 0, and a non-trivial solution to A*x = 0 is found. Similarly, a row-wise scheme is used to solve A**T *x = b or A**H *x = b. The basic algorithm for A upper triangular is for j = 1, ..., n x(j) := ( b(j) - A[1:j-1,j]' * x[1:j-1] ) / A(j,j) end We simultaneously compute two bounds G(j) = bound on ( b(i) - A[1:i-1,i]' * x[1:i-1] ), 1<=i<=j M(j) = bound on x(i), 1<=i<=j The initial values are G(0) = 0, M(0) = max{b(i), i=1,..,n}, and we add the constraint G(j) >= G(j-1) and M(j) >= M(j-1) for j >= 1. Then the bound on x(j) is M(j) <= M(j-1) * ( 1 + CNORM(j) ) / | A(j,j) | <= M(0) * product ( ( 1 + CNORM(i) ) / |A(i,i)| ) 1<=i<=j and we can safely call PCTRSV if 1/M(n) and 1/G(n) are both greater than max(underflow, 1/overflow). Last modified by: Mark R. Fahey, August 2000 ScaLAPACK routine 31 October 2017 PCLATTRS(3)