
ScaLAPACK and BLACS

Introduction

ScaLAPACK (Scalable Linear Algebra PACKage) is a library of high-performance linear algebra routines for distributed-memory message passing computers. ScaLAPACK has routines for systems of linear equations, linear least squares problems, eigenvalue calculation, and singular value decomposition. ScaLAPACK can also handle many associated computations such as matrix factorization or estimating condition numbers. Dense and band matrices are supported, but not general sparse matrices. Similar functionality is provided for both real and complex matrices.

As in LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms, in order to minimize data movement. The fundamental building block of the ScaLAPACK library is a distributed memory version of the Level 1, 2, and 3 BLAS, called PBLAS (Parallel BLAS). The PBLAS are, in turn, built on the BLAS for computation on a single node, and on BLACS for communication across nodes. PBLAS is an integral part of the ScaLAPACK library.
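
The block-partitioned idea can be illustrated with a minimal sequential sketch. It assumes square tiles of a hypothetical block size `nb` and computes a matrix product one tile at a time, so each tile of A and B is reused across a whole tile of C while it is still in fast memory. This is plain Python for illustration only; in ScaLAPACK such block operations are delegated to the BLAS on each node and to the PBLAS across nodes.

```python
from typing import List

Matrix = List[List[float]]


def blocked_matmul(a: Matrix, b: Matrix, nb: int = 2) -> Matrix:
    """Return C = A * B computed one nb-by-nb tile at a time (nb is hypothetical)."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    c = [[0.0] * n for _ in range(m)]
    # Outer loops walk over tiles of C (ii, jj) and over the shared
    # dimension (kk); the inner loops multiply one tile of A by one tile of B
    # and accumulate the result into the matching tile of C.
    for ii in range(0, m, nb):
        for jj in range(0, n, nb):
            for kk in range(0, k, nb):
                for i in range(ii, min(ii + nb, m)):
                    for p in range(kk, min(kk + nb, k)):
                        a_ip = a[i][p]
                        for j in range(jj, min(jj + nb, n)):
                            c[i][j] += a_ip * b[p][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(blocked_matmul(a, a))
```

The result does not depend on `nb`; only the order in which memory is touched changes.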

BLACS (Basic Linear Algebra Communication Subprograms) are a message-passing library designed for linear algebra. The computational model consists of a one- or two-dimensional process grid, where each process stores pieces of the matrices and vectors. The BLACS include synchronous send/receive routines to communicate a matrix or submatrix from one process to another, to broadcast submatrices to many processes, or to compute global data reductions (sums, maxima and minima). There are also routines to construct, change, or query the process grid. Since several ScaLAPACK algorithms require broadcasts or reductions among different subsets of processes, the BLACS permit a process to be a member of several overlapping or disjoint process grids, each one labeled by a context. A BLACS context plays a role similar to that of a communicator in MPI. The BLACS provide facilities for safe interoperation of system contexts and BLACS contexts.
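
ScaLAPACK stores dense matrices on the process grid in a 2D block-cyclic layout: blocks of rows are dealt out cyclically to the process rows, and blocks of columns to the process columns, independently of each other. The mapping along one dimension can be sketched as follows. The function names mirror ScaLAPACK's NUMROC, INDXG2P and INDXG2L tool routines, but this sketch assumes 0-based indices and simplified argument lists, so it is an illustration rather than their exact interfaces.

```python
def numroc(n: int, nb: int, iproc: int, isrcproc: int, nprocs: int) -> int:
    """Count of the n global rows (or columns) stored on process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    num = (nblocks // nprocs) * nb                 # full rounds of blocks
    extrablks = nblocks % nprocs                   # complete blocks left over
    if mydist < extrablks:
        num += nb                                  # one extra complete block
    elif mydist == extrablks:
        num += n % nb                              # the trailing partial block
    return num


def indxg2p(g: int, nb: int, isrcproc: int, nprocs: int) -> int:
    """Process coordinate that owns 0-based global index g."""
    return (isrcproc + g // nb) % nprocs


def indxg2l(g: int, nb: int, nprocs: int) -> int:
    """0-based local index of global index g on its owning process."""
    return (g // (nb * nprocs)) * nb + g % nb


if __name__ == "__main__":
    n, nb, nprocs = 10, 2, 3
    for g in range(n):
        print(f"global {g} -> process {indxg2p(g, nb, 0, nprocs)}, "
              f"local {indxg2l(g, nb, nprocs)}")
    print([numroc(n, nb, p, 0, nprocs) for p in range(nprocs)])
```

With n = 10, a block size of 2 and 3 processes, process 0 owns global indices 0, 1, 6 and 7, so the local counts are 4, 4 and 2.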

How to use ScaLAPACK and BLACS

ScaLAPACK Routine List

Simple Driver and Divide and Conquer Driver Subprograms

? stands for a data-type prefix, which must be replaced with one of the following:
S = REAL(kind=4), D = REAL(kind=8), C = COMPLEX(kind=4), Z = COMPLEX(kind=8)
Name Prefixes Description
S D C Z Solves a general banded system of linear equations AX=B with no pivoting.
S D C Z Solves a general tridiagonal system of linear equations AX=B with no pivoting.
S D C Z Solves a general banded system of linear equations AX=B.
S D C Z Solves over-determined or under-determined linear systems involving a matrix of full rank.
S D C Z Solves a general system of linear equations AX=B.
S D C Z Computes the singular value decomposition of a general matrix, optionally computing the left and/or right singular vectors.
S D C Z Solves a symmetric/Hermitian positive definite banded system of linear equations AX=B.
S D C Z Solves a symmetric/Hermitian positive definite system of linear equations AX=B.
S D C Z Solves a symmetric/Hermitian positive definite tridiagonal system of linear equations AX=B.
P?SYEV
S D Computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrix.
P?SYEVD
S D Computes all eigenvalues and, optionally, eigenvectors of a real symmetric matrix. If eigenvectors are desired, it uses a divide and conquer algorithm.
P?HEEV
C Z Computes all eigenvalues and, optionally, eigenvectors of a Hermitian matrix.
P?HEEVD
C Z Computes all eigenvalues and, optionally, eigenvectors of a Hermitian matrix. If eigenvectors are desired, it uses a divide and conquer algorithm.
P?HSEQR
S D Computes the eigenvalues of an upper Hessenberg matrix H and, optionally, the matrices T and Z from the Schur decomposition H = Z*T*Z^T, where T is an upper quasi-triangular matrix (the Schur form), and Z is the orthogonal matrix of Schur vectors.
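
Simple drivers combine a factorization with a solve: P?GESV, for instance, factors A with P?GETRF and then solves with P?GETRS. The sequential sketch below shows the underlying arithmetic (LU factorization with partial pivoting, then forward and back substitution) for a single right-hand side. The function names are hypothetical, pivots are kept as a permutation vector rather than LAPACK's list of row interchanges, and nothing here is distributed.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_factor(a: Matrix) -> Tuple[Matrix, List[int]]:
    """LU with partial pivoting: row perm[k] of A equals row k of L*U."""
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest |entry| of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Store the multipliers (L) below the diagonal and update the trailing block.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu: Matrix, perm: List[int], b: List[float]) -> List[float]:
    """Solve A x = b given the output of lu_factor."""
    n = len(lu)
    y = [float(b[perm[i]]) for i in range(n)]  # apply the row permutation
    for i in range(n):                         # L y = P b (L has a unit diagonal)
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in range(n - 1, -1, -1):             # U x = y, back substitution in place
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


if __name__ == "__main__":
    lu, perm = lu_factor([[2, 1, 1], [4, -6, 0], [-2, 7, 2]])
    print(lu_solve(lu, perm, [5, -2, 9]))  # [1.0, 1.0, 2.0]
```

Keeping the factorization separate from the solve mirrors the P?GETRF/P?GETRS split: one factorization can be reused for many right-hand sides.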

Expert Driver and RRR Driver Subprograms

? stands for a data-type prefix, which must be replaced with one of the following:
S = REAL(kind=4), D = REAL(kind=8), C = COMPLEX(kind=4), Z = COMPLEX(kind=8)
Name Prefixes Description
S D C Z Solves a general system of linear equations AX=B.
S D C Z Solves a symmetric/Hermitian positive definite system of linear equations AX=B.
P?SYEVX
S D Computes selected eigenvalues and eigenvectors of a symmetric matrix.
P?SYEVR
S D Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A distributed in 2D block-cyclic format by calling the recommended sequence of ScaLAPACK routines.
P?SYGVX
S D Computes selected eigenvalues and eigenvectors of a real generalized symmetric-definite eigenproblem.
P?HEEVX
C Z Computes selected eigenvalues and eigenvectors of a Hermitian matrix.
P?HEEVR
C Z Computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A distributed in 2D block-cyclic format by calling the recommended sequence of ScaLAPACK routines.
P?HEGVX
C Z Computes selected eigenvalues and eigenvectors of a generalized Hermitian-definite eigenproblem.
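
In the reference implementation, expert drivers such as P?SYEVX reduce the matrix to tridiagonal form and then locate the selected eigenvalues by bisection (P?STEBZ in the computational list). The core of bisection is a Sturm count: the number of negative pivots in the L*D*L^T factorization of T - xI equals the number of eigenvalues of T below x. The sequential sketch below uses hypothetical function names, and its `pivmin` guard is a simplified stand-in for what LAPACK-style codes do.

```python
from typing import List


def sturm_count(d: List[float], e: List[float], x: float, pivmin: float = 1e-300) -> int:
    """Number of eigenvalues of the symmetric tridiagonal T(d, e) that are below x."""
    count = 0
    q = 0.0
    for i in range(len(d)):
        # Pivot i of the L*D*L^T factorization of T - x*I, built by recurrence.
        q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
        if abs(q) < pivmin:
            q = -pivmin  # simplified guard against division by zero
        if q < 0.0:
            count += 1   # negative pivots count eigenvalues below x (Sylvester's law of inertia)
    return count


def kth_eigenvalue(d: List[float], e: List[float], k: int, rtol: float = 1e-12) -> float:
    """k-th smallest (0-based) eigenvalue of T(d, e), found by bisection."""
    n = len(d)
    # Gershgorin discs give an interval [lo, hi] containing every eigenvalue.
    radius = [(abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
              for i in range(n)]
    lo = min(di - ri for di, ri in zip(d, radius))
    hi = max(di + ri for di, ri in zip(d, radius))
    while hi - lo > rtol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid) > k:
            hi = mid  # more than k eigenvalues below mid, so eigenvalue k < mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Because each eigenvalue is bracketed independently, only the requested ones need to be refined, which is what makes bisection a natural fit for computing selected eigenvalues.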

Computational Subprograms

? stands for a data-type prefix, which must be replaced with one of the following:
S = REAL(kind=4), D = REAL(kind=8), C = COMPLEX(kind=4), Z = COMPLEX(kind=8)
Name Prefixes Description
S D C Z Computes an LU factorization of a general band matrix with no pivoting.
S D C Z Solves a general banded system of linear equations AX=B, A^T X=B or A^H X=B, using the LU factorization computed by P?DBTRF.
S D C Z Solves a banded triangular system of linear equations AX=B, A^T X=B or A^H X=B, using the LU factorization computed by P?DBTRF.
S D C Z Computes an LU factorization of a general tridiagonal matrix with no pivoting.
S D C Z Solves a general tridiagonal system of linear equations AX=B, A^T X=B or A^H X=B, using the LU factorization computed by P?DTTRF.
S D C Z Solves a tridiagonal triangular system of linear equations AX=B, A^T X=B or A^H X=B, using the LU factorization computed by P?DTTRF.
S D C Z Computes an LU factorization of a general band matrix, using partial pivoting with row interchanges.
S D C Z Solves a general banded system of linear equations AX=B, A^T X=B or A^H X=B, using the LU factorization computed by P?GBTRF.
S D C Z Reduces a general rectangular matrix to real bidiagonal form by an orthogonal/unitary transformation.
S D C Z Estimates the reciprocal of the condition number of a general matrix.
S D C Z Computes row and column scalings to equilibrate a general rectangular matrix and reduce its condition number.
S D C Z Reduces a general matrix to upper Hessenberg form by an orthogonal/unitary similarity transformation.
S D C Z Computes an LQ factorization of a general rectangular matrix.
S D C Z Computes a QL factorization of a general rectangular matrix.
S D C Z Computes a QR factorization with column pivoting of a general rectangular matrix.
S D C Z Computes a QR factorization of a general rectangular matrix.
S D C Z Improves the computed solution to a system of linear equations and provides error bounds and backward error estimates for the solutions.
S D C Z Computes an RQ factorization of a general rectangular matrix.
S D C Z Computes an LU factorization of a general matrix, using partial pivoting with row interchanges.
S D C Z Computes the inverse of a general matrix, using the LU factorization computed by P?GETRF.
S D C Z Solves a general system of linear equations AX=B, A^T X=B or A^H X=B, using the LU factorization computed by P?GETRF.
S D C Z Computes a generalized QR factorization.
S D C Z Computes a generalized RQ factorization.
S D C Z Computes the Schur decomposition and/or eigenvalues of a matrix already in Hessenberg form.
P?ORGLQ
S D Generates all or part of the orthogonal matrix Q from an LQ factorization determined by P?GELQF.
P?ORGQL
S D Generates all or part of the orthogonal matrix Q from a QL factorization determined by P?GEQLF.
P?ORGQR
S D Generates all or part of the orthogonal matrix Q from a QR factorization determined by P?GEQRF.
P?ORGRQ
S D Generates all or part of the orthogonal matrix Q from an RQ factorization determined by P?GERQF.
P?ORMBR
S D Multiplies a general matrix by one of the orthogonal transformation matrices from a reduction to bidiagonal form determined by P?GEBRD.
P?ORMHR
S D Multiplies a general matrix by the orthogonal transformation matrix from a reduction to Hessenberg form determined by P?GEHRD.
P?ORMLQ
S D Multiplies a general matrix by the orthogonal matrix from an LQ factorization determined by P?GELQF.
P?ORMQL
S D Multiplies a general matrix by the orthogonal matrix from a QL factorization determined by P?GEQLF.
P?ORMQR
S D Multiplies a general matrix by the orthogonal matrix from a QR factorization determined by P?GEQRF.
P?ORMRQ
S D Multiplies a general matrix by the orthogonal matrix from an RQ factorization determined by P?GERQF.
P?ORMRZ
S D Multiplies a general matrix by the orthogonal transformation matrix from a reduction to upper triangular form determined by P?TZRZF.
P?ORMTR
S D Multiplies a general matrix by the orthogonal transformation matrix from a reduction to tridiagonal form determined by P?SYTRD.
S D C Z Computes the Cholesky factorization of a symmetric/Hermitian positive definite banded matrix.
S D C Z Solves a symmetric/Hermitian positive definite banded system of linear equations AX=B, using the Cholesky factorization computed by P?PBTRF.
S D C Z Solves a banded triangular system of linear equations AX=B, using the Cholesky factorization computed by P?PBTRF.
S D C Z Estimates the reciprocal of the condition number of a symmetric/Hermitian positive definite distributed matrix.
S D C Z Computes row and column scalings to equilibrate a symmetric/Hermitian positive definite matrix and reduce its condition number.
S D C Z Improves the computed solution to a symmetric/Hermitian positive definite system of linear equations AX=B, and provides forward and backward error bounds for the solution.
S D C Z Computes the Cholesky factorization of a symmetric/Hermitian positive definite matrix.
S D C Z Computes the inverse of a symmetric/Hermitian positive definite matrix, using the Cholesky factorization computed by P?POTRF.
S D C Z Solves a symmetric/Hermitian positive definite system of linear equations AX=B, using the Cholesky factorization computed by P?POTRF.
S D C Z Computes the Cholesky factorization of a symmetric/Hermitian positive definite tridiagonal matrix.
S D C Z Solves a symmetric/Hermitian positive definite tridiagonal system of linear equations AX=B, using the Cholesky factorization computed by P?PTTRF.
S D C Z Solves a tridiagonal triangular system of linear equations AX=B, using the Cholesky factorization computed by P?PTTRF.
P?STEBZ
S D Computes the eigenvalues of a real symmetric tridiagonal matrix by bisection.
P?STEDC
S D Computes all eigenvalues and, optionally, eigenvectors of a symmetric tridiagonal matrix using the divide and conquer algorithm.
S D C Z Computes the eigenvectors of a symmetric/Hermitian tridiagonal matrix using inverse iteration.
P?SYGST
S D Reduces a symmetric-definite generalized eigenproblem to standard form.
P?SYTRD
S D Reduces a symmetric matrix to real symmetric tridiagonal form by an orthogonal similarity transformation.
S D C Z Estimates the reciprocal of the condition number of a triangular matrix.
S D C Z Provides error bounds and backward error estimates for the solution to a system of linear equations with a triangular coefficient matrix.
S D C Z Computes the inverse of a triangular matrix.
S D C Z Solves a triangular system of linear equations AX=B, A^T X=B or A^H X=B.
S D C Z Reduces an upper trapezoidal matrix to upper triangular form by means of orthogonal transformations.
P?HEGST
C Z Reduces a Hermitian-definite generalized eigenproblem to standard form.
P?HETRD
C Z Reduces a Hermitian matrix to Hermitian tridiagonal form by a unitary similarity transformation.
P?UNGLQ
C Z Generates all or part of the unitary matrix Q from an LQ factorization determined by P?GELQF.
P?UNGQL
C Z Generates all or part of the unitary matrix Q from a QL factorization determined by P?GEQLF.
P?UNGQR
C Z Generates all or part of the unitary matrix Q from a QR factorization determined by P?GEQRF.
P?UNGRQ
C Z Generates all or part of the unitary matrix Q from an RQ factorization determined by P?GERQF.
P?UNMBR
C Z Multiplies a general matrix by one of the unitary transformation matrices from a reduction to bidiagonal form determined by P?GEBRD.
P?UNMHR
C Z Multiplies a general matrix by the unitary transformation matrix from a reduction to Hessenberg form determined by P?GEHRD.
P?UNMLQ
C Z Multiplies a general matrix by the unitary matrix from an LQ factorization determined by P?GELQF.
P?UNMQL
C Z Multiplies a general matrix by the unitary matrix from a QL factorization determined by P?GEQLF.
P?UNMQR
C Z Multiplies a general matrix by the unitary matrix from a QR factorization determined by P?GEQRF.
P?UNMRQ
C Z Multiplies a general matrix by the unitary matrix from an RQ factorization determined by P?GERQF.
P?UNMRZ
C Z Multiplies a general matrix by the unitary transformation matrix from a reduction to upper triangular form determined by P?TZRZF.
P?UNMTR
C Z Multiplies a general matrix by the unitary transformation matrix from a reduction to tridiagonal form determined by P?HETRD.
B?LAAPP
S D Computes B = Q^T * A or B = A * Q, where A is an M-by-N matrix and Q is an orthogonal matrix represented by the parameters in the arrays ITRAF and DTRAF as described in B?TREXC.
B?LAEXC
S D Swaps adjacent diagonal blocks T11 and T22 of order 1 or 2 in an upper quasi-triangular matrix T by an orthogonal similarity transformation.
B?TREXC
S D Reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that the diagonal block of T with row index IFST is moved to row ILST.
?LAQR6
S D Performs a single small-bulge multi-shift QR sweep, moving the chain of bulges from top to bottom in the submatrix H(KTOP:KBOT,KTOP:KBOT), and either collects the transformations in the matrix HV or accumulates them in the matrix Z.
?LAR1VA
S D Computes the (scaled) r-th column of the inverse of the submatrix in rows B1 through BN of the tridiagonal matrix L*D*L^T - σI.
?LARRB2
S D Performs "limited" bisection to refine the eigenvalues W(IFIRST-OFFSET) through W(ILAST-OFFSET) of L*D*L^T to higher accuracy.
?LARRD2
S D Computes the eigenvalues of a symmetric tridiagonal matrix T to limited initial accuracy.
?LARRE2
S D To find the desired eigenvalues of a given real symmetric tridiagonal matrix T, DLARRE2 sets, via DLARRA, "small" off-diagonal elements to zero.
?LARRE2A
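S D To find the desired eigenvalues of a given real symmetric tridiagonal matrix T, DLARRE2A sets any "small" off-diagonal elements to zero and, for each unreduced block T_i, finds (a) a suitable shift at one end of the block's spectrum, (b) the root representation T_i - σI = L*D*L^T, and (c) the eigenvalues of each L*D*L^T.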
?LARRF2
S D Finds a new relatively robust representation L*D*L^T - σI = L(+)*D(+)*L(+)^T such that at least one of the eigenvalues of L(+)*D(+)*L(+)^T is relatively isolated.
?LARRV2
S D Computes the eigenvectors of the tridiagonal matrix T = L*D*L^T, given L, D, and approximations to the eigenvalues of L*D*L^T.
?STEGR2
S D Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric tridiagonal matrix T.
?STEGR2A
S D Computes selected eigenvalues and the initial representations needed for the eigenvector computations in ?STEGR2B.
?STEGR2B
S D Computes the selected eigenvalues and eigenvectors of the real symmetric tridiagonal matrix in parallel on multiple processors.
P?GEBAL
S D Balances a general real matrix A.
P?LAMVE
S D Copies all or part of a distributed matrix A to another distributed matrix B.
P?LAQR0
S D Computes the eigenvalues of a Hessenberg matrix H and, optionally, the matrices T and Z from the Schur decomposition H = Z*T*Z^T, where T is an upper quasi-triangular matrix (the Schur form), and Z is the orthogonal matrix of Schur vectors.
P?LAQR1
S D Finds the Schur decomposition and/or eigenvalues of a matrix already in Hessenberg form, from columns ILO to IHI.
P?LAQR2
S D Accepts as input an upper Hessenberg matrix A and performs an orthogonal similarity transformation designed to detect and deflate fully converged eigenvalues from a trailing principal submatrix.
P?LAQR3
S D Accepts as input an upper Hessenberg matrix H and performs an orthogonal similarity transformation designed to detect and deflate fully converged eigenvalues from a trailing principal submatrix.
P?LAQR4
S D Finds the Schur decomposition and/or eigenvalues of a matrix already in Hessenberg form, from columns ILO to IHI.
P?LAQR5
S D Performs a single small-bulge multi-shift QR sweep by chasing separated groups of bulges along the main block diagonal of H.
P?ROT
S D Applies a planar rotation defined by CS and SN to the two distributed vectors sub(X) and sub(Y).
P?TRORD
S D Reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading columns of Q form an orthonormal basis of the corresponding right invariant subspace.
P?TRSEN
S D Reorders the real Schur factorization of a real matrix A = Q*T*Q^T, so that a selected cluster of eigenvalues appears in the leading diagonal blocks of the upper quasi-triangular matrix T, and the leading columns of Q form an orthonormal basis of the corresponding right invariant subspace.
Chooses problem-dependent parameters for the local environment.
Returns the ScaLAPACK version.
Sets problem- and machine-dependent parameters useful for P?HSEQR and its subroutines.
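
The Cholesky routines follow the same factor-then-solve pattern: P?POTRF factors a symmetric positive definite matrix as A = L*L^T (or A = U^T*U), and P?POTRS then solves AX=B with two triangular solves. The sketch below shows that arithmetic sequentially for the real, lower-triangular case only; the function names are hypothetical and, unlike the ScaLAPACK routines, it operates on an in-memory list of lists rather than a distributed matrix.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky_lower(a: Matrix) -> Matrix:
    """Return lower-triangular L with A = L * L^T; reads only the lower triangle of A."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what remains of a[j][j] after earlier columns.
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l


def cholesky_solve(l: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b from A = L * L^T with two triangular solves."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):              # forward substitution: L y = b
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x
```

A non-positive value under the square root is how the factorization detects that A is not positive definite.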

Auxiliary Subprograms

? stands for a data-type prefix, which must be replaced with one of the following or, where listed, a combination of them:
S = REAL(kind=4), D = REAL(kind=8), C = COMPLEX(kind=4), Z = COMPLEX(kind=8)
Prefixes Routines
S D C Z
S D CS ZD
S D ?LAPST, ?LASORTE, ?LASRT2, ?STEIN2, P?LABAD, P?LAED0, P?LAED1, P?LAED2, P?LAED3, P?LAEDZ, P?LAMCH, P?LARED1D, P?LARED2D, P?LASRT, P?ORG2L, P?ORG2R, P?ORGL2, P?ORGR2, P?ORM2L, P?ORM2R, P?ORML2, P?ORMR2, P?ORMR3, P?SYGS2, P?SYNGST, P?SYNTRD, P?SYTD2, P?SYTTRD
C Z ?LAHQR2, ?LANV2, P?HEGS2, P?HENGST, P?HENTRD, P?HETD2, P?HETTRD, P?LACGV, P?LANHE, P?LARFC, P?LARZC, P?LATTRS, P?MAX1, P?TREVC, P?UNG2L, P?UNG2R, P?UNGL2, P?UNGR2, P?UNM2L, P?UNM2R, P?UNML2, P?UNMR2, P?UNMR3
SC DZ P?SUM1
n/a

PBLAS Routine List

? stands for a data-type prefix, which must be replaced with one of the following or, where listed, a combination of them:
S = REAL(kind=4), D = REAL(kind=8), C = COMPLEX(kind=4), Z = COMPLEX(kind=8)
Level Name Prefixes Description
Level 1 P?SWAP S D C Z Swap vectors
P?SCAL S D C Z CS ZD Scale vector
P?COPY S D C Z Copy vector
P?AXPY S D C Z Vector scale and add
P?DOT S D Dot product, real
P?DOTU C Z Dot product, complex
P?DOTC C Z Dot product, complex, conjugate first vector
P?NRM2 S D SC DZ Euclidean norm
P?ASUM S D SC DZ Sum absolute values
P?AMAX S D C Z Index of maximum absolute value
Level 2 P?GEMV S D C Z General matrix-vector multiplication
P?HEMV C Z Hermitian matrix-vector multiplication
P?SYMV S D C Z Symmetric matrix-vector multiplication
P?TRMV S D C Z Triangular matrix-vector multiplication
P?TRSV S D C Z Triangular solve
P?GER S D General rank-1 update, real
P?GERU C Z General rank-1 update, complex
P?GERC C Z General rank-1 update, complex, second vector conjugated
P?HER C Z Hermitian rank-1 update
P?HER2 C Z Hermitian rank-2 update
P?SYR S D Symmetric rank-1 update
P?SYR2 S D Symmetric rank-2 update
Level 3 P?GEMM S D C Z General matrix-matrix multiplication
P?SYMM S D C Z Symmetric matrix-matrix multiplication
P?HEMM C Z Hermitian matrix-matrix multiplication
P?SYRK S D C Z Symmetric rank-k update
P?HERK C Z Hermitian rank-k update
P?SYR2K S D C Z Symmetric rank-2k update
P?HER2K C Z Hermitian rank-2k update
P?TRAN S D Matrix transpose, real
P?TRANU C Z Matrix transpose, complex
P?TRANC C Z Matrix transpose, complex, conjugate
P?TRMM S D C Z Triangular matrix-matrix multiply
P?TRSM S D C Z Triangular solve
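
Conceptually, a Level 1 PBLAS routine such as P?DOT combines sequential BLAS work on each process's local pieces of the vectors with a BLACS reduction across processes. The single-process simulation below illustrates that split for a 1D block-cyclic distribution. All names are hypothetical, both vectors are assumed to be distributed identically, and the real routine's data layout and communication are not reproduced.

```python
from typing import List, Sequence


def distribute(v: Sequence[float], nb: int, nprocs: int) -> List[List[float]]:
    """1D block-cyclic split: block number g // nb of v goes to process (g // nb) mod nprocs."""
    parts: List[List[float]] = [[] for _ in range(nprocs)]
    for g, val in enumerate(v):
        parts[(g // nb) % nprocs].append(val)
    return parts


def local_dot(x_loc: Sequence[float], y_loc: Sequence[float]) -> float:
    """The part each process computes on its own pieces (a sequential dot product)."""
    return sum(a * b for a, b in zip(x_loc, y_loc))


def simulated_pdot(x: Sequence[float], y: Sequence[float], nb: int = 2, nprocs: int = 3) -> float:
    """Distributed dot product: local partial sums followed by a global sum."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs, ys = distribute(x, nb, nprocs), distribute(y, nb, nprocs)
    partials = [local_dot(xl, yl) for xl, yl in zip(xs, ys)]
    # Stand-in for a BLACS combine operation such as ?gsum2d.
    return sum(partials)
```

Because x and y share the same distribution, matching local entries always belong to the same global index, so no data has to move before the final reduction.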

BLACS Routine List

? stands for a data-type prefix, which must be replaced with one of the following:
s = REAL(kind=4), d = REAL(kind=8), c = COMPLEX(kind=4), z = COMPLEX(kind=8), i = INTEGER
Category Name Prefixes Description
Initialization blacs_pinfo Get initial system information that is required before BLACS is set up
blacs_setup Functionally equivalent to blacs_pinfo
blacs_get Returns values BLACS is using for internal defaults
blacs_set Sets BLACS internal defaults
blacs_gridinit Assigns processes to a BLACS process grid
blacs_gridmap Assigns processes to a BLACS process grid in an arbitrary manner
Destruction blacs_freebuff Releases BLACS buffer
blacs_gridexit Frees a BLACS context
blacs_abort Aborts all BLACS processes
blacs_exit Frees all BLACS contexts and allocated memory
Sending ?gesd2d s d c z i General send 2-d
?gebs2d s d c z i General broadcast send 2-d
?trsd2d s d c z i Trapezoidal send 2-d
?trbs2d s d c z i Trapezoidal broadcast send 2-d
Receiving ?gerv2d s d c z i General receive 2-d
?gebr2d s d c z i General broadcast receive 2-d
?trrv2d s d c z i Trapezoidal receive 2-d
?trbr2d s d c z i Trapezoidal broadcast receive 2-d
Combine ?gamx2d s d c z i General element-wise absolute value maximum
?gamn2d s d c z i General element-wise absolute value minimum
?gsum2d s d c z i General element-wise summation
Information and Miscellaneous blacs_gridinfo Returns information on BLACS grid
blacs_pnum Returns system process number
blacs_pcoord Returns the row and column coordinates of a process in the BLACS process grid
blacs_barrier Blocks each calling process until all processes in the specified scope have called this routine
Non-Standard dcputime00 Returns CPU seconds since an arbitrary starting point
dwalltime00 Returns wall-clock seconds since an arbitrary starting point
ksendid Returns BLACS message ID
krecvid Returns BLACS message ID for receive
kbsid Returns BLACS message ID for a broadcast send
kbrid Returns BLACS message ID for a broadcast receive
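
When blacs_gridinit builds an NPROW x NPCOL grid, process numbers are laid out over the grid in row-major order (the usual choice) or in column-major order if requested, and blacs_pnum and blacs_pcoord convert between a process number and its grid coordinates. The mapping is simple arithmetic, sketched below in Python under the assumption that the grid is built from processes 0 to NPROW*NPCOL-1. The function names are hypothetical, and the real routines also take a BLACS context argument, which this sketch omits.

```python
from typing import Tuple


def grid_pnum(prow: int, pcol: int, nprow: int, npcol: int, order: str = "R") -> int:
    """Process number at grid position (prow, pcol), row-major ('R') or column-major ('C')."""
    if not (0 <= prow < nprow and 0 <= pcol < npcol):
        raise ValueError("coordinates outside the process grid")
    return prow * npcol + pcol if order == "R" else pcol * nprow + prow


def grid_coord(pnum: int, nprow: int, npcol: int, order: str = "R") -> Tuple[int, int]:
    """Inverse of grid_pnum: grid position (prow, pcol) of process number pnum."""
    if not 0 <= pnum < nprow * npcol:
        raise ValueError("process number outside the process grid")
    if order == "R":
        return divmod(pnum, npcol)
    pcol, prow = divmod(pnum, nprow)
    return prow, pcol


if __name__ == "__main__":
    nprow, npcol = 2, 3
    for prow in range(nprow):
        print([grid_pnum(prow, pcol, nprow, npcol) for pcol in range(npcol)])
```

For a 2 x 3 grid in row-major order, the first process row holds processes 0, 1 and 2 and the second holds 3, 4 and 5.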

External Links

  1. ScaLAPACK User's Guide
  2. ScaLAPACK Quick Reference (downloading a PostScript format file)
  3. PBLAS Quick Reference
  4. BLACS Quick Reference
  5. A hardcopy ScaLAPACK User's Guide may be ordered from SIAM.

Version Information

  • This manual page version: 2.2.0-180823