PCGESVD(3) ScaLAPACK routine of NEC Numeric Library Collection PCGESVD(3)
NAME
PCGESVD - computes the singular value decomposition (SVD) of an M-by-N
matrix A, optionally computing the left and/or right singular vectors
SYNOPSIS
SUBROUTINE PCGESVD( JOBU, JOBVT, M, N, A, IA, JA, DESCA, S, U, IU, JU,
DESCU, VT, IVT, JVT, DESCVT, WORK, LWORK, RWORK,
INFO)
CHARACTER JOBU, JOBVT
INTEGER IA, INFO, IU, IVT, JA, JU, JVT, LWORK, M, N
INTEGER DESCA(*), DESCU(*), DESCVT(*)
COMPLEX A(*), U(*), VT(*), WORK(*)
REAL S(*)
REAL RWORK(*)
PURPOSE
PCGESVD computes the singular value decomposition (SVD) of an M-by-N
matrix A, optionally computing the left and/or right singular vectors.
The SVD is written as
A = U * SIGMA * transpose(V)
where SIGMA is an M-by-N matrix which is zero except for its min(M,N)
diagonal elements, U is an M-by-M orthogonal matrix, and V is an N-by-N
orthogonal matrix. The diagonal elements of SIGMA are the singular val-
ues of A and the columns of U and V are the corresponding right and
left singular vectors, respectively. The singular values are returned
in array S in decreasing order and only the first min(M,N) columns of U
and rows of VT = V**T are computed.
Notes
=====
Each global data object is described by an associated description vec-
tor. This vector stores the information required to establish the map-
ping between an object element and its corresponding process and memory
location.
Let A be a generic term for any 2D block cyclicly distributed array.
Such a global array has an associated description vector DESCA.
In the following comments, the character _ should be read as "of the
global array".
NOTATION STORED IN EXPLANATION
--------------- -------------- --------------------------------------
DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case,
DTYPE_A = 1.
CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
the BLACS process grid A is distribu-
ted over. The context itself is glo-
bal, but the handle (the integer
value) may vary.
M_A (global) DESCA( M_ ) The number of rows in the global
array A.
N_A (global) DESCA( N_ ) The number of columns in the global
array A.
MB_A (global) DESCA( MB_ ) The blocking factor used to distribute
the rows of the array.
NB_A (global) DESCA( NB_ ) The blocking factor used to distribute
the columns of the array.
RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
row of the array A is distributed.
CSRC_A (global) DESCA( CSRC_ ) The process column over which the
first column of the array A is
distributed.
LLD_A (local) DESCA( LLD_ ) The leading dimension of the local
array. LLD_A >= MAX(1,LOCr(M_A)).
Let K be the number of rows or columns of a distributed matrix, and
assume that its process grid has dimension r x c. LOCr( K ) denotes the
number of elements of K that a process would receive if K were dis-
tributed over the r processes of its process column. Similarly, LOCc( K
) denotes the number of elements of K that a process would receive if K
were distributed over the c processes of its process row. The values of
LOCr() and LOCc() may be determined via a call to the ScaLAPACK tool
function, NUMROC:
LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
An upper bound for these quantities may be computed by:
LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
ARGUMENTS
MP = number of local rows in A and U
NQ = number of local columns in A and VT
SIZE = min( M, N )
SIZEQ = number of local columns in U
SIZEP = number of local rows in VT
JOBU (global input) CHARACTER*1
Specifies options for computing U:
= 'V': the first SIZE columns of U (the left singular
vectors) are returned in the array U;
= 'N': no columns of U (no left singular vectors) are
computed.
JOBVT (global input) CHARACTER*1
Specifies options for computing V**T:
= 'V': the first SIZE rows of V**T (the right singular
vectors) are returned in the array VT;
= 'N': no rows of V**T (no right singular vectors) are
computed.
M (global input) INTEGER
The number of rows of the input matrix A. M >= 0.
N (global input) INTEGER
The number of columns of the input matrix A. N >= 0.
A (local input/workspace) block cyclic COMPLEX
array,
global dimension (M, N), local dimension (MP, NQ)
On exit, the contents of A are destroyed.
IA (global input) INTEGER
The row index in the global array A indicating the first row of
sub( A ).
JA (global input) INTEGER
The column index in the global array A indicating the first
column of sub( A ).
DESCA (global input) INTEGER array of dimension DLEN_
The array descriptor for the distributed matrix A.
S (global output) REAL array, dimension SIZE
The singular values of A, sorted so that S(i) >= S(i+1).
U (local output) COMPLEX array, local dimension
(MP, SIZEQ), global dimension (M, SIZE)
if JOBU = 'V', U contains the first min(m,n) columns of U
if JOBU = 'N', U is not referenced.
IU (global input) INTEGER
The row index in the global array U indicating the first row of
sub( U ).
JU (global input) INTEGER
The column index in the global array U indicating the first
column of sub( U ).
DESCU (global input) INTEGER array of dimension DLEN_
The array descriptor for the distributed matrix U.
VT (local output) COMPLEX array, local dimension
(SIZEP, NQ), global dimension (SIZE, N).
If JOBVT = 'V', VT contains the first SIZE rows of V**T. If
JOBVT = 'N', VT is not referenced.
IVT (global input) INTEGER
The row index in the global array VT indicating the first row
of sub( VT ).
JVT (global input) INTEGER
The column index in the global array VT indicating the first
column of sub( VT ).
DESCVT (global input) INTEGER array of dimension DLEN_
The array descriptor for the distributed matrix VT.
WORK (local workspace/output) COMPLEX array, dimension
(LWORK)
On exit, if INFO = 0, WORK(1) returns the optimal LWORK.
LWORK (local input) INTEGER
The dimension of the array WORK.
LWORK >= 1 + 2*SIZEB + MAX(WATOBD, WBDTOSVD),
where SIZEB = MAX(M,N), and WATOBD and WBDTOSVD refer, respec-
tively, to the workspace required to bidiagonalize the matrix A
and to go from the bidiagonal matrix to the singular value
decomposition U*S*VT.
For WATOBD, the following holds:
WATOBD = MAX(MAX(WPCLANGE,WPCGEBRD),
MAX(WPCLARED2D,WP(pre)LARED1D)),
where WPCLANGE, WPCLARED1D, WPCLARED2D, WPCGEBRD are the
workspaces required respectively for the subprograms PCLANGE,
PSLARED1D, PSLARED2D, PCGEBRD. Using the standard notation
MP = NUMROC( M, MB, MYROW, DESCA( CTXT_ ), NPROW),
NQ = NUMROC( N, NB, MYCOL, DESCA( LLD_ ), NPCOL),
the workspaces required for the above subprograms are
WPCLANGE = MP,
WPSLARED1D = NQ0,
WPSLARED2D = MP0,
WPCGEBRD = NB*(MP + NQ + 1) + NQ,
where NQ0 and MP0 refer, respectively, to the values obtained
at MYCOL = 0 and MYROW = 0. In general, the upper limit for the
workspace is given by a workspace required on processor (0,0):
WATOBD <= NB*(MP0 + NQ0 + 1) + NQ0.
In case of a homogeneous process grid this upper limit can be
used as an estimate of the minimum workspace for every proces-
sor.
For WBDTOSVD, the following holds:
WBDTOSVD = SIZE*(WANTU*NRU + WANTVT*NCVT) +
MAX(WCBDSQR,
MAX(WANTU*WPCORMBRQLN, WANTVT*WPCORMBRPRT)),
where
1, if left(right) singular vectors are wanted
WANTU(WANTVT) =
0, otherwise
and WCBDSQR, WPCORMBRQLN and WPCORMBRPRT refer respectively to
the workspace required for the subprograms CBDSQR, PCUN-
MBR(QLN), and PCUNMBR(PRT), where QLN and PRT are the values of
the arguments VECT, SIDE, and TRANS in the call to PCUNMBR. NRU
is equal to the local number of rows of the matrix U when dis-
tributed 1-dimensional "column" of processes. Analogously, NCVT
is equal to the local number of columns of the matrix VT when
distributed across 1-dimensional "row" of processes. Calling
the LAPACK procedure CBDSQR requires
WCBDSQR = MAX(1, 4*SIZE )
on every processor. Finally,
WPCORMBRQLN = MAX( (NB*(NB-1))/2, (SIZEQ+MP)*NB)+NB*NB,
WPCORMBRPRT = MAX( (MB*(MB-1))/2, (SIZEP+NQ)*MB )+MB*MB,
If LWORK = -1, then LWORK is global input and a workspace query
is assumed; the routine only calculates the minimum size for
the work array. The required workspace is returned as the first
element of WORK and no error message is issued by PXERBLA.
RWORK (workspace) REAL array, dimension (1+4*SIZEB)
On exit, if INFO = 0, RWORK(1) returns the necessary size for
RWORK.
INFO (output) INTEGER
= 0: successful exit
< 0: if INFO = -i, the i-th argument had an illegal value
> 0: if CBDSQR did not converge
If INFO = MIN(M,N) + 1, then PCGESVD has detected
heterogeneity by finding that eigenvalues were not
identical across the process grid. In this case, the
accuracy of the results from PCGESVD cannot be
guaranteed.
The results of PCGEBRD, and therefore PCGESVD, may vary slightly from
run to run with the same input data. If repeatability is an issue, call
BLACS_SET with the appropriate option after defining the process grid.
ALIGNMENT REQUIREMENTS
The routine PCGESVD inherits the same alignement requirement as the
routine PCGEBRD, namely:
The distributed submatrix sub( A ) must verify some alignment proper-
ties, namely the following expressions should be true:
( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA )
where NB = MB_A = NB_A,
IROFFA = MOD( IA-1, NB ), ICOFFA = MOD( JA-1, NB ),
ScaLAPACK routine 31 October 2017 PCGESVD(3)