PDTRSEN(3) ScaLAPACK routine of NEC Numeric Library Collection PDTRSEN(3)
NAME
PDTRSEN - reorders the real Schur factorization of a real matrix A =
Q*T*Q**T, so that a selected cluster of eigenvalues appears in the
leading diagonal blocks of the upper quasi-triangular matrix T, and the
leading columns of Q form an orthonormal basis of the corresponding
right invariant subspace
SYNOPSIS
SUBROUTINE PDTRSEN( JOB, COMPQ, SELECT, PARA, N, T, IT, JT, DESCT, Q,
IQ, JQ, DESCQ, WR, WI, M, S, SEP, WORK, LWORK,
IWORK, LIWORK, INFO )
CHARACTER COMPQ, JOB
INTEGER INFO, LIWORK, LWORK, M, N, IT, JT, IQ, JQ
DOUBLE PRECISION S, SEP
LOGICAL SELECT( N )
INTEGER PARA( 6 ), DESCT( * ), DESCQ( * ), IWORK( * )
DOUBLE PRECISION Q( * ), T( * ), WI( * ), WORK( * ), WR( * )
PURPOSE
PDTRSEN reorders the real Schur factorization of a real matrix A =
Q*T*Q**T, so that a selected cluster of eigenvalues appears in the
leading diagonal blocks of the upper quasi-triangular matrix T, and the
leading columns of Q form an orthonormal basis of the corresponding
right invariant subspace. The reordering is performed by PDTRORD.
Optionally the routine computes the reciprocal condition numbers of the
cluster of eigenvalues and/or the invariant subspace. The SCASY library
is needed for condition estimation.
T must be in Schur form (as returned by PDLAHQR), that is, block upper
triangular with 1-by-1 and 2-by-2 diagonal blocks.
Notes
=====
Each global data object is described by an associated description
vector. This vector stores the information required to establish the
mapping between an object element and its corresponding process and
memory location.
Let A be a generic term for any 2D block cyclically distributed array.
Such a global array has an associated description vector DESCA. In the
following comments, the character _ should be read as "of the global
array".
NOTATION        STORED IN       EXPLANATION
--------------- --------------- --------------------------------------
DTYPE_A(global) DESCA( DTYPE_ ) The descriptor type. In this case,
                                DTYPE_A = 1.
CTXT_A (global) DESCA( CTXT_ )  The BLACS context handle, indicating
                                the BLACS process grid A is
                                distributed over. The context itself
                                is global, but the handle (the
                                integer value) may vary.
M_A    (global) DESCA( M_ )     The number of rows in the global
                                array A.
N_A    (global) DESCA( N_ )     The number of columns in the global
                                array A.
MB_A   (global) DESCA( MB_ )    The blocking factor used to distribute
                                the rows of the array.
NB_A   (global) DESCA( NB_ )    The blocking factor used to distribute
                                the columns of the array.
RSRC_A (global) DESCA( RSRC_ )  The process row over which the first
                                row of the array A is distributed.
CSRC_A (global) DESCA( CSRC_ )  The process column over which the
                                first column of the array A is
                                distributed.
LLD_A  (local)  DESCA( LLD_ )   The leading dimension of the local
                                array. LLD_A >= MAX(1,LOCr(M_A)).
Let K be the number of rows or columns of a distributed matrix, and
assume that its process grid has dimension p x q.
LOCr( K ) denotes the number of elements of K that a process would
receive if K were distributed over the p processes of its process
column.
Similarly, LOCc( K ) denotes the number of elements of K that a process
would receive if K were distributed over the q processes of its process
row.
The values of LOCr() and LOCc() may be determined via a call to the
ScaLAPACK tool function, NUMROC:
LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
An upper bound for these quantities may be computed by:
LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
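The local sizes and the upper bound above can be illustrated with a
short Python sketch. This is a re-implementation of the block cyclic
counting rule for illustration only, not the ScaLAPACK tool function
itself; the names numroc and loc_upper_bound are hypothetical, and
process coordinates are assumed to be zero-based, as in BLACS.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) owned by process iproc when n of them
    are dealt out in blocks of size nb over nprocs processes, starting
    at process isrcproc. Illustrative sketch only."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # blocks that every process receives
    extrablks = nblocks % nprocs                   # complete blocks left over
    if mydist < extrablks:
        count += nb                                # one additional complete block
    elif mydist == extrablks:
        count += n % nb                            # the trailing partial block, if any
    return count


def loc_upper_bound(n, nb, nprocs):
    """Evaluate ceil( ceil(n/nb)/nprocs )*nb with integer arithmetic."""
    nblocks = -(-n // nb)              # ceil(n/nb)
    return -(-nblocks // nprocs) * nb  # ceil(nblocks/nprocs)*nb
```

For example, with n = 10, nb = 3 and two processes, the blocks are
dealt out alternately, so the counts over all processes add up to n
and none exceeds the bound.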
ARGUMENTS
JOB (global input) CHARACTER*1
Specifies whether condition numbers are required for the
cluster of eigenvalues (S) or the invariant subspace (SEP):
= 'N': none;
= 'E': for eigenvalues only (S);
= 'V': for invariant subspace only (SEP);
= 'B': for both eigenvalues and invariant subspace (S and SEP).
COMPQ (global input) CHARACTER*1
= 'V': update the matrix Q of Schur vectors;
= 'N': do not update Q.
SELECT (global input) LOGICAL array, dimension (N)
SELECT specifies the eigenvalues in the selected cluster. To
select a real eigenvalue w(j), SELECT(j) must be set to .TRUE..
To select a complex conjugate pair of eigenvalues w(j) and
w(j+1), corresponding to a 2-by-2 diagonal block, either
SELECT(j) or SELECT(j+1) or both must be set to .TRUE.;
a complex conjugate pair of eigenvalues must be either both
included in the cluster or both excluded.
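The pairing rule for SELECT can be sketched as follows. The helper
normalize_select and its pair_starts argument are hypothetical (they
are not part of the library), and indices are zero-based here, unlike
the one-based Fortran indices used above. The sketch flags both
members of every chosen 2-by-2 block and counts the selected
eigenvalues, which is the value one would expect for M.

```python
def normalize_select(select, pair_starts):
    """Return (flags, m) for a selection of eigenvalues.

    select      -- list of booleans, one per eigenvalue (zero-based)
    pair_starts -- zero-based indices j such that a 2-by-2 diagonal
                   block occupies positions j and j+1
    flags       -- copy of select in which both members of every
                   chosen complex conjugate pair are set
    m           -- number of selected eigenvalues
    """
    flags = list(select)
    for j in pair_starts:
        # Selecting either member selects the whole conjugate pair.
        if flags[j] or flags[j + 1]:
            flags[j] = flags[j + 1] = True
    return flags, sum(flags)
```

A caller could use such a helper to check, before the call, how large
the resulting invariant subspace is expected to be.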
PARA (global input) INTEGER array, dimension (6)
Block parameters (some should be replaced by calls to PILAENV
and others by meaningful default values):
PARA(1) = maximum number of concurrent computational windows
allowed in the algorithm;
0 < PARA(1) <= min(NPROW,NPCOL) must hold;
PARA(2) = number of eigenvalues in each window;
0 < PARA(2) < PARA(3) must hold;
PARA(3) = window size; PARA(2) < PARA(3) < DESCT(MB_)
must hold;
PARA(4) = minimal percentage of flops required for
performing matrix-matrix multiplications instead
of pipelined orthogonal transformations;
0 <= PARA(4) <= 100 must hold;
PARA(5) = width of block column slabs for row-wise
application of pipelined orthogonal
transformations in their factorized form;
0 < PARA(5) <= DESCT(MB_) must hold;
PARA(6) = the maximum number of eigenvalues moved together
over a process border; in practice, this will be
approximately half of the cross border window size;
0 < PARA(6) <= PARA(2) must hold.
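The constraints on PARA listed above can be checked before the call
with a small helper such as the following sketch. The function
check_para is hypothetical and only mirrors the inequalities stated
in this section; para[0] stands for PARA(1), and mb stands for
DESCT(MB_).

```python
def check_para(para, nprow, npcol, mb):
    """Return the names of the PARA entries that violate the documented
    constraints; an empty list means all six constraints hold."""
    p1, p2, p3, p4, p5, p6 = para
    problems = []
    if not 0 < p1 <= min(nprow, npcol):
        problems.append("PARA(1)")   # concurrent computational windows
    if not 0 < p2 < p3:
        problems.append("PARA(2)")   # eigenvalues in each window
    if not p2 < p3 < mb:
        problems.append("PARA(3)")   # window size
    if not 0 <= p4 <= 100:
        problems.append("PARA(4)")   # percentage of flops
    if not 0 < p5 <= mb:
        problems.append("PARA(5)")   # width of block column slabs
    if not 0 < p6 <= p2:
        problems.append("PARA(6)")   # eigenvalues moved over a border
    return problems
```

For instance, on a 2 x 2 grid with block size 32, a window size of
32 is rejected because PARA(3) must be strictly smaller than the
block size.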
N (global input) INTEGER
The order of the globally distributed matrix T. N >= 0.
T (local input/output) DOUBLE PRECISION array,
dimension (LLD_T,LOCc(N)).
On entry, the local pieces of the global distributed upper
quasi-triangular matrix T, in Schur form. On exit, T is
overwritten by the local pieces of the reordered matrix T, again
in Schur form, with the selected eigenvalues in the globally
leading diagonal blocks.
IT (global input) INTEGER
JT (global input) INTEGER
The row and column index in the global array T indicating the
first row and column of sub( T ). IT = JT = 1 must hold.
DESCT (global and local input) INTEGER array of dimension DLEN_.
The array descriptor for the global distributed matrix T.
Q (local input/output) DOUBLE PRECISION array,
dimension (LLD_Q,LOCc(N)).
On entry, if COMPQ = 'V', the local pieces of the global
distributed matrix Q of Schur vectors.
On exit, if COMPQ = 'V', Q has been postmultiplied by the
global orthogonal transformation matrix which reorders T; the
leading M columns of Q form an orthonormal basis for the
specified invariant subspace.
If COMPQ = 'N', Q is not referenced.
IQ (global input) INTEGER
JQ (global input) INTEGER
The row and column index in the global array Q indicating the
first row and column of sub( Q ). IQ = JQ = 1 must hold.
DESCQ (global and local input) INTEGER array of dimension DLEN_.
The array descriptor for the global distributed matrix Q.
WR (global output) DOUBLE PRECISION array, dimension (N)
WI (global output) DOUBLE PRECISION array, dimension (N)
The real and imaginary parts, respectively, of the reordered
eigenvalues of T. The eigenvalues are in principle stored in
the same order as on the diagonal of T, with WR(i) = T(i,i)
and, if T(i:i+1,i:i+1) is a 2-by-2 diagonal block, WI(i) > 0
and WI(i+1) = -WI(i).
Note also that if a complex eigenvalue is sufficiently
ill-conditioned, then its value may differ significantly from
its value before reordering.
M (global output) INTEGER
The dimension of the specified invariant subspace. 0 <= M <=
N.
S (global output) DOUBLE PRECISION
If JOB = 'E' or 'B', S is a lower bound on the reciprocal
condition number for the selected cluster of eigenvalues.
S cannot underestimate the true reciprocal condition number by
more than a factor of sqrt(N). If M = 0 or N, S = 1.
If JOB = 'N' or 'V', S is not referenced.
SEP (global output) DOUBLE PRECISION
If JOB = 'V' or 'B', SEP is the estimated reciprocal
condition number of the specified invariant subspace. If M = 0
or N, SEP = norm(T).
If JOB = 'N' or 'E', SEP is not referenced.
WORK (local workspace/output) DOUBLE PRECISION array, dimension
(LWORK)
On exit, if INFO = 0, WORK(1) returns the optimal LWORK.
LWORK (local input) INTEGER
The dimension of the array WORK.
If LWORK = -1, then a workspace query is assumed; the routine
only calculates the optimal size of the WORK array, returns
this value as the first entry of the WORK array, and no error
message related to LWORK is issued by PXERBLA.
IWORK (local workspace/output) INTEGER array, dimension (LIWORK)
LIWORK (local input) INTEGER
The dimension of the array IWORK.
If LIWORK = -1, then a workspace query is assumed; the routine
only calculates the optimal size of the IWORK array, returns
this value as the first entry of the IWORK array, and no error
message related to LIWORK is issued by PXERBLA.
INFO (global output) INTEGER
= 0: successful exit
< 0: the i-th argument had an illegal value. If the i-th
argument is a scalar, then INFO = -i; if it is an array and its
j-th entry had an illegal value, then INFO = -(i*1000+j).
> 0: here we have several possibilities
*) Reordering of T failed because some eigenvalues are too
close to separate (the problem is very ill-conditioned);
T may have been partially reordered, and WR and WI
contain the eigenvalues in the same order as in T.
On exit, INFO = {the index of T where the swap failed}.
*) A 2-by-2 block to be reordered split into two 1-by-1
blocks and the second block failed to swap with an
adjacent block.
On exit, INFO = {the index of T where the swap failed}.
*) If INFO = N+1, there is no valid BLACS context (see the
BLACS documentation for details).
*) If INFO = N+2, the routines used in the calculation of
the condition numbers raised a positive warning flag
(see the documentation for PGESYCTD and PSYCTCON of the
SCASY library).
*) If INFO = N+3, PGESYCTD raised an input error flag;
please report this bug to the authors (see below).
*) If INFO = N+4, PSYCTCON raised an input error flag;
please report this bug to the authors (see below).
In a future release this subroutine may distinguish between
cases 1 and 2 above.
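The INFO conventions above can be summarized in a small decoding
helper. The function describe_info is a hypothetical sketch, not part
of the library; it assumes that negative codes of magnitude 1000 or
more refer to array arguments, which is consistent with the
-(i*1000+j) encoding and with the routine having fewer than 1000
arguments.

```python
def describe_info(info, n):
    """Translate an INFO value returned for a matrix of order n into a
    short message, following the conventions documented above."""
    if info == 0:
        return "successful exit"
    if info < 0:
        code = -info
        if code >= 1000:
            # Array argument: INFO = -(i*1000+j)
            i, j = divmod(code, 1000)
            return f"entry {j} of array argument {i} had an illegal value"
        return f"argument {code} had an illegal value"
    if info <= n:
        # Covers both swap-failure cases, which are not distinguished.
        return f"reordering failed at index {info} of T"
    extra = {
        1: "no valid BLACS context",
        2: "condition estimation raised a positive warning flag",
        3: "PGESYCTD raised an input error flag",
        4: "PSYCTCON raised an input error flag",
    }
    return extra.get(info - n, "unknown INFO value")
```

A driver could print the returned message on each process and stop
the computation whenever INFO is nonzero.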
Method
======
This routine performs parallel eigenvalue reordering in real Schur
form. The condition number estimation part is performed by using
techniques and code from SCASY.
Additional requirements
=======================
The following alignment requirements must hold:
(a) DESCT( MB_ ) = DESCT( NB_ ) = DESCQ( MB_ ) = DESCQ( NB_ )
(b) DESCT( RSRC_ ) = DESCQ( RSRC_ )
(c) DESCT( CSRC_ ) = DESCQ( CSRC_ )
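All matrices must be blocked by a block factor larger than or equal to
two; since 0 < PARA(2) < PARA(3) < DESCT( MB_ ) must also hold, the
block factor is in practice at least three. This simplifies reordering
across processor borders in the presence of 2-by-2 blocks.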
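The alignment requirements (a) to (c) can be verified on the two
descriptors before the call, as in the following sketch. The function
check_alignment is hypothetical. The offsets assume the usual
ScaLAPACK descriptor layout, in which MB_, NB_, RSRC_ and CSRC_ are
entries 5 to 8 in Fortran numbering (4 to 7 with zero-based
indexing); this layout is an assumption, since it is not spelled out
in this page.

```python
# Zero-based offsets into a descriptor (assumed layout, see above).
MB_, NB_, RSRC_, CSRC_ = 4, 5, 6, 7


def check_alignment(desct, descq):
    """Return a dict reporting whether requirements (a), (b) and (c)
    hold for the descriptors of T and Q."""
    return {
        # (a) square blocks of equal size for T and Q
        "a": desct[MB_] == desct[NB_] == descq[MB_] == descq[NB_],
        # (b) same source process row
        "b": desct[RSRC_] == descq[RSRC_],
        # (c) same source process column
        "c": desct[CSRC_] == descq[CSRC_],
    }
```

If any entry of the result is False, the descriptors should be
corrected (and the matrices redistributed accordingly) before calling
the routine.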
Limitations
===========
This algorithm cannot work on submatrices of T and Q, i.e.,
IT = JT = IQ = JQ = 1 must hold. This is, however, no limitation, since
PDLAHQR does not compute Schur forms of submatrices anyway.
Parallel execution recommendations
==================================
Use a square grid, if possible, for maximum performance. The block
parameters in PARA should be kept well below the data distribution
block size.
In general, the parallel algorithm strives to perform as much work as
possible without crossing the block borders on the main block diagonal.
ScaLAPACK routine 31 October 2017 PDTRSEN(3)