
Technical Report
TR-2004-028
Numerical Solution of Saddle Point Problems
by
Michele Benzi, Gene H. Golub, Jörg Liesen
Mathematics and Computer Science
EMORY UNIVERSITY

NUMERICAL SOLUTION OF SADDLE POINT PROBLEMS
MICHELE BENZI, GENE H. GOLUB, AND JÖRG LIESEN
We dedicate this paper to Gil Strang on the occasion of his 70th birthday
Abstract. Large linear systems of saddle point type arise in a wide variety of applications throughout computational science and engineering. Due to their indefiniteness and often poor spectral properties, such linear systems represent a significant challenge for solver developers. In recent years there has been a surge of interest in saddle point problems, and numerous solution techniques have been proposed for this type of system. The aim of this paper is to present and discuss a large selection of solution methods for linear systems in saddle point form, with an emphasis on iterative methods for large and sparse problems.
CONTENTS
1 Introduction
2 Applications leading to saddle point problems
3 Properties of saddle point matrices
4 Overview of solution algorithms
5 Schur complement reduction
6 Null space methods
7 Coupled direct solvers
8 Stationary iterations
9 Krylov subspace methods
10 Preconditioners
11 Multilevel methods
12 Available software
13 Concluding remarks
References
1. Introduction. In recent years, a large amount of work has been devoted to the problem of solving large linear systems in saddle point form. The reason for this interest is that such problems arise in a wide variety of technical and scientific applications. For example, the ever increasing popularity of mixed finite element methods in engineering fields such as fluid and solid mechanics has been a major source of saddle point systems [79, 170]. Another reason for this surge in interest is the extraordinary success of interior point algorithms in both linear and nonlinear optimization, which require at their heart the solution of a sequence of systems in saddle point form [371, 506, 507].
Because of the ubiquitous nature of saddle point systems, methods and results on their numerical
solution have appeared in a wide variety of books, journals and conference proceedings, justifying
This draft dated 13 December 2004. To appear in Acta Numerica 2005.
Department of Mathematics and Computer Science, Emory University, Atlanta, Georgia 30322, USA (benzi@mathcs.emory.edu). The work of this author was supported in part by the National Science Foundation grant DMS-0207599.
Scientific Computing and Computational Mathematics Program, Stanford University, Stanford, California 94305-9025, USA (golub@sccm.stanford.edu).
Institut für Mathematik, Technische Universität Berlin, D-10623 Berlin, Germany (liesen@math.tu-berlin.de).
the need for a comprehensive survey of the subject. The purpose of this article is to review many of the most promising solution methods, with an emphasis on iterative methods for large and sparse problems. Although many of these solvers have been developed with specific applications in mind (for example, Stokes-type problems in fluid dynamics), it is possible to discuss them in a fairly general setting using standard numerical linear algebra concepts, the most prominent being perhaps the Schur complement. Nevertheless, when choosing a preconditioner (or developing a new one), knowledge of the origin of the particular problem at hand is essential. We therefore devote some space to a discussion of saddle point problems arising in a few selected applications.
It is hoped that the present survey will prove useful to practitioners who are looking for guidance
in the choice of a solution method for their own application, to researchers in numerical linear algebra
and scientific computing, and especially to graduate students as an introduction to this very rich
and important subject.
1.1. Problem statement and classification. The subject of this paper is the solution of block 2 × 2 linear systems of the form

\[
\begin{bmatrix} A & B_1^T \\ B_2 & -C \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} f \\ g \end{bmatrix},
\quad \text{or} \quad \mathcal{A}\,u = b, \tag{1.1}
\]
\[
A \in \mathbb{R}^{n \times n}, \quad B_1,\, B_2 \in \mathbb{R}^{m \times n}, \quad C \in \mathbb{R}^{m \times m}, \quad \text{with } n \ge m. \tag{1.2}
\]
It is obvious that, under suitable partitioning, any linear system can be cast in the form (1.1)–(1.2). We explicitly exclude the case where A or one or both of B_1, B_2 are zero. When the linear system describes a (generalized) saddle point problem, the constituent blocks A, B_1, B_2 and C satisfy one or more of the following conditions:
C1 A is symmetric: A = A^T
C2 The symmetric part of A, H ≡ (1/2)(A + A^T), is positive semidefinite
C3 B_1 = B_2 = B
C4 C is symmetric (C = C^T) and positive semidefinite
C5 C = O (the zero matrix)
Note that C5 implies C4. The most basic case is obtained when all the above conditions are satisfied. In this case A is symmetric positive semidefinite and we have a symmetric linear system of the form

\[
\begin{bmatrix} A & B^T \\ B & O \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} f \\ g \end{bmatrix}. \tag{1.3}
\]
This system arises as the first order optimality conditions for the following equality-constrained quadratic programming problem:

\[
\min\; J(x) = \tfrac{1}{2}\, x^T A x - f^T x \tag{1.4}
\]
\[
\text{subject to} \quad Bx = g. \tag{1.5}
\]
In this case the variable y represents the vector of Lagrange multipliers. Any solution (x^*, y^*) of (1.3) is a saddle point for the Lagrangian

\[
L(x, y) = \tfrac{1}{2}\, x^T A x - f^T x + (Bx - g)^T y,
\]

hence the name "saddle point problem" given to (1.3). Recall that a saddle point is a point (x^*, y^*) ∈ R^{n+m} that satisfies

\[
L(x^*, y) \le L(x^*, y^*) \le L(x, y^*) \qquad \forall\, x \in \mathbb{R}^n,\; \forall\, y \in \mathbb{R}^m,
\]

or, equivalently,

\[
\min_x \max_y L(x, y) = L(x^*, y^*) = \max_y \min_x L(x, y).
\]
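Spelling out the first order optimality computation behind (1.3) (a standard step, made explicit here for convenience): setting the partial gradients of the Lagrangian to zero gives

\[
\nabla_x L(x, y) = Ax - f + B^T y = 0, \qquad
\nabla_y L(x, y) = Bx - g = 0,
\]

which are precisely the first and second block rows of (1.3).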
Systems of the form (1.3) also arise in nonlinearly constrained optimization (sequential quadratic programming and interior point methods), in fluid dynamics (Stokes problem), incompressible elasticity, circuit analysis, structural analysis, and so forth; see the next section for a discussion of applications leading to saddle point problems.
Another important special case is when conditions C1–C4 are satisfied, but not C5. In this case we have a block linear system of the form

\[
\begin{bmatrix} A & B^T \\ B & -C \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} f \\ g \end{bmatrix}. \tag{1.6}
\]

Problems of this kind frequently arise in the context of stabilized mixed finite element methods. Stabilization is used whenever the discrete variables x and y belong to finite element spaces that do not satisfy the Ladyzhenskaya–Babuška–Brezzi (or inf-sup) condition [79]. Another situation leading to a nonzero C is the discretization of the equations describing slightly compressible fluids or solids [69, Chapter 6.3]. Systems of the form (1.6) also arise from regularized, weighted least-squares problems [49] and from certain interior point methods in optimization [506, 507]. Often the matrix C has small norm compared to the other blocks.
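Since the Schur complement plays a central role in what follows, it may help to record here the block elimination that produces it (a standard identity; the sign convention matches the Schur complement S = -(B A^{-1} B^T + C) that appears later in the paper). When A is nonsingular, solving the first block row of (1.6) for x and substituting into the second gives

\[
x = A^{-1}(f - B^T y), \qquad -(B A^{-1} B^T + C)\, y = g - B A^{-1} f,
\]

so that the coupled system reduces to one solve with S = -(B A^{-1} B^T + C) and two solves with A.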
In the literature, the phrase generalized saddle point problem has been used primarily to allow for the possibility of a nonsymmetric coefficient matrix A in (1.1). In such problems either A ≠ A^T (with condition C2 usually satisfied), or B_1 ≠ B_2, or both. The most important example is perhaps that of the linearized Navier–Stokes equations, where linearization has been obtained by Picard iteration or by some variant of Newton's method. See [111, 370] and [456] for additional examples. We note that our definition of generalized saddle point problem as a linear system of the form (1.1)–(1.2) where the blocks A, B_1, B_2 and C satisfy one or more of the conditions C1–C5 is the most general possible, and it contains previous definitions as special cases.
In the vast majority of cases, linear systems of saddle point type have real coefficients, and in
this paper we restrict ourselves to the real case. Complex coefficient matrices, however, do arise in
some cases; see, e.g., [61, 345] and [449, page 117]. Most of the results and algorithms reviewed in
this paper admit straightforward extensions to the complex case.
1.2. Sparsity, structure and size. Although saddle point systems come in all sizes and
with widely different structural and sparsity properties, in this paper we are mainly interested in
problems that are both large and sparse. This justifies our emphasis on iterative solvers. Direct
solvers, however, are still the preferred method in optimization and other areas. Furthermore, direct
methods are often used in the solution of subproblems, for example as part of a preconditioner solve.
Some of the algorithms considered in this paper are also applicable if one or more of the blocks in A
happen to be dense, as long as matrix-vector products with A can be performed efficiently, typically
in O(n + m) time. This means that if a dense block is present, it must have a special structure (e.g.,
Toeplitz, as in [49, 285]) or it must be possible to approximate its action on a vector with (nearly)
linear complexity, as in the fast multipole method [345].
Frequently, the matrices that arise in practice have quite a bit of structure. For instance, the A block is often block diagonal, with each diagonal block endowed with additional structure. Many of the algorithms discussed in this paper are able to exploit the structure of the problem to gain efficiency and save on storage. Sometimes the structure of the problem suggests solution algorithms that have a high degree of parallelism. This last aspect, however, is not emphasized in this paper. Finally we mention that in most applications n is larger than m, often much larger.
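As a small illustration of the matrix-free point of view mentioned above (our own sketch, not code from the paper; the block sizes, the random test blocks, and the use of SciPy's LinearOperator and minres are all illustrative choices), the following applies MINRES to a system of the form (1.3) while accessing the coefficient matrix only through block matrix-vector products:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, minres

# Hypothetical toy sizes; real applications have n, m far larger.
n, m = 200, 50
rng = np.random.default_rng(0)

# A: symmetric positive definite (1,1) block; B: full-rank constraint block.
A = sp.diags(rng.uniform(1.0, 2.0, size=n))
B = (sp.eye(m, n) + sp.random(m, n, density=0.05, random_state=0)).tocsr()

def matvec(u):
    # Apply [[A, B^T], [B, O]] to u = [x; y] without forming the matrix;
    # the cost is O(n + m) plus the sparse products with A and B.
    x, y = u[:n], u[n:]
    return np.concatenate([A @ x + B.T @ y, B @ x])

K = LinearOperator((n + m, n + m), matvec=matvec, dtype=float)
b = np.concatenate([np.ones(n), np.zeros(m)])  # right-hand side [f; g]

# MINRES needs only matvecs and handles symmetric indefinite systems.
u, info = minres(K, b)
x, y = u[:n], u[n:]
print("converged:", info == 0, " residual norm:", np.linalg.norm(K @ u - b))

Replacing the explicit blocks inside matvec by application-specific operators, for example a fast multipole evaluation for a dense A block as in [345], leaves the solver loop unchanged.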
2. Applications leading to saddle point problems. As already mentioned, large-scale
saddle point problems occur in many areas of computational science and engineering. The following
is a list of some fields where saddle point problems naturally arise, together with some references:
Computational fluid dynamics [213, 407, 459, 469, 499]
Constrained and weighted least squares estimation [59, 222]
Constrained optimization [210, 506, 507]
Economics [18, 143, 320, 456]
Electrical circuits and networks [51, 109, 449, 467]
Electromagnetism [67, 390, 392]
Finance [348, 349]
Image reconstruction [255]
Image registration [248, 362]
Interpolation of scattered data [342, 435]
Linear elasticity [69, 110]
Mesh generation for computer graphics [324]
Mixed finite element approximations of elliptic PDEs [78, 79, 407]
Model order reduction for dynamical systems [194, 263, 453]
Optimal control [36, 37, 56, 57, 369]
Parameter identification problems [86, 246, 247]
Quite often, saddle point systems arise when a certain quantity (such as the energy of a physical system) has to be minimized, subject to a set of linear constraints. In this case the Lagrange multiplier y usually has a physical interpretation and its computation is also of interest. For example, in incompressible flow problems x is a vector of velocities and y a vector of pressures. In structural mechanics x is the vector of internal forces and y represents the nodal displacements of the structure. For resistive electrical networks y represents the nodal potentials, x being the vector of currents.
In some cases, such as fluid dynamics or linear elasticity, saddle point problems result from the discretization of systems of partial differential equations with constraints. Typically the constraints represent some basic conservation law, such as mass conservation in fluid dynamics. In other cases, such as resistive electrical networks or structural analysis, the equations are discrete to begin with. Now the constraints may correspond to the topology (connectivity) of the system being studied. Because saddle point equations can be derived as equilibrium conditions for a physical system, they are sometimes called equilibrium equations. See [449] for a very nice discussion of equilibrium equations throughout applied mathematics. Another popular name for saddle point systems, especially in the optimization literature, is "KKT system," from the Karush–Kuhn–Tucker constraint qualification conditions; see [371, page 328] for precise definitions, and [219, 300] for historical notes.
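To make the network interpretation concrete, here is a small worked illustration (our own, in the spirit of the equilibrium equations discussed in [449], not taken verbatim from the paper). For a resistive network with n edges and m ungrounded nodes, let R = diag(R_1, ..., R_n) collect the edge resistances and let E ∈ R^{m×n} be the reduced node-edge incidence matrix. With x the vector of edge currents and y the vector of nodal potentials, Ohm's law on each edge and Kirchhoff's current law at each node combine into

\[
\begin{bmatrix} R & E^T \\ E & O \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} f \\ g \end{bmatrix},
\]

where f contains the voltage sources and g the external current injections. This is exactly the form (1.3), with A = R symmetric positive definite and the network topology entering only through B = E.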
Systems of the form (1.1)–(1.2) also arise from non-overlapping domain decomposition when
interface unknowns are numbered last, as well as from FETI-type schemes when Lagrange multipliers
are used to ensure continuity at the interfaces; see for instance [95, 175, 275, 408].
It is of course not possible for us to cover all these different applications here. We choose instead to give some details about three classes of problems leading to saddle point systems. The first comes from the field of computational fluid dynamics, the second from least squares estimation, and the third from interior point methods in constrained optimization.
