What are the contributions mentioned in the paper "Inverse problems: a bayesian perspective" ?

They are equivalent if and only if the following three conditions hold: ( i ) Im ( C 1 ) = Im ( C 1/2 2 ): = E, ( ii ) m1 −m2 ∈ E, ( iii ) the operator T: = ( C−1/2 1 C 1/2 2 ) ( C−1/2 1 C 1/2 2 ) ∗ − I is Hilbert–Schmidt in E.

Why is the importance of this class of algorithms natural?

The importance of this class of algorithmsstems from the fact that, in many applications, solutions are required online, with updates required as more data is acquired; thus sequential updating of the posterior measure at the current time is natural.

What is the common method for interrogating a probability measure in high dimensions?

Another commonly used method for interrogating a probability measure in high dimensions is sampling : generating a set of points {un}Nn=1 distributed (perhaps only approximately) according to πy(u).

What is the key ingredient in the definition of well-posed posterior measures?

This will enable us to measuredistance between pairs of probability measures, and is a key ingredient in the definition of well-posed posterior measures described in this article.

What is the role of decay of the covariance operator in determining the regularity properties?

In particular, the rate of decay of the eigenvalues of the covariance operator plays a central role in determining the regularity properties.

What are the powerful tools for sampling?

Among the most powerful generic tools for sampling are the Markov chain Monte Carlo (MCMC) methods, which the authors review in Section 5.2.

What is the case where (i) is satisfied trivially?

That paper contains Theorems 4.1 and 4.2 under Assumptions 2.6 in the case where (i) is satisfied trivially because Φ is bounded from below by a constant; note that this case occurs whenever the data is finite-dimensional.

What is the formula for the mean derived by completing the square?

The authors haveΓ−1 − (Γ +AC0A∗)−1AC0A∗Γ−1 = (Γ +AC0A∗)−1. (6.16)The formula for the mean derived by completing the square gives m = C ( (C−1 −A∗Γ−1A)m0 +A∗Γ−1y ) = m0 + CA∗Γ−1(y −Am0).

What is the third assumption important for showing that the posterior probability measure is well-defined?

The third assumption is important for showing that the posterior probability measure is well-defined, whilst the fourth is important for showing continuity with respect to data.

How is the probability of a small ball of radius maximized?

Thus the Lebesgue density of µ is maximized by minimizing The authorover Rn. Another way of looking at this is as follows: if u is such a minimizer then the probability of a small ball of radius ε and centred at u will be maximized, asymptotically as ε→ 0, by choosing u = u.

What is the generalization of the theorems to allow for (i)?

Generalizing the theorems to allow for (i) as stated here was undertaken in Hairer, Stuart and Voss (2010b), in the context of signal processing for stochastic differential equations.

(Open Access) Inverse problems: A Bayesian perspective (2010) | Andrew M. Stuart

Acta Numerica

http://journals.cambridge.org/ANU

Additional services for Acta Numerica:

Email alerts: Click here

Subscriptions: Click here

Commercial reprints: Click here

Inverse problems: A Bayesian perspective

A. M. Stuart

Acta Numerica / Volume 19 / May 2010, pp 451 - 559

DOI: 10.1017/S0962492910000061, Published online: 10 May 2010

Link to this article: http://journals.cambridge.org/abstract_S0962492910000061

How to cite this article:

A. M. Stuart (2010). Inverse problems: A Bayesian perspective. Acta Numerica, 19, pp 451-559 doi:10.1017/

S0962492910000061

Request Permissions : Click here

Downloaded from http://journals.cambridge.org/ANU, IP address: 137.205.50.42 on 11 Sep 2013

Acta Numerica (2010), pp. 451–559

 Cambridge University Press, 2010

doi:10.1017/S0962492910000061 Printed in the United Kingdom

Inverse problems: A Bayesian perspective

A. M. Stuart

Mathematics Institute,

University of Warwick,

Coventry CV4 7AL, UK

E-mail: a.m.stuart@warwick.ac.uk

The subject of inverse problems in diﬀerential equations is of enormous practi-

cal importance, and has also generated substantial mathematical and compu-

tational innovation. Typically some form of regularization is required to ame-

liorate ill-posed behaviour. In this article we review the Bayesian approach

to regularization, developing a function space viewpoint on the subject. This

approach allows for a full characterization of all possible solutions, and their

relative probabilities, whilst simultaneously forcing signiﬁcant modelling is-

sues to be addressed in a clear and precise fashion. Although expensive to

implement, this approach is starting to lie within the range of the available

computational resources in many application areas. It also allows for the

quantiﬁcation of uncertainty and risk, something which is increasingly de-

manded by these applications. Furthermore, the approach is conceptually

important for the understanding of simpler, computationally expedient ap-

proaches to inverse problems.

We demonstrate that, when formulated in a Bayesian fashion, a wide range

of inverse problems share a common mathematical framework, and we high-

light a theory of well-posedness which stems from this. The well-posedness

theory provides the basis for a number of stability and approximation results

which we describe. We also review a range of algorithmic approaches which

are used when adopting the Bayesian approach to inverse problems. These

include MCMC methods, ﬁltering and the variational approach.

CONTENTS

1 Introduction 452

2 The Bayesian framework 456

3 Examples 476

4 Common structure 499

5 Algorithms 508

6 Probability 524

References 548

452 A. M. Stuart

1. Introduction

A signiﬁcant challenge facing mathematical scientists is the development of

a coherent mathematical and algorithmic framework enabling researchers to

blend complex mathematical models with the (often vast) data sets which

are now routinely available in many ﬁelds of engineering, science and tech-

nology. In this article we frame a range of inverse problems, mostly arising

from the conjunction of diﬀerential equations and data, in the language of

Bayesian statistics. In so doing our aim is twofold: (i) to highlight common

mathematical structure arising from the numerous application areas where

signiﬁcant progress has been made by practitioners over the last few decades

and thereby facilitate exchange of ideas between diﬀerent application do-

mains; (ii) to develop an abstract function space setting for the problems

in order to evaluate the eﬃciency of existing algorithms, and to develop

new algorithms. Applications are far-reaching and include ﬁelds such as the

atmospheric sciences, oceanography, hydrology, geophysics, chemistry and

biochemistry, materials science, systems biology, traﬃc ﬂow, econometrics,

image processing and signal processing.

The guiding principle underpinning the speciﬁc development of the sub-

ject of Bayesian inverse problems in this article is to avoid discretization

until the last possible moment. This principle is enormously empowering

throughout numerical analysis. For example, the ﬁrst-order wave equation

is not controllable to a given ﬁnal state in arbitrarily small time because of

ﬁnite speed of propagation. Yet every ﬁnite diﬀerence spatial discretization

of the ﬁrst-order wave equation gives rise to a linear system of ordinary

diﬀerential equations which is controllable, in any ﬁnite time, to a given

ﬁnal state; asking the controllability question before discretization is key to

understanding (Zuazua 2005). As another example consider the heat equa-

tion. If this is discretized in time by the theta method (with θ ∈ [0, 1] and

θ = 0 being explicit Euler, θ = 1 implicit Euler), but left undiscretized in

space, the resulting algorithm on function space is only deﬁned if θ ∈ [

, 1];

thus it is possible to deduce that there must be a Courant restriction if

θ ∈ [0,

) (Richtmyer and Morton 1967) before even introducing spatial

discretization. Yet another example may be found in the study of Newton

methods: conceptual application of this algorithm on function space, before

discretization, can yield considerable insight when applying it as an itera-

tive method for boundary value problems in nonlinear diﬀerential equations

(Deuﬂhard 2004). The list of problems where it is beneﬁcial to defer dis-

cretization to the very end of the algorithmic formulation is almost endless.

It is perhaps not surprising, therefore, that the same idea yields insight

in the solution of inverse problems and we substantiate this idea in the

Bayesian context.

Inverse problems 453

The article is divided into ﬁve parts. The next section, Section 2, is de-

voted to a description of the basic ideas of Bayesian statistics as applied to

inverse problems in the ﬁnite-dimensional setting. It also includes a pointer

to the common structure that we will highlight in the remainder of the article

when developing the Bayesian viewpoint in function space. Section 3 con-

tains a range of inverse problems arising in diﬀerential equations, showing

how the Bayesian approach may be applied to inverse problems for func-

tions; in particular, we discuss the problem of recovering a ﬁeld from noisy

pointwise data, recovering the diﬀusion coeﬃcient from a boundary value

problem, given noisy pointwise observations of the solution, recovering the

wave speed from noisy observations of solutions of the wave equation and

recovering the initial condition of the heat equation from noisy observation

of the solution at a positive time. We also describe a range of applications,

involving similar but more complex models, arising in weather forecasting,

oceanography, subsurface geophysics and molecular dynamics. In Section 4

we describe, and exploit, the common mathematical structure which un-

derlies all of these Bayesian inverse problems for functions. In that section

we prove a form of well-posedness for these inverse problems, by showing

Lipschitz continuity of the posterior measure with respect to changes in

the data; we also prove an approximation theorem which exploits this well-

posedness to show that approximation of the forward problem (by spectral

or ﬁnite element methods, for example) leads to similar approximation re-

sults for the posterior probability measure. Section 5 is devoted to a survey

of the existing algorithmic tools used to solve the problems highlighted in

the article. In particular, Markov chain Monte Carlo (MCMC) methods,

variational methods and ﬁltering methods are surveyed. When discussing

variational methods we show, in the setting of Section 4, that posterior

probability maximizers can be characterized through solution of an optimal

control problem, and that this optimal control problem has a minimizer

under the same conditions that lead to a well-posed Bayesian inverse prob-

lem. Section 6 contains the background probability required to read the

article; the presentation in this section is necessarily terse and the reader is

encouraged to follow up references in the bibliography for further detail.

A major theme of the article is thus to confront the inﬁnite-dimensional

nature of many inverse problems. This is important because, whilst all

computational algorithms work on ﬁnite-dimensional approximations, these

approximations are typically in spaces of very high dimension and many

signiﬁcant challenges stem from this fact. By formulating inverse problems

in an inﬁnite-dimensional setting we build these challenges into the fabric

of the problem setting. We provide a clear concept of the ideal solution

to the inverse problem when blending a forward mathematical model with

observational data. This concept can be used to test the practical algorithms

used in applications which, in many cases, use crude approximations for

454 A. M. Stuart

reasons of computational eﬃciency. Furthermore, it is also possible that

the function space Bayesian setting will also lead to the development of

improved algorithms which exploit the underlying mathematical structure

common to a diverse range of applications. In particular, the theory of

(Bayesian) well-posedness which we describe forms the cornerstone of many

perturbation theories, including ﬁnite-dimensional approximations.

Kaipio and Somersalo (2005) provide a good introduction to the Bayesian

approach to inverse problems, especially in the context of diﬀerential equa-

tions. Furthermore, Calvetti and Somersalo (2007b) provide a useful in-

troduction to the Bayesian perspective in scientiﬁc computing. Another

overview of the subject of inverse problems in diﬀerential equations, in-

cluding a strong argument for the philosophy taken in this article, namely

to formulate and study inverse problems in function space, is the book by

Tarantola (2005) (see, especially, Chapter 5); however, the mathematics as-

sociated with this philosophical standpoint is not developed there to the

same extent that it is in this article, and the focus is primarily on Gaussian

problems. A frequentist viewpoint for inverse problems on function space is

contained in the book by Ramsay and Silverman (2005); however, we adopt

a diﬀerent, Bayesian, perspective here, and study more involved diﬀerential

equation models than those arising in Ramsay and Silverman (2005). These

books indicate that the development that we undertake here is a natural

one, which builds upon the existing literature.

The subject known as data assimilation provides a number of impor-

tant applications of the material presented here. Its development has been

driven, to a large extent, by practitioners working in the atmospheric and

oceanographic sciences and in the geosciences, resulting in a plethora of al-

gorithmic approaches and a number of signiﬁcant algorithmic innovations.

A good source for an understanding of data assimilation in the context of the

atmospheric sciences, and weather prediction in particular, is the book by

Kalnay (2003). A book motivated by applications in oceanography, which

simultaneously highlights some of the underlying function space structure of

data assimilation for linear, Gaussian problems, is that of Bennett (2002).

The book by Evensen (2006) provides a good overview of many computa-

tional aspects of the subject, reﬂecting the author’s experience in geophys-

ical applications and related areas. The recent special edition of Physica D

devoted to data assimilation provides a good entry point to some of the cur-

rent research in this area (Ide and Jones 2007). Another application that

ﬁts the mathematical framework developed here is molecular dynamics. The

problems of interest do not arise from Bayesian inverse problems, as such,

but rather from conditioned diﬀusion processes. However, the mathemat-

ical structure has much in common with that arising in Bayesian inverse

problems, and so we include a description of this problem area.

Inverse problems: A Bayesian perspective

Citations

Machine learning

Probability and Random Processes

Physics-informed machine learning

Hidden physics models: Machine learning of nonlinear partial differential equations

Survey of Multifidelity Methods in Uncertainty Propagation, Inference, and Optimization

References

Equation of state calculations by fast computing machines

Compressed sensing

Nonlinear total variation based noise removal algorithms

Monte Carlo Sampling Methods Using Markov Chains and Their Applications

Machine learning

Related Papers (5)

Statistical and computational inverse problems

MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster

Inverse Problem Theory and Methods for Model Parameter Estimation

Monte Carlo Sampling Methods Using Markov Chains and Their Applications

Equation of state calculations by fast computing machines

Frequently Asked Questions (11)

Q1. What are the contributions mentioned in the paper "Inverse problems: a bayesian perspective" ?

Q2. Why is the importance of this class of algorithms natural?

Q3. What is the common method for interrogating a probability measure in high dimensions?

Q4. What is the key ingredient in the definition of well-posed posterior measures?

Q5. What is the role of decay of the covariance operator in determining the regularity properties?

Q6. What are the powerful tools for sampling?

Q7. What is the case where (i) is satisfied trivially?

Q8. What is the formula for the mean derived by completing the square?

Q9. What is the third assumption important for showing that the posterior probability measure is well-defined?

Q10. How is the probability of a small ball of radius maximized?

Q11. What is the generalization of the theorems to allow for (i)?