
Acta Numerica (2010), pp. 451–559
© Cambridge University Press, 2010
doi:10.1017/S0962492910000061 Printed in the United Kingdom
Inverse problems: A Bayesian perspective
A. M. Stuart
Mathematics Institute,
University of Warwick,
Coventry CV4 7AL, UK
E-mail: a.m.stuart@warwick.ac.uk
The subject of inverse problems in differential equations is of enormous practical importance, and has also generated substantial mathematical and computational innovation. Typically some form of regularization is required to ameliorate ill-posed behaviour. In this article we review the Bayesian approach to regularization, developing a function space viewpoint on the subject. This approach allows for a full characterization of all possible solutions, and their relative probabilities, whilst simultaneously forcing significant modelling issues to be addressed in a clear and precise fashion. Although expensive to implement, this approach is starting to lie within the range of the available computational resources in many application areas. It also allows for the quantification of uncertainty and risk, something which is increasingly demanded by these applications. Furthermore, the approach is conceptually important for the understanding of simpler, computationally expedient approaches to inverse problems.

We demonstrate that, when formulated in a Bayesian fashion, a wide range of inverse problems share a common mathematical framework, and we highlight a theory of well-posedness which stems from this. The well-posedness theory provides the basis for a number of stability and approximation results which we describe. We also review a range of algorithmic approaches which are used when adopting the Bayesian approach to inverse problems. These include MCMC methods, filtering and the variational approach.
CONTENTS
1 Introduction
2 The Bayesian framework
3 Examples
4 Common structure
5 Algorithms
6 Probability
References

1. Introduction
A significant challenge facing mathematical scientists is the development of a coherent mathematical and algorithmic framework enabling researchers to blend complex mathematical models with the (often vast) data sets which are now routinely available in many fields of engineering, science and technology. In this article we frame a range of inverse problems, mostly arising from the conjunction of differential equations and data, in the language of Bayesian statistics. In so doing our aim is twofold: (i) to highlight common mathematical structure arising from the numerous application areas where significant progress has been made by practitioners over the last few decades, and thereby facilitate exchange of ideas between different application domains; (ii) to develop an abstract function space setting for the problems, in order to evaluate the efficiency of existing algorithms and to develop new algorithms. Applications are far-reaching and include fields such as the atmospheric sciences, oceanography, hydrology, geophysics, chemistry and biochemistry, materials science, systems biology, traffic flow, econometrics, image processing and signal processing.
The guiding principle underpinning the specific development of the subject of Bayesian inverse problems in this article is to avoid discretization until the last possible moment. This principle is enormously empowering throughout numerical analysis. For example, the first-order wave equation is not controllable to a given final state in arbitrarily small time, because of finite speed of propagation. Yet every finite difference spatial discretization of the first-order wave equation gives rise to a linear system of ordinary differential equations which is controllable, in any finite time, to a given final state; asking the controllability question before discretization is key to understanding (Zuazua 2005). As another example, consider the heat equation. If this is discretized in time by the theta method (with θ ∈ [0, 1], θ = 0 being explicit Euler and θ = 1 implicit Euler), but left undiscretized in space, the resulting algorithm on function space is only defined if θ ∈ [1/2, 1]; thus it is possible to deduce that there must be a Courant restriction if θ ∈ [0, 1/2) (Richtmyer and Morton 1967), before even introducing spatial discretization. Yet another example may be found in the study of Newton methods: conceptual application of this algorithm on function space, before discretization, can yield considerable insight when applying it as an iterative method for boundary value problems in nonlinear differential equations (Deuflhard 2004). The list of problems where it is beneficial to defer discretization to the very end of the algorithmic formulation is almost endless. It is perhaps not surprising, therefore, that the same idea yields insight in the solution of inverse problems, and we substantiate this idea in the Bayesian context.
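To spell out the computation behind the theta method deduction, here is a worked sketch (our addition, not from the article); A denotes a positive self-adjoint operator such as the negative Laplacian, so the heat equation reads du/dt = −Au.

```latex
% Theta method in time, undiscretized in space: for du/dt = -Au,
%   (u_{n+1} - u_n)/\Delta t = -A\bigl(\theta u_{n+1} + (1-\theta)u_n\bigr),
% i.e.
\[
  u_{n+1} \;=\; (I + \theta\,\Delta t\,A)^{-1}\bigl(I - (1-\theta)\,\Delta t\,A\bigr)\,u_n.
\]
% On an eigenfunction of A with eigenvalue \lambda > 0 the amplification factor is
\[
  r(\lambda) \;=\; \frac{1 - (1-\theta)\,\Delta t\,\lambda}{1 + \theta\,\Delta t\,\lambda},
  \qquad
  \sup_{\lambda > 0}\,|r(\lambda)| \le 1 \ \text{for all } \Delta t > 0
  \iff \theta \ge \tfrac12.
\]
% Since the spectrum of A is unbounded on function space, \theta < 1/2 forces the
% Courant-type restriction \Delta t\,\lambda \le 2/(1-2\theta) on any spatial
% discretization whose largest eigenvalue is \lambda.
```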

The article is divided into five parts. The next section, Section 2, is devoted to a description of the basic ideas of Bayesian statistics as applied to inverse problems in the finite-dimensional setting. It also includes a pointer to the common structure that we will highlight in the remainder of the article when developing the Bayesian viewpoint in function space. Section 3 contains a range of inverse problems arising in differential equations, showing how the Bayesian approach may be applied to inverse problems for functions; in particular, we discuss the problems of recovering a field from noisy pointwise data; recovering the diffusion coefficient of a boundary value problem, given noisy pointwise observations of the solution; recovering the wave speed from noisy observations of solutions of the wave equation; and recovering the initial condition of the heat equation from noisy observation of the solution at a positive time. We also describe a range of applications, involving similar but more complex models, arising in weather forecasting, oceanography, subsurface geophysics and molecular dynamics.
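For orientation, the finite-dimensional formulation that Section 2 develops can be sketched as follows (our added summary, consistent with the article's setting: unknown u, data y, forward map G, Gaussian noise η and prior density π_0).

```latex
% Observation model: the data is the forward map applied to the unknown, plus noise.
\[
  y \;=\; G(u) + \eta, \qquad \eta \sim N(0, \Gamma).
\]
% Bayes' theorem then gives the posterior density on u:
\[
  \pi^{y}(u) \;\propto\;
  \exp\!\Bigl(-\tfrac12\,\bigl|\Gamma^{-1/2}\bigl(y - G(u)\bigr)\bigr|^{2}\Bigr)\,\pi_{0}(u).
\]
```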
In Section 4 we describe, and exploit, the common mathematical structure which underlies all of these Bayesian inverse problems for functions. In that section we prove a form of well-posedness for these inverse problems, by showing Lipschitz continuity of the posterior measure with respect to changes in the data; we also prove an approximation theorem which exploits this well-posedness to show that approximation of the forward problem (by spectral or finite element methods, for example) leads to similar approximation results for the posterior probability measure.
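Schematically (our paraphrase of the form these results take, not a verbatim statement of the theorems), the posterior μ^y is specified by its density with respect to the prior μ_0, and well-posedness asserts Lipschitz dependence on the data in a metric on measures such as the Hellinger distance.

```latex
% Posterior as a change of measure from the prior, with potential \Phi (negative log-likelihood):
\[
  \frac{d\mu^{y}}{d\mu_{0}}(u) \;\propto\; \exp\bigl(-\Phi(u; y)\bigr).
\]
% Well-posedness: for data y, y' ranging over a bounded set there exists C > 0 with
\[
  d_{\mathrm{Hell}}\bigl(\mu^{y}, \mu^{y'}\bigr) \;\le\; C\,\|y - y'\|.
\]
```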
of the existing algorithmic tools used to solve the problems highlighted in
the article. In particular, Markov chain Monte Carlo (MCMC) methods,
variational methods and filtering methods are surveyed. When discussing
variational methods we show, in the setting of Section 4, that posterior
probability maximizers can be characterized through solution of an optimal
control problem, and that this optimal control problem has a minimizer
under the same conditions that lead to a well-posed Bayesian inverse prob-
lem. Section 6 contains the background probability required to read the
article; the presentation in this section is necessarily terse and the reader is
encouraged to follow up references in the bibliography for further detail.
A major theme of the article is thus to confront the infinite-dimensional nature of many inverse problems. This is important because, whilst all computational algorithms work on finite-dimensional approximations, these approximations are typically in spaces of very high dimension, and many significant challenges stem from this fact. By formulating inverse problems in an infinite-dimensional setting we build these challenges into the fabric of the problem setting. We provide a clear concept of the ideal solution to the inverse problem when blending a forward mathematical model with observational data. This concept can be used to test the practical algorithms used in applications which, in many cases, use crude approximations for reasons of computational efficiency. Furthermore, it is possible that the function space Bayesian setting will also lead to the development of improved algorithms which exploit the underlying mathematical structure common to a diverse range of applications. In particular, the theory of (Bayesian) well-posedness which we describe forms the cornerstone of many perturbation theories, including finite-dimensional approximations.
Kaipio and Somersalo (2005) provide a good introduction to the Bayesian approach to inverse problems, especially in the context of differential equations. Furthermore, Calvetti and Somersalo (2007b) provide a useful introduction to the Bayesian perspective in scientific computing. Another overview of the subject of inverse problems in differential equations, including a strong argument for the philosophy taken in this article, namely to formulate and study inverse problems in function space, is the book by Tarantola (2005) (see, especially, Chapter 5); however, the mathematics associated with this philosophical standpoint is not developed there to the same extent that it is in this article, and the focus is primarily on Gaussian problems. A frequentist viewpoint for inverse problems on function space is contained in the book by Ramsay and Silverman (2005); however, we adopt a different, Bayesian, perspective here, and study more involved differential equation models than those arising in Ramsay and Silverman (2005). These books indicate that the development that we undertake here is a natural one, which builds upon the existing literature.
The subject known as data assimilation provides a number of important applications of the material presented here. Its development has been driven, to a large extent, by practitioners working in the atmospheric and oceanographic sciences and in the geosciences, resulting in a plethora of algorithmic approaches and a number of significant algorithmic innovations. A good source for an understanding of data assimilation in the context of the atmospheric sciences, and weather prediction in particular, is the book by Kalnay (2003). A book motivated by applications in oceanography, which simultaneously highlights some of the underlying function space structure of data assimilation for linear, Gaussian problems, is that of Bennett (2002). The book by Evensen (2006) provides a good overview of many computational aspects of the subject, reflecting the author's experience in geophysical applications and related areas. The recent special edition of Physica D devoted to data assimilation provides a good entry point to some of the current research in this area (Ide and Jones 2007). Another application that fits the mathematical framework developed here is molecular dynamics. The problems of interest do not arise from Bayesian inverse problems, as such, but rather from conditioned diffusion processes. However, the mathematical structure has much in common with that arising in Bayesian inverse problems, and so we include a description of this problem area.

Citations
More filters
Journal ArticleDOI

Machine learning

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Journal ArticleDOI

Probability and Random Processes

Ali Esmaili
- 01 Aug 2005 - 
TL;DR: This handbook is a very useful handbook for engineers, especially those working in signal processing, and provides real data bootstrap applications to illustrate the theory covered in the earlier chapters.
Journal ArticleDOI

Physics-informed machine learning

TL;DR: Some of the prevailing trends in embedding physics into machine learning are reviewed, some of the current capabilities and limitations are presented and diverse applications of physics-informed learning both for forward and inverse problems, including discovering hidden physics and tackling high-dimensional problems are discussed.
Journal ArticleDOI

Hidden physics models: Machine learning of nonlinear partial differential equations

TL;DR: In this article, a new paradigm of learning partial differential equations from small data is presented, which is essentially data-efficient learning machines capable of leveraging the underlying laws of physics, expressed by time dependent and nonlinear partial differential equation, to extract patterns from high-dimensional data generated from experiments.
Journal ArticleDOI

Survey of Multifidelity Methods in Uncertainty Propagation, Inference, and Optimization

TL;DR: In many situations across computational science and engineering, multiple computational models are available that describe a system of interest as discussed by the authors, and these different models have varying evaluation costs, i.e.
Frequently Asked Questions (11)
Q1. What are the contributions mentioned in the paper "Inverse problems: A Bayesian perspective"?

Consider two Gaussian measures μ_i = N(m_i, C_i), i = 1, 2, on a Hilbert space. They are equivalent if and only if the following three conditions hold: (i) Im(C_1^{1/2}) = Im(C_2^{1/2}) =: E, (ii) m_1 − m_2 ∈ E, (iii) the operator T := (C_1^{−1/2} C_2^{1/2})(C_1^{−1/2} C_2^{1/2})^* − I is Hilbert–Schmidt in E.
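A classical illustration of how restrictive condition (iii) is in infinite dimensions (our added example, not from the excerpt): take m_1 = m_2 = 0 and C_2 = σ² C_1.

```latex
% With C_2 = \sigma^2 C_1 we get C_1^{-1/2} C_2^{1/2} = \sigma I, so
\[
  T \;=\; (\sigma I)(\sigma I)^{*} - I \;=\; (\sigma^{2} - 1)\,I,
\]
% which is Hilbert--Schmidt in the infinite-dimensional space E only when \sigma^2 = 1.
% Hence N(0, C_1) and N(0, \sigma^2 C_1) are mutually singular for every \sigma^2 \ne 1,
% in sharp contrast to the finite-dimensional case.
```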

The importance of this class of algorithms stems from the fact that, in many applications, solutions are required online, with updates as more data is acquired; thus sequential updating of the posterior measure at the current time is natural.

Another commonly used method for interrogating a probability measure in high dimensions is sampling: generating a set of points {u_n}_{n=1}^{N} distributed (perhaps only approximately) according to π^y(u).
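Such samples are typically put to use through empirical averages (our added note): for a quantity of interest f,

```latex
\[
  \mathbb{E}^{\pi^{y}}\bigl[f(u)\bigr] \;\approx\; \frac{1}{N}\sum_{n=1}^{N} f(u_{n}),
\]
% with error decaying like O(N^{-1/2}) for independent samples, and at a comparable
% rate (with a constant reflecting sample correlation) for MCMC output.
```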

This will enable us to measure the distance between pairs of probability measures, and is a key ingredient in the definition of well-posed posterior measures described in this article.

In particular, the rate of decay of the eigenvalues of the covariance operator plays a central role in determining the regularity properties. 

Among the most powerful generic tools for sampling are the Markov chain Monte Carlo (MCMC) methods, which the author reviews in Section 5.2.
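As an illustration of the simplest such method, here is a minimal random-walk Metropolis sampler in Python (our sketch, not code from the article; the toy linear problem at the bottom is an assumed stand-in for a genuine forward model):

```python
import numpy as np

def log_post(u, y, G, Gamma_inv, C0_inv, m0):
    """Unnormalized log-posterior: Gaussian likelihood plus Gaussian prior."""
    r = y - G(u)
    return -0.5 * r @ Gamma_inv @ r - 0.5 * (u - m0) @ C0_inv @ (u - m0)

def rw_metropolis(log_target, u0, n_steps, step=0.1, rng=None):
    """Random-walk Metropolis: propose u + step*xi, accept with prob min(1, ratio)."""
    rng = rng or np.random.default_rng(0)
    u, lp = u0.copy(), log_target(u0)
    samples = []
    for _ in range(n_steps):
        v = u + step * rng.standard_normal(u.shape)   # Gaussian proposal
        lpv = log_target(v)
        if np.log(rng.uniform()) < lpv - lp:          # Metropolis accept/reject
            u, lp = v, lpv
        samples.append(u.copy())
    return np.array(samples)

# Toy linear inverse problem y = A u + noise, illustrating the interface.
A = np.array([[1.0, 0.5], [0.0, 1.0]])
truth = np.array([1.0, -1.0])
y = A @ truth + 0.1 * np.random.default_rng(1).standard_normal(2)
target = lambda u: log_post(u, y, lambda u: A @ u, np.eye(2) / 0.01, np.eye(2), np.zeros(2))
samples = rw_metropolis(target, np.zeros(2), 5000, step=0.05)
print(samples[1000:].mean(axis=0))  # posterior mean estimate from post-burn-in samples
```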

That paper contains Theorems 4.1 and 4.2 under Assumptions 2.6 in the case where (i) is satisfied trivially because Φ is bounded from below by a constant; note that this case occurs whenever the data is finite-dimensional. 

We have

Γ^{−1} − (Γ + A C_0 A^*)^{−1} A C_0 A^* Γ^{−1} = (Γ + A C_0 A^*)^{−1}.   (6.16)

The formula for the mean, derived by completing the square, gives

m = C((C^{−1} − A^* Γ^{−1} A) m_0 + A^* Γ^{−1} y) = m_0 + C A^* Γ^{−1}(y − A m_0).
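A quick numerical check of these two identities (our sketch; random SPD matrices stand in for the operators, with the posterior covariance defined by C^{−1} = C_0^{−1} + A^* Γ^{−1} A as in the linear Gaussian setting):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4
A = rng.standard_normal((n, n))
# Random SPD covariances for the prior (C0) and the observational noise (Gamma).
M1, M2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
C0, Gamma = M1 @ M1.T + n * np.eye(n), M2 @ M2.T + n * np.eye(n)
m0, y = rng.standard_normal(n), rng.standard_normal(n)

Gi = np.linalg.inv(Gamma)
S = np.linalg.inv(Gamma + A @ C0 @ A.T)              # (Gamma + A C0 A*)^{-1}
# Identity (6.16): Gamma^{-1} - S A C0 A* Gamma^{-1} = S
print(np.allclose(Gi - S @ A @ C0 @ A.T @ Gi, S))    # True

# Posterior covariance: C^{-1} = C0^{-1} + A* Gamma^{-1} A.
C = np.linalg.inv(np.linalg.inv(C0) + A.T @ Gi @ A)
m1 = C @ ((np.linalg.inv(C) - A.T @ Gi @ A) @ m0 + A.T @ Gi @ y)
m2 = m0 + C @ A.T @ Gi @ (y - A @ m0)
print(np.allclose(m1, m2))                            # True: the two mean formulas agree
```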

The third assumption is important for showing that the posterior probability measure is well-defined, whilst the fourth is important for showing continuity with respect to data. 

Thus the Lebesgue density of μ is maximized by minimizing I over ℝ^n. Another way of looking at this is as follows: if ū is such a minimizer, then the probability of a small ball of radius ε and centred at ū will be maximized, asymptotically as ε → 0, by choosing u = ū.
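In symbols (our added summary of this standard finite-dimensional picture, assuming I is continuous): with posterior Lebesgue density proportional to exp(−I(u)),

```latex
\[
  \pi^{y}(u) \;\propto\; \exp\bigl(-I(u)\bigr),
  \qquad
  \bar{u} \;\in\; \operatorname*{arg\,min}_{u \in \mathbb{R}^{n}} I(u),
\]
% and the ratio of small-ball probabilities satisfies
\[
  \lim_{\varepsilon \to 0}
  \frac{\pi^{y}\bigl(B(u, \varepsilon)\bigr)}{\pi^{y}\bigl(B(\bar{u}, \varepsilon)\bigr)}
  \;=\; \exp\bigl(I(\bar{u}) - I(u)\bigr) \;\le\; 1
  \quad \text{for every fixed } u,
\]
% so \bar{u} is the maximum a posteriori (MAP) estimator.
```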

Generalizing the theorems to allow for (i) as stated here was undertaken in Hairer, Stuart and Voss (2010b), in the context of signal processing for stochastic differential equations.