Detection of Multiple Change–Points in
Multivariate Time Series
Marc Lavielle
Université René Descartes and Université Paris–Sud,
Laboratoire de Mathématiques
Marc.Lavielle@math.u-psud.fr
Gilles Teyssière
Statistique Appliquée et MOdélisation Stochastique
CES, Université Paris 1 Panthéon–Sorbonne.
stats@gillesteyssiere.net
July 2006
To appear in the Lithuanian Mathematical Journal, vol. 46, 2006
Abstract
We consider the multiple change–point problem for multivariate time series, including strongly
dependent processes, with an unknown number of change–points. We assume that the covariance
structure of the series changes abruptly at some unknown common change–point times. The proposed
adaptive method is able to detect changes in multivariate i.i.d., weakly and strongly dependent series.
This adaptive method outperforms the Schwarz criterion, mainly for the case of weakly dependent data.
We consider applications to multivariate series of daily stock index returns and to series generated by
an artificial financial market.
1 Introduction
Detecting changes in multivariate time series is of interest if we believe that these series are correlated,
and/or that the components of the multivariate vector process are generated by the same process.
This assumption is relevant for financial markets where correlated assets are traded. Empirical evidence,
reported e.g. in Teyssière [36, 37], shows that several time series, i.e., Foreign Exchange (FX) rate
returns, display the same degree of persistence in their volatilities and co–volatilities, a property that
might be caused by a common non–stationarity of these series. The presence of strong dependence in asset
price volatilities is still a matter of debate, although numerous works, see e.g. Mikosch and Stărică [33],
Kokoszka and Teyssière [26], Lavielle and Teyssière [30], have shown that the strong persistence in
volatility is likely to be a statistical artefact, i.e., mainly an effect of the concatenation of processes with
different unconditional variances; see also Giraitis et al. [14] for a survey on volatility models.
From the point of view of the practitioner, change–point detection procedures are of interest, as we
do not know which process actually generates the data under investigation. Furthermore, economic
data are usually not stationary, so it may be of interest to approximate an unknown and possibly
nonstationary process with locally stationary processes; see e.g. Dahlhaus [12].
The literature on change-point detection is vast: reference monographs include Basseville and
Nikiforov [1], Brodsky and Darkhovsky [7], Csörgő and Horváth [11], Chen and Gupta [8]. The journal
articles by Giraitis and Leipus [16, 15], Hawkins [19, 20], Chen and Gupta [9], Mia and Zhao [32], Sen
and Srivastava [35], among others, are also of interest.
The statistical theory for weakly dependent volatility processes with a change–point was developed
more recently; see, e.g., Chu [10], Kokoszka and Leipus [24, 25], Horváth, Kokoszka and Teyssière [21],
Kokoszka and Teyssière [26], Berkes et al. [2]. The processes considered in these works are no longer
i.i.d., but weakly dependent. For the case of strongly dependent time series, the reader is referred to the
papers by Giraitis, Leipus and Surgailis [17] and Lavielle [27], and to the chapter by Kokoszka and Leipus [23]
in the book on long-range dependence edited by Doukhan et al. (2003), which reviews the recent works
on the issue of change–point detection for univariate dependent time series.
The occurrence of a single change–point in real data is rather rare, as data in economics, finance,
hydrology, biology, electrical engineering, etc., display multiple changes; see e.g. Schechtman and Wolfe [34],
Braun et al. [6], Lavielle and Moulines [29], Lavielle and Teyssière [30]. Thus, a statistical procedure able
to reliably detect multiple changes is of practical interest. It has often been claimed that the testing
procedure for a single change–point can be extended to the multiple change–point case by using Vostrikova's [39]
binary segmentation procedure, which consists in applying the single change–point detection procedure
to the whole sample, splitting the sample at the detected change–point, and then iteratively applying the
change–point detection procedure to the two resulting segments until no further change–point is found.
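The recursion behind binary segmentation can be sketched in a few lines of Python. This is only a minimal illustration of the local procedure described above. The detector `best_mean_split` is a toy rule for a change in the mean of a univariate series, introduced here as an assumption; it is not one of the tests discussed in the cited references, and the names, `min_size` and `threshold` are hypothetical.

```python
import numpy as np

def best_mean_split(x, min_size, threshold):
    """Toy single change-point detector (an assumption for illustration only).

    Returns the split index that most reduces the residual sum of squares
    around the segment means, or None if the reduction is below `threshold`.
    """
    n = len(x)
    if n < 2 * min_size:
        return None
    total = np.sum((x - x.mean()) ** 2)
    best_t, best_gain = None, threshold
    for t in range(min_size, n - min_size + 1):
        left, right = x[:t], x[t:]
        gain = (total
                - np.sum((left - left.mean()) ** 2)
                - np.sum((right - right.mean()) ** 2))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t

def binary_segmentation(x, detect, offset=0):
    """Apply `detect` to x, split at the detected point, and recurse on both halves."""
    t = detect(x)
    if t is None:
        return []
    return (binary_segmentation(x[:t], detect, offset)
            + [offset + t]
            + binary_segmentation(x[t:], detect, offset + t))

# Noise-free piecewise-constant series with changes after observations 20 and 45.
x = np.concatenate([np.zeros(20), np.full(25, 5.0), np.ones(15)])
change_points = binary_segmentation(
    x, lambda s: best_mean_split(s, min_size=5, threshold=1.0))
```

Each change–point is found conditionally on the splits made before it, which is what distinguishes this local procedure from a global one.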
In Lavielle and Teyssière [30], we addressed the issue of global procedure vs local procedure, and
found that the extension of single change–point procedures to the case of multiple change–points using
Vostrikova's [39] binary segmentation procedure is misleading and yields an overestimation of the number
of change–points.
A global approach means that all the change–points are detected simultaneously. These change–points
are estimated by minimizing a penalized contrast J(τ, y) + β pen(τ) (see [3, 27, 40]). Here, J(τ, y)
measures how well the model obtained with the change–point sequence τ fits the observed series y. Its role
is to locate the change–points as accurately as possible. For detecting changes in the mean and/or the
covariance matrix of a multivariate series, we propose to define the contrast J(τ, y) from the logarithm
of a Gaussian likelihood, even if the observed series is not Gaussian. The penalty term pen(τ) only
depends on the dimension K(τ) of the model τ and increases with K(τ). The penalization parameter
β adjusts the trade-off between the minimization of J(τ, y) (obtained with a high dimension of τ), and
the minimization of pen(τ) (obtained with a small dimension of τ).
Asymptotic results have been obtained in general theoretical contexts in [27], extending the previous
results of Yao [40]. We shall see that this approach is also very useful for practical applications, for
detecting changes in the mean and/or variance of multivariate time series, with the restriction that the
series have a common segmentation τ. An adaptive method is proposed for estimating the number of
change–points. Numerical experiments show that the proposed method outperforms the Schwarz criterion
and yields very good results.
For a multivariate time series, the algorithm of the detection procedure will be of order $O(mn^2)$,
where m is the dimension of the vector process, instead of the $O(n^2)$ order as in the univariate case.
2 A penalized contrast estimate for the multivariate change–point problem
2.1 The contrast function
We assume that the m–dimensional process $\{Y_t = (Y_{1,t}, \ldots, Y_{m,t})'\}$ is abruptly changing and is
characterized by a parameter θ ∈ Θ that remains constant between two changes. We will rely heavily on this
assumption to define our contrast function J(τ, Y).
Let K be some integer and let $\tau = \{\tau_1, \tau_2, \ldots, \tau_{K-1}\}$ be an ordered sequence of integers satisfying
$0 < \tau_1 < \tau_2 < \ldots < \tau_{K-1} < n$. For any $1 \le k \le K$, let $U(Y_{\tau_{k-1}+1}, \ldots, Y_{\tau_k}; \theta)$ be a contrast function
useful for estimating the unknown true value of the parameter in the segment k. In other words, the
minimum contrast estimate $\hat\theta(Y_{\tau_{k-1}+1}, \ldots, Y_{\tau_k})$, computed on the k-th segment of τ, is defined as a
solution to the following minimization problem:
$$ U\big(Y_{\tau_{k-1}+1}, \ldots, Y_{\tau_k}; \hat\theta(Y_{\tau_{k-1}+1}, \ldots, Y_{\tau_k})\big) \le U(Y_{\tau_{k-1}+1}, \ldots, Y_{\tau_k}; \theta), \quad \forall \theta \in \Theta. \tag{1} $$
For any $1 \le k \le K$, let G be defined as
$$ G(Y_{\tau_{k-1}+1}, \ldots, Y_{\tau_k}) = U\big(Y_{\tau_{k-1}+1}, \ldots, Y_{\tau_k}; \hat\theta(Y_{\tau_{k-1}+1}, \ldots, Y_{\tau_k})\big). \tag{2} $$
Then, define the contrast function J(τ, Y) as
$$ J(\tau, Y) = \frac{1}{n} \sum_{k=1}^{K} G(Y_{\tau_{k-1}+1}, \ldots, Y_{\tau_k}), \tag{3} $$
where $\tau_0 = 0$ and $\tau_K = n$.
We consider in this paper changes in the covariance matrix of the sequence $\{Y_t\}$. More precisely,
we assume that there exist an integer $K^\star$, a sequence $\tau^\star = \{\tau^\star_1, \tau^\star_2, \ldots, \tau^\star_{K^\star}\}$ with
$\tau^\star_0 = 0 < \tau^\star_1 < \ldots < \tau^\star_{K^\star-1} < \tau^\star_{K^\star} = n$, and $K^\star$ $(m \times m)$ covariance matrices
$\Sigma_1, \Sigma_2, \ldots, \Sigma_{K^\star}$ such that
$\mathrm{Cov}(Y_t) = E\,(Y_t - E(Y_t))(Y_t - E(Y_t))' = \Sigma_k$ for $\tau^\star_{k-1} + 1 \le t \le \tau^\star_k$.
Model M1: There exists an m-vector µ such that $E(Y_t) = \mu$ for t = 1, 2, ..., n. Furthermore,
$\Sigma_k \ne \Sigma_{k+1}$ for $1 \le k \le K^\star - 1$.
For this simple case of changes in the covariance matrix without changes in the mean, which is of
interest for multivariate volatility processes, the following contrast function, based on a Gaussian
log–likelihood function, can be used:
$$ J(\tau, Y) = \frac{1}{n} \sum_{k=1}^{K} n_k \log |\hat\Sigma_{\tau_k}|, \tag{4} $$
where $n_k = \tau_k - \tau_{k-1}$ is the length of the segment k, and $\hat\Sigma_{\tau_k}$ is the $(m \times m)$ empirical covariance matrix
computed on that segment k:
$$ \hat\Sigma_{\tau_k} = \frac{1}{n_k} \sum_{t=\tau_{k-1}+1}^{\tau_k} (Y_t - \bar{Y})(Y_t - \bar{Y})'. \tag{5} $$
Here $\bar{Y} = n^{-1} \sum_{t=1}^{n} Y_t$ is the empirical mean of the m–dimensional series $Y_t$ computed on the complete
series.
Model M2: There exist $K^\star$ m-vectors $\mu_1, \ldots, \mu_{K^\star}$ such that $E(Y_t) = \mu_k$ for $\tau^\star_{k-1} + 1 \le t \le \tau^\star_k$.
Furthermore, $(\mu_k, \Sigma_k) \ne (\mu_{k+1}, \Sigma_{k+1})$ for $1 \le k \le K^\star - 1$.
For the detection of changes in the mean vector and/or the covariance matrix of a multivariate
sequence of random variables, the contrast again takes the form
$$ J(\tau, Y) = \frac{1}{n} \sum_{k=1}^{K} n_k \log |\hat\Sigma_{\tau_k}| \tag{6} $$
but the $(m \times m)$ empirical covariance matrix $\hat\Sigma_{\tau_k}$ is computed on segment k as
$$ \hat\Sigma_{\tau_k} = \frac{1}{n_k} \sum_{t=\tau_{k-1}+1}^{\tau_k} (Y_t - \bar{Y}_{\tau_k})(Y_t - \bar{Y}_{\tau_k})' \tag{7} $$
where $\bar{Y}_{\tau_k} = n_k^{-1} \sum_{t=\tau_{k-1}+1}^{\tau_k} Y_t$ is the empirical mean of the m–dimensional series $Y_t$ computed on that
segment.
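The two contrasts can be evaluated numerically as in the following minimal Python sketch. It assumes the series is stored as an (n × m) NumPy array with one row per time index. The function name `gaussian_contrast` and its arguments are hypothetical and serve only to illustrate formulas (4)–(7).

```python
import numpy as np

def gaussian_contrast(Y, tau, model="M2"):
    """Contrast J(tau, Y) of eq. (4) (model "M1") or eq. (6) (model "M2").

    Y   : array of shape (n, m), row t-1 holds the observation Y_t.
    tau : increasing interior change-points tau_1 < ... < tau_{K-1}, each in 1..n-1.
    """
    Y = np.asarray(Y, dtype=float)
    n, _ = Y.shape
    bounds = [0, *tau, n]                  # tau_0 = 0 and tau_K = n
    global_mean = Y.mean(axis=0)           # \bar{Y}, used under model M1
    total = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = Y[start:end]                 # observations tau_{k-1}+1, ..., tau_k
        n_k = end - start
        center = global_mean if model == "M1" else seg.mean(axis=0)
        resid = seg - center
        sigma_hat = resid.T @ resid / n_k  # eq. (5) under M1, eq. (7) under M2
        sign, logdet = np.linalg.slogdet(sigma_hat)
        if sign <= 0:
            raise ValueError("singular empirical covariance: segment too short")
        total += n_k * logdet
    return total / n

# Univariate toy series with variance 1 on the first segment and 4 on the second.
Y = np.array([[1.0], [-1.0], [2.0], [-2.0]])
j_split = gaussian_contrast(Y, [2], model="M1")
j_none = gaussian_contrast(Y, [], model="M1")
```

On this toy series the segmented contrast equals (2 log 1 + 2 log 4)/4 = log 2, which is smaller than the contrast log 2.5 obtained without any change–point.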
Asymptotic results for the minimum contrast estimate of $\tau^\star$ can be obtained within the following
asymptotic framework:
A1 For any $1 \le i \le m$ and any $1 \le t \le n$, define $\eta_{t,i} = Y_{t,i} - E(Y_{t,i})$. There exist C > 0 and $1 \le h < 2$
such that for any u ≥ 0 and any s ≥ 1,
$$ E\left( \sum_{t=u+1}^{u+s} \eta_{t,i} \right)^2 \le C(\theta)\, s^h. \tag{8} $$
(A1 holds with h = 1 for weakly dependent sequences and 1 < h < 2 for strongly dependent sequences)
A2 There exists a sequence $0 < a_1 < a_2 < \ldots < a_{K^\star-1} < a_{K^\star} = 1$ such that for any n > 1 and for any
$1 \le k \le K^\star - 1$, $\tau^\star_k = [n a_k]$.
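To see what A1 requires in the simplest case, consider a centered sequence $\{\eta_{t,i}\}$ that is uncorrelated with common variance $\sigma_i^2$ (an assumption made only for this illustration). The cross terms vanish and

$$ E\left( \sum_{t=u+1}^{u+s} \eta_{t,i} \right)^2 = \sum_{t=u+1}^{u+s} E\,\eta_{t,i}^2 = s\,\sigma_i^2, $$

so (8) holds with h = 1 and constant $\sigma_i^2$. If instead the autocovariances decay slowly, say $\gamma_i(j) \sim c\, j^{2d-1}$ as $j \to \infty$ with $0 < d < 1/2$ (a standard long-memory parametrization, again an assumption here), the variance of the partial sum grows like $s^{2d+1}$, which corresponds to $h = 2d + 1 \in (1, 2)$.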
When the true number $K^\star$ of segments is known, we have the following result concerning the rate of
convergence of the minimum contrast estimator of $\tau^\star$:
Theorem 2.1 Assume that conditions A1-A2 are satisfied. Under model M1 (resp. model M2), let
$\hat\tau_n$ be the time instants that minimize the empirical contrast J(τ, Y) defined in (4) (resp. (6)). Then,
the sequence $\{n\|\hat\tau_n - \tau^\star\|_\infty\}$ is uniformly tight in probability:
$$ \lim_{n\to\infty} \lim_{\delta\to\infty} P\Big( \max_{1 \le k \le K^\star-1} |\hat\tau_{n,k} - \tau^\star_k| > \delta \Big) = 0. \tag{9} $$
(Here, J(τ, Y) is minimized over all possible sequences τ of length $K^\star$.)
Proof: The proof is a direct application of Theorem 2.4 by Lavielle [27]. We can easily check that
hypotheses H1-H2 of [27] are satisfied under models M1 and M2 and under hypotheses A1-A2.
This result means that the rate of convergence of $\hat\tau_n$ does not depend on the covariance structure
of the sequence $\{Y_t\}$. For strongly mixing sequences, as well as for strongly dependent sequences, the
optimal rate is obtained since $\|\hat\tau_n - \tau^\star\|_\infty = O_P(1)$.
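For a fixed number of segments, the contrast (6) is a sum of segment costs, so it can be minimized exactly by dynamic programming. The algorithm is not spelled out at this point of the text, so the following Python sketch is only one standard way of doing it under model M2. The function names, the minimum segment length `min_len` and the use of cumulative sums are assumptions of this illustration, and no claim is made that it matches the complexity quoted in the introduction.

```python
import numpy as np

def segment_costs(Y, min_len):
    """cost[i, j] = n_k * log|Sigma_hat| for the segment of rows i..j-1 (model M2)."""
    n, m = Y.shape
    # Cumulative sums give each segment's mean and second moment in O(m^2).
    S1 = np.vstack([np.zeros((1, m)), np.cumsum(Y, axis=0)])
    S2 = np.concatenate([np.zeros((1, m, m)),
                         np.cumsum(Y[:, :, None] * Y[:, None, :], axis=0)])
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            n_k = j - i
            mean = (S1[j] - S1[i]) / n_k
            sigma = (S2[j] - S2[i]) / n_k - np.outer(mean, mean)
            sign, logdet = np.linalg.slogdet(sigma)
            if sign > 0:                      # skip singular covariance estimates
                cost[i, j] = n_k * logdet
    return cost

def best_segmentations(Y, K_max, min_len=None):
    """Return {K: (J, tau)} with the minimal contrast and its change-points for each K."""
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    min_len = min_len or m + 1
    cost = segment_costs(Y, min_len)
    # best[K, j]: minimal total cost of splitting rows 0..j-1 into K segments.
    best = np.full((K_max + 1, n + 1), np.inf)
    arg = np.zeros((K_max + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for K in range(1, K_max + 1):
        for j in range(1, n + 1):
            cand = best[K - 1, :j] + cost[:j, j]
            arg[K, j] = int(np.argmin(cand))
            best[K, j] = cand[arg[K, j]]
    results = {}
    for K in range(1, K_max + 1):
        if not np.isfinite(best[K, n]):
            continue
        starts, j = [], n
        for k in range(K, 0, -1):             # walk back through segment starts
            j = arg[k, j]
            starts.append(int(j))
        results[K] = (float(best[K, n] / n), sorted(starts)[1:])  # drop tau_0 = 0
    return results

# Simulated bivariate series whose standard deviation jumps from 1 to 5 at t = 60.
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(0, 5, (60, 2))])
res = best_segmentations(Y, K_max=3)
```

The dictionary `res` then holds, for each K, the minimal contrast and the corresponding change–points, which is the input needed by a penalized choice of K.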
2.2 Penalty functions for the change–point problem
When the number of change–points is unknown, we estimate it by minimizing a penalized version of
the function J(τ, Y). For any sequence of change–point instants τ, let pen(τ) be a function of τ that
increases with the number K(τ) of segments of τ. Then, let $\{\hat\tau_n\}$ be the sequence of change–point
instants that minimizes
$$ U(\tau) = J(\tau, Y) + \beta\,\mathrm{pen}(\tau). \tag{10} $$
The procedure is intuitively simple: the goodness-of-fit criterion must be offset by a penalty so that
over-segmentation is penalized. However, this penalty must not be too large, since too large a
penalty function yields an underestimation of the number of segments.
If β is a function of n that goes to 0 at an appropriate rate as n goes to infinity, the following theorem
states that the estimated number of segments converges in probability to $K^\star$ and that (9) still holds.
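Theorem 2.2 Let $\{\beta_n\}$ be a positive sequence of real numbers such that
$$ \beta_n \underset{n\to\infty}{\longrightarrow} 0 \quad \text{and} \quad n^{2-h} \beta_n \underset{n\to\infty}{\longrightarrow} \infty, \qquad 1 \le h < 2. \tag{11} $$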
Then, under A1-A2, the estimated number of segments $K(\hat\tau_n)$, where $\hat\tau_n$ is the minimum penalized
contrast estimate of $\tau^\star$ obtained by minimizing $J(\tau, Y) + \beta_n\,\mathrm{pen}(\tau)$, converges in probability to $K^\star$.
(Here, J(τ, Y) is minimized over all possible sequences τ and over all possible $1 \le K \le K_{\max}$, where
$K_{\max}$ is some known upper bound of $K^\star$.)
Proof: The proof is a direct application of Theorem 3.1 by Lavielle [27].
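As a worked check of condition (11), take a penalization sequence of the Schwarz form $\beta_n = c \log n / n$, where c > 0 is a constant (chosen here for illustration). Then $\beta_n \to 0$, and

$$ n^{2-h} \beta_n = c\, n^{1-h} \log n . $$

For h = 1 this equals $c \log n \to \infty$, so (11) is satisfied. For 1 < h < 2, $n^{1-h} \log n \to 0$ and the second part of (11) fails; a sequence that decreases more slowly, for instance $\beta_n = n^{-\gamma}$ with $0 < \gamma < 2 - h$, satisfies both parts. Since (11) is stated as a sufficient condition, this computation only shows which sequences are covered by Theorem 2.2.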
In practice, asymptotic results are not very useful for selecting the penalty term β pen(τ). Indeed,
given a real observed signal with a fixed and finite length n, the parameter β must be fixed to some
arbitrary value. When the parameter β is chosen to be very large, only the most significant abrupt changes
are detected. Conversely, a small value of β produces a high number of estimated changes. Therefore, a
trade-off must be made, i.e., we have to select a value of β which yields a reasonable level of resolution
in the segmentation.
Various authors suggest different penalty functions according to the model they consider. For example,
the Schwarz criterion is used by Braun et al. [6] for detecting changes in a DNA sequence.
Consider first the penalty function pen(τ). By definition, pen(τ) should increase with the number
of segments K(τ). Following the most popular information criteria, such as the AIC and the Schwarz
criterion, the simplest penalty function pen(τ) = K(τ) can be used. Furthermore, Yao [40] has proved
the consistency of the Schwarz criterion for some models.
Remark 2.3 For the multivariate i.i.d. case, the penalization parameter for the Schwarz criterion is
$$ \beta = \frac{m(m+1)}{2}\, \frac{\log n}{n}. \tag{12} $$
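The Schwarz choice can be applied mechanically once the minimal contrast has been computed for each candidate number of segments. The following Python sketch assumes pen(τ) = K(τ) and a dictionary of minimal contrast values indexed by K. The function names and the numerical values in the example are hypothetical and are not results from the paper.

```python
import math

def schwarz_beta(n, m):
    """Penalization parameter of eq. (12) for an m-variate i.i.d. series of length n."""
    return m * (m + 1) / 2 * math.log(n) / n

def select_num_segments(contrast_by_K, beta):
    """Return the K that minimizes J_K + beta * K, i.e. pen(tau) = K(tau)."""
    return min(contrast_by_K, key=lambda K: contrast_by_K[K] + beta * K)

# Hypothetical minimal contrasts J_K for K = 1..4 (illustrative values only).
contrast_by_K = {1: 5.13, 2: 3.22, 3: 3.20, 4: 3.19}
beta = schwarz_beta(n=200, m=2)          # 3 * log(200) / 200
K_hat = select_num_segments(contrast_by_K, beta)
```

With these illustrative values the contrast drops sharply from K = 1 to K = 2 and then flattens, so the penalty of about 0.079 per segment selects K = 2.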
In order to reduce the computational cost of the algorithm and according to the required precision in the
estimation, the change-points can be detected on a sub-grid d, 2d, 3d, . . . of 1, 2, . . . , n (we used d = 10 in