# Kernel change point detection#

## Problem formulation#

In this section, the kernel change point detection setting is briefly described. The interested reader can refer to [Celisse2018, Arlot2019] for a more complete introduction.
Let $$y = \{y_0,y_1,\dots,y_{T-1}\}$$ denote a $$\mathbb{R}^d$$-valued signal with $$T$$ samples. This signal is mapped onto a reproducing Hilbert space (rkhs) $$\mathcal{H}$$ associated with a user-defined kernel function $$k(\cdot, \cdot):\mathbb{R}^d\times\mathbb{R}^d\rightarrow\mathbb{R}$$. The mapping function $$\phi:\mathbb{R}^d\rightarrow\mathcal{H}$$ onto this rkhs is implicitly defined by $$\phi(y_t) = k(y_t, \cdot)\in\mathcal{H}$$ resulting in the following inner-product and norm:

$\langle\phi(y_s)\mid\phi(y_t)\rangle_{\mathcal{H}} = k(y_s,y_t)$

and

$\|\phi(y_t)\|_{\mathcal{H}}^2 = k(y_t,y_t)$

for any samples $$y_s,y_t\in\mathbb{R}^d$$. Kernel change point detection consists in finding mean-shifts in the mapped signal $$\phi(y)$$ by minimizing $$V(\cdot)$$ where

$V(t_1,\dots,t_K) := \sum_{k=0}^K\sum_{t=t_k}^{t_{k+1}-1} \|\phi(y_t)-\bar{\mu}_{t_k..t_{k+1}}\|^2_{\mathcal{H}}$

where $$\bar{\mu}_{t_k..t_{k+1}}$$ is the empirical mean of the sub-signal $$\phi(y_{t_k}), \phi(y_{t_k+1}),\dots,\phi(y_{t_{k+1}-1})$$, and $$t_1,t_2,\dots,t_K$$ are change point indexes, in increasing order. (By convention $$t_0=0$$ and $$t_{K+1}=T$$.)

If the number of changes is known beforehand, we solve the following optimization problem, over all possible change positions $$t_1<t_2<\dots<t_K$$ (where the number $$K$$ of changes is provided by the user):

$\hat{t}_1,\dots,\hat{t}_K := \arg\min_{t_1,\dots,t_K} V(t_1,\dots,t_K).$

The exact optimization procedure is described in [Celisse2018].

If the number of changes is not known, we solve the following penalized optimization problem

$\hat{K}, \{\hat{t}_1,\dots,\hat{t}_{\hat{K}}\} := \arg\min_{K, \{t_1,\dots, t_K\}} V(t_1,\dots, t_K) + \beta K$

where $$\beta>0$$ is the smoothing parameter (provided by the user) and $$\hat{K}$$ is the estimated number of change points. Higher values of $$\beta$$ produce lower $$\hat{K}$$. The exact optimization procedure is described in [Killick2012].

## Available kernels#

We list below a number of kernels that are already implemented in ruptures. In the following, $$u$$ and $$v$$ are two d-dimensional vectors and $$\|\cdot\|$$ is the Euclidean norm.

Kernel Description Cost function
Linear
model="linear"
$$k_{\text{linear}}(u, v) = u^T v$$. CostL2
Gaussian
model="rbf"
$$k_{\text{Gaussian}}(u,v)=\exp(-\gamma \|u-v\|^2)$$
where $$\gamma>0$$ is a user-defined parameter.
CostRbf
Cosine
model="cosine"
$$k_{\text{cosine}}(u, v) = (u^T v)/(\|u\|\|v\|)$$ CostCosine

## Implementation and usage#

Kernel change point detection is implemented in the class KernelCPD, which is a C implementation of dynamic programming and PELT. To see it in action, please look at the gallery of examples, in particular:

The exact class API is available here.

## References#

[Gretton2012] Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. The Journal of Machine Learning Research, 13, 723–773.

[Killick2012] Killick, R., Fearnhead, P., & Eckley, I. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500), 1590–1598.

[Celisse2018] Celisse, A., Marot, G., Pierre-Jean, M., & Rigaill, G. (2018). New efficient algorithms for multiple change-point detection with reproducing kernels. Computational Statistics and Data Analysis, 128, 200–220.

[Arlot2019] Arlot, S., Celisse, A., & Harchaoui, Z. (2019). A kernel multiple change-point algorithm via model selection. Journal of Machine Learning Research, 20(162), 1–56.