# Kernel change point detection: a performance comparison


## Introduction

In ruptures, there are two ways to perform kernel change point detection:

• by using the pure Python classes Dynp (known number of change points) and Pelt (unknown number of change points),

• by using the faster class KernelCPD (implemented in C), which contains both the dynamic programming approach and the penalized approach (PELT).

This example illustrates the performance of the fast C implementation compared to the pure Python one.

The kernel change point detection setting is briefly described in the user guide. The interested reader can refer to [Celisse2018, Arlot2019] for a more complete introduction.

The full list of available kernels can be found in the documentation; in this example, we only consider two:

• the linear kernel, $$k_{\text{linear}}(x, y) = x^T y$$ (Euclidean scalar product) and the induced norm is the Euclidean norm;
• the Gaussian kernel (also known as radial basis function, rbf), $$k_{\text{Gaussian}}(x,y)=\exp(-\gamma \|x-y\|^2)$$ where $$\|\cdot\|$$ is the Euclidean norm and $$\gamma>0$$ is a user-defined parameter.
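As a quick illustration (using NumPy only; this is not part of the library), both Gram matrices can be computed directly from the definitions above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 samples of dimension 3

# Linear kernel: Gram matrix of Euclidean scalar products x^T y
K_linear = X @ X.T

# Gaussian (rbf) kernel with a user-defined parameter gamma > 0
gamma = 0.1
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_gaussian = np.exp(-gamma * sq_dists)

# sanity checks: k_linear(x, x) = ||x||^2 and k_Gaussian(x, x) = 1
assert np.allclose(np.diag(K_linear), np.sum(X**2, axis=1))
assert np.allclose(np.diag(K_gaussian), 1.0)
```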

## Setup

First, we make the necessary imports and generate a toy signal.

import time  # for execution time comparison

import matplotlib.pyplot as plt  # for display purposes

import ruptures as rpt  # our package
from ruptures.metrics import hausdorff

# generate signal
n_samples, dim, sigma = 500, 3, 3
n_bkps = 6  # number of breakpoints
signal, bkps = rpt.pw_constant(n_samples, dim, n_bkps, noise_std=sigma)
fig, ax_array = rpt.display(signal, bkps)


## Linear kernel

The linear kernel (see above) $$k_{\text{linear}}$$ can detect changes in the mean of a signal. It also corresponds to the cost function CostL2.
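The correspondence with CostL2 comes from the kernel identity $$\sum_t k(y_t, y_t) - \frac{1}{n}\sum_{s,t} k(y_s, y_t) = \sum_t \|y_t - \bar{y}\|^2$$ for a segment of length $$n$$ with empirical mean $$\bar{y}$$. A quick NumPy check of this identity (illustrative, not library code):

```python
import numpy as np

rng = np.random.default_rng(0)
segment = rng.normal(size=(20, 3))  # one segment of a d-dimensional signal

# CostL2: sum of squared deviations from the segment mean
cost_l2 = np.sum((segment - segment.mean(axis=0)) ** 2)

# Same quantity written with the linear kernel only:
# sum_t k(y_t, y_t) - (1/n) * sum_{s,t} k(y_s, y_t)
gram = segment @ segment.T
cost_kernel = np.trace(gram) - gram.sum() / len(segment)

assert np.isclose(cost_l2, cost_kernel)
```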

### Dynamic programming

When the number of changes to detect is known beforehand, we use dynamic programming.

algo_python = rpt.Dynp(model="l2", jump=1, min_size=2).fit(
    signal
)  # written in pure python
algo_c = rpt.KernelCPD(kernel="linear", min_size=2).fit(signal)  # written in C

for label, algo in zip(
    ("Python implementation", "C implementation"), (algo_python, algo_c)
):
    start_time = time.time()
    result = algo.predict(n_bkps=n_bkps)
    print(f"{label}:\t{time.time() - start_time:.3f} s")

Python implementation:    6.770 s
C implementation:   0.008 s



The speed-up is quite significant and depends on the signal size (number $$T$$ of samples and dimension $$d$$) and the number $$K$$ of change points to detect. The C implementation has a time complexity of the order $$\mathcal{O}(KdT^2)$$ and a space complexity of the order $$\mathcal{O}(T)$$. As for the Python implementation, the time and space complexities are of the order $$\mathcal{O}(KdT^3)$$ and $$\mathcal{O}(T^2)$$ respectively.
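One ingredient behind the quadratic running time for the linear kernel is that, after an $$\mathcal{O}(dT)$$ precomputation of cumulative sums, the cost of any candidate segment can be evaluated in $$\mathcal{O}(d)$$ time. The following sketch uses our own helper names, not the library's internals:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 3))

# O(dT) precomputation: prefix sums of y_t and of ||y_t||^2
csum = np.concatenate([np.zeros((1, signal.shape[1])), np.cumsum(signal, axis=0)])
csum_sq = np.concatenate([[0.0], np.cumsum(np.sum(signal**2, axis=1))])

def segment_cost(start, end):
    """Linear-kernel (CostL2) cost of signal[start:end] in O(d) time."""
    seg_sum = csum[end] - csum[start]
    return (csum_sq[end] - csum_sq[start]) - np.sum(seg_sum**2) / (end - start)

# matches the direct computation over the segment
seg = signal[50:120]
direct = np.sum((seg - seg.mean(axis=0)) ** 2)
assert np.isclose(segment_cost(50, 120), direct)
```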

We can also check that both methods return the same set of change points.

bkps_python = algo_python.predict(n_bkps=n_bkps)
bkps_c = algo_c.predict(n_bkps=n_bkps)
print(f"Python implementation:\t{bkps_python}")
print(f"C implementation:\t{bkps_c}")
print(f"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)")

Python implementation:    [71, 147, 218, 294, 364, 433, 500]
C implementation:   [71, 147, 218, 294, 364, 433, 500]
(Hausdorff distance: 0 samples)



### PELT

When the number of changes to detect is unknown, we resort to PELT [Killick2012] to solve the penalized detection problem.

algo_python = rpt.Pelt(model="l2", jump=1, min_size=2).fit(
    signal
)  # written in pure python
algo_c = rpt.KernelCPD(kernel="linear", min_size=2).fit(
    signal
)  # written in C, same class as before

penalty_value = 100  # beta

for label, algo in zip(
    ("Python implementation", "C implementation"), (algo_python, algo_c)
):
    start_time = time.time()
    result = algo.predict(pen=penalty_value)
    print(f"{label}:\t{time.time() - start_time:.3f} s")

Python implementation:    0.508 s
C implementation:   0.001 s



Again, the speed-up is quite significant and depends on the signal size (number $$T$$ of samples and dimension $$d$$) and the penalty value $$\beta$$. We remark that, for both the Python and C implementations, PELT is more efficient than dynamic programming.
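For intuition, the penalized problem solved by PELT can be sketched with a minimal implementation for the linear-kernel cost: a dynamic program over the last change point, with the pruning step that gives PELT its efficiency. This is an illustrative reimplementation under our own names, not the library's code:

```python
import numpy as np

def pelt_l2(signal, pen):
    """Minimal PELT with the linear-kernel (CostL2) segment cost. Illustrative only."""
    T, d = signal.shape
    csum = np.concatenate([np.zeros((1, d)), np.cumsum(signal, axis=0)])
    csum_sq = np.concatenate([[0.0], np.cumsum(np.sum(signal**2, axis=1))])

    def cost(s, t):  # cost of signal[s:t], computed in O(d)
        seg_sum = csum[t] - csum[s]
        return (csum_sq[t] - csum_sq[s]) - np.sum(seg_sum**2) / (t - s)

    F = [0.0] * (T + 1)  # F[t]: optimal penalized cost of signal[:t]
    F[0] = -pen
    last = [0] * (T + 1)  # last change point before t in the optimal segmentation
    admissible = [0]
    for t in range(1, T + 1):
        vals = [F[s] + cost(s, t) + pen for s in admissible]
        best = int(np.argmin(vals))
        F[t], last[t] = vals[best], admissible[best]
        # PELT pruning: s with F[s] + cost(s, t) > F[t] can never be optimal again
        admissible = [s for s, v in zip(admissible, vals) if v - pen <= F[t]]
        admissible.append(t)

    bkps, t = [], T  # backtrack the optimal change points
    while t > 0:
        bkps.append(t)
        t = last[t]
    return sorted(bkps)

# noiseless step signal with one mean shift at t=100
signal = np.concatenate([np.zeros((100, 1)), 5.0 * np.ones((100, 1))])
print(pelt_l2(signal, pen=10))  # [100, 200]
```

The pruning condition is valid here because the CostL2 segment cost can only decrease when a segment is split, so a candidate last change point that is already too expensive stays too expensive.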

We can also check that both methods return the same set of change points.

bkps_python = algo_python.predict(pen=penalty_value)
bkps_c = algo_c.predict(pen=penalty_value)
print(f"Python implementation:\t{bkps_python}")
print(f"C implementation:\t{bkps_c}")
print(f"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)")

Python implementation:    [71, 147, 153, 219, 222, 294, 364, 433, 500]
C implementation:   [71, 147, 153, 219, 222, 294, 364, 433, 500]
(Hausdorff distance: 0 samples)



Note

By default, Dynp and Pelt have jump=5. In KernelCPD, jump=1 and cannot be changed. This is because, in the C implementation, changing the jump does not significantly improve the running time, while it does in the Python implementation.

## Gaussian kernel

The Gaussian kernel (see above) $$k_{\text{Gaussian}}$$ can detect changes in the distribution of an i.i.d. process. This is a feature of several kernel functions (in particular characteristic kernels; see [Gretton2012] for more information). It also corresponds to the cost function CostRbf.

### Dynamic programming

When the number of changes to detect is known beforehand, we use dynamic programming.

params = {"gamma": 1e-2}
algo_python = rpt.Dynp(model="rbf", params=params, jump=1, min_size=2).fit(
    signal
)  # written in pure python
algo_c = rpt.KernelCPD(kernel="rbf", params=params, min_size=2).fit(
    signal
)  # written in C

for label, algo in zip(
    ("Python implementation", "C implementation"), (algo_python, algo_c)
):
    start_time = time.time()
    result = algo.predict(n_bkps=n_bkps)
    print(f"{label}:\t{time.time() - start_time:.3f} s")

Python implementation:    7.679 s
C implementation:   0.014 s



Again, the speed-up is quite significant. The C implementation has a time complexity of the order $$\mathcal{O}(CKT^2)$$ and a space complexity of the order $$\mathcal{O}(T)$$, where $$C$$ is the complexity of computing $$k(y_s, y_t)$$ once. As for the Python implementation, the time and space complexities are of the order $$\mathcal{O}(CKT^4)$$ and $$\mathcal{O}(T^2)$$ respectively.
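The segment cost has the same kernel form as in the linear case, but is now evaluated from the Gaussian Gram matrix. Since $$k_{\text{Gaussian}}(y_t, y_t) = 1$$, the cost of a length-$$n$$ segment reduces to $$n - \frac{1}{n}\sum_{s,t} k(y_s, y_t)$$. A NumPy sketch (illustrative, not library code):

```python
import numpy as np

rng = np.random.default_rng(0)
segment = rng.normal(size=(30, 3))
gamma = 1e-2

# Gram matrix of the Gaussian kernel on the segment
sq_dists = np.sum((segment[:, None, :] - segment[None, :, :]) ** 2, axis=-1)
gram = np.exp(-gamma * sq_dists)

# kernel cost: sum_t k(y_t, y_t) - (1/n) * sum_{s,t} k(y_s, y_t)
n = len(segment)
cost = np.trace(gram) - gram.sum() / n  # trace(gram) == n since k(y, y) = 1
assert 0 <= cost <= n
```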

We can also check that both methods return the same set of change points.

bkps_python = algo_python.predict(n_bkps=n_bkps)
bkps_c = algo_c.predict(n_bkps=n_bkps)
print(f"Python implementation:\t{bkps_python}")
print(f"C implementation:\t{bkps_c}")
print(f"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)")

Python implementation:    [71, 147, 218, 294, 364, 433, 500]
C implementation:   [71, 147, 218, 294, 364, 433, 500]
(Hausdorff distance: 0 samples)



Note

If not provided by the user, the gamma parameter is chosen with the median heuristic, meaning that it is set to the inverse of the median of all pairwise squared distances $$\|y_s - y_t\|^2$$.
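Assuming the heuristic uses the median of the pairwise squared Euclidean distances, it can be sketched as follows (our own helper name, not the library's API):

```python
import numpy as np

def median_heuristic_gamma(signal):
    """Bandwidth from the median heuristic: 1 / median of pairwise squared distances."""
    sq_dists = np.sum((signal[:, None, :] - signal[None, :, :]) ** 2, axis=-1)
    # keep each pair (s, t) with s < t once, excluding the zero diagonal
    pairwise = sq_dists[np.triu_indices(len(signal), k=1)]
    return 1.0 / np.median(pairwise)

rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 3))
gamma = median_heuristic_gamma(signal)
assert gamma > 0
```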

### PELT

When the number of changes to detect is unknown, we resort to PELT to solve the penalized detection problem.

algo_python = rpt.Pelt(model="rbf", jump=1, min_size=2).fit(
    signal
)  # written in pure python
algo_c = rpt.KernelCPD(kernel="rbf", min_size=2).fit(
    signal
)  # written in C, same class as before

penalty_value = 1  # beta

for label, algo in zip(
    ("Python implementation", "C implementation"), (algo_python, algo_c)
):
    start_time = time.time()
    result = algo.predict(pen=penalty_value)
    print(f"{label}:\t{time.time() - start_time:.3f} s")

Python implementation:    0.277 s
C implementation:   0.002 s



Again, the speed-up is quite significant and depends on the signal size (number $$T$$ of samples and dimension $$d$$) and the penalty value $$\beta$$. We remark that, for both the Python and C implementations, PELT is more efficient than dynamic programming.

We can also check that both methods return the same set of change points.

bkps_python = algo_python.predict(pen=penalty_value)
bkps_c = algo_c.predict(pen=penalty_value)
print(f"Python implementation:\t{bkps_python}")
print(f"C implementation:\t{bkps_c}")
print(f"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)")

Python implementation:    [71, 147, 218, 294, 364, 433, 500]
C implementation:   [71, 147, 218, 294, 364, 433, 500]
(Hausdorff distance: 0 samples)



## References

[Gretton2012] Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. The Journal of Machine Learning Research, 13, 723–773.

[Killick2012] Killick, R., Fearnhead, P., & Eckley, I. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500), 1590–1598.

[Celisse2018] Celisse, A., Marot, G., Pierre-Jean, M., & Rigaill, G. (2018). New efficient algorithms for multiple change-point detection with reproducing kernels. Computational Statistics and Data Analysis, 128, 200–220.

[Arlot2019] Arlot, S., Celisse, A., & Harchaoui, Z. (2019). A kernel multiple change-point algorithm via model selection. Journal of Machine Learning Research, 20(162), 1–56.