# Kernel change point detection: a performance comparison


## Introduction

In ruptures, there are two ways to perform kernel change point detection:

• by using the pure Python classes Dynp (known number of change points) and Pelt (unknown number of change points),

• by using the faster class KernelCPD (implemented in C), which contains both the dynamic programming approach and the penalized approach (PELT).

This example illustrates the performance of the fast C implementation compared to the pure Python one.

The kernel change point detection setting is briefly described in the user guide. The interested reader can refer to [Celisse2018, Arlot2019] for a more complete introduction.

The full list of available kernels can be found in the documentation; in this example, we only consider two:

• the linear kernel, $$k_{\text{linear}}(x, y) = x^T y$$ (Euclidean scalar product) and the induced norm is the Euclidean norm;
• the Gaussian kernel (also known as radial basis function, rbf), $$k_{\text{Gaussian}}(x,y)=\exp(-\gamma \|x-y\|^2)$$ where $$\|\cdot\|$$ is the Euclidean norm and $$\gamma>0$$ is a user-defined parameter.
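As a quick illustration (using NumPy only; this is not part of the library), both Gram matrices can be computed directly from the definitions above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 samples of dimension 3

# Linear kernel: Gram matrix of Euclidean scalar products x^T y
K_linear = X @ X.T

# Gaussian (rbf) kernel with a user-defined parameter gamma > 0
gamma = 0.1
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_gaussian = np.exp(-gamma * sq_dists)

# sanity checks: k_linear(x, x) = ||x||^2 and k_Gaussian(x, x) = 1
assert np.allclose(np.diag(K_linear), np.sum(X**2, axis=1))
assert np.allclose(np.diag(K_gaussian), 1.0)
```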

## Setup

First, we make the necessary imports and generate a toy signal.

import time  # for execution time comparison

import matplotlib.pyplot as plt  # for display purposes

import ruptures as rpt  # our package
from ruptures.metrics import hausdorff

# generate signal
n_samples, dim, sigma = 500, 3, 3
n_bkps = 6  # number of breakpoints
signal, bkps = rpt.pw_constant(n_samples, dim, n_bkps, noise_std=sigma)
fig, ax_array = rpt.display(signal, bkps)


## Linear kernel

The linear kernel (see above) $$k_{\text{linear}}$$ can detect changes in the mean of a signal. It also corresponds to the cost function CostL2.
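The correspondence with CostL2 comes from the kernel identity $$\sum_t k(y_t, y_t) - \frac{1}{n}\sum_{s,t} k(y_s, y_t) = \sum_t \|y_t - \bar{y}\|^2$$ for a segment of length $$n$$ with empirical mean $$\bar{y}$$. A quick NumPy check of this identity (illustrative, not library code):

```python
import numpy as np

rng = np.random.default_rng(0)
segment = rng.normal(size=(20, 3))  # one segment of a d-dimensional signal

# CostL2: sum of squared deviations from the segment mean
cost_l2 = np.sum((segment - segment.mean(axis=0)) ** 2)

# Same quantity written with the linear kernel only:
# sum_t k(y_t, y_t) - (1/n) * sum_{s,t} k(y_s, y_t)
gram = segment @ segment.T
cost_kernel = np.trace(gram) - gram.sum() / len(segment)

assert np.isclose(cost_l2, cost_kernel)
```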

### Dynamic programming

When the number of changes to detect is known beforehand, we use dynamic programming.

algo_python = rpt.Dynp(model="l2", jump=1, min_size=2).fit(
    signal
)  # written in pure python
algo_c = rpt.KernelCPD(kernel="linear", min_size=2).fit(signal)  # written in C

for label, algo in zip(
    ("Python implementation", "C implementation"), (algo_python, algo_c)
):
    start_time = time.time()
    result = algo.predict(n_bkps=n_bkps)
    print(f"{label}:\t{time.time() - start_time:.3f} s")

Python implementation:    6.770 s
C implementation:   0.008 s



The speed-up is quite significant and depends on the signal size (number $$T$$ of samples and dimension $$d$$) and the number $$K$$ of change points to detect. The C implementation has a time complexity of the order $$\mathcal{O}(KdT^2)$$ and a space complexity of the order $$\mathcal{O}(T)$$. As for the Python implementation, the time and space complexities are of the order $$\mathcal{O}(KdT^3)$$ and $$\mathcal{O}(T^2)$$ respectively.
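One ingredient behind the quadratic running time for the linear kernel is that, after an $$\mathcal{O}(dT)$$ precomputation of cumulative sums, the cost of any candidate segment can be evaluated in $$\mathcal{O}(d)$$ time. The following sketch uses our own helper names, not the library's internals:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 3))

# O(dT) precomputation: prefix sums of y_t and of ||y_t||^2
csum = np.concatenate([np.zeros((1, signal.shape[1])), np.cumsum(signal, axis=0)])
csum_sq = np.concatenate([[0.0], np.cumsum(np.sum(signal**2, axis=1))])

def segment_cost(start, end):
    """Linear-kernel (CostL2) cost of signal[start:end] in O(d) time."""
    seg_sum = csum[end] - csum[start]
    return (csum_sq[end] - csum_sq[start]) - np.sum(seg_sum**2) / (end - start)

# matches the direct computation over the segment
seg = signal[50:120]
direct = np.sum((seg - seg.mean(axis=0)) ** 2)
assert np.isclose(segment_cost(50, 120), direct)
```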

We can also check that both methods return the same set of change points.

bkps_python = algo_python.predict(n_bkps=n_bkps)
bkps_c = algo_c.predict(n_bkps=n_bkps)
print(f"Python implementation:\t{bkps_python}")
print(f"C implementation:\t{bkps_c}")
print(f"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)")

Python implementation:    [71, 147, 218, 294, 364, 433, 500]
C implementation:   [71, 147, 218, 294, 364, 433, 500]
(Hausdorff distance: 0 samples)



### PELT

When the number of changes to detect is unknown, we resort to PELT [Killick2012] to solve the penalized detection problem.

algo_python = rpt.Pelt(model="l2", jump=1, min_size=2).fit(
    signal
)  # written in pure python
algo_c = rpt.KernelCPD(kernel="linear", min_size=2).fit(
    signal
)  # written in C, same class as before

penalty_value = 100  # beta

for label, algo in zip(
    ("Python implementation", "C implementation"), (algo_python, algo_c)
):
    start_time = time.time()
    result = algo.predict(pen=penalty_value)
    print(f"{label}:\t{time.time() - start_time:.3f} s")

Python implementation:    0.508 s
C implementation:   0.001 s



Again, the speed-up is quite significant and depends on the signal size (number $$T$$ of samples and dimension $$d$$) and the penalty value $$\beta$$. We remark that, for both the Python and C implementations, PELT is more efficient than dynamic programming.
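For intuition, the penalized problem solved by PELT can be sketched with a minimal implementation for the linear-kernel cost: a dynamic program over the last change point, with the pruning step that gives PELT its efficiency. This is an illustrative reimplementation under our own names, not the library's code:

```python
import numpy as np

def pelt_l2(signal, pen):
    """Minimal PELT with the linear-kernel (CostL2) segment cost. Illustrative only."""
    T, d = signal.shape
    csum = np.concatenate([np.zeros((1, d)), np.cumsum(signal, axis=0)])
    csum_sq = np.concatenate([[0.0], np.cumsum(np.sum(signal**2, axis=1))])

    def cost(s, t):  # cost of signal[s:t], computed in O(d)
        seg_sum = csum[t] - csum[s]
        return (csum_sq[t] - csum_sq[s]) - np.sum(seg_sum**2) / (t - s)

    F = [0.0] * (T + 1)  # F[t]: optimal penalized cost of signal[:t]
    F[0] = -pen
    last = [0] * (T + 1)  # last change point before t in the optimal segmentation
    admissible = [0]
    for t in range(1, T + 1):
        vals = [F[s] + cost(s, t) + pen for s in admissible]
        best = int(np.argmin(vals))
        F[t], last[t] = vals[best], admissible[best]
        # PELT pruning: s with F[s] + cost(s, t) > F[t] can never be optimal again
        admissible = [s for s, v in zip(admissible, vals) if v - pen <= F[t]]
        admissible.append(t)

    bkps, t = [], T  # backtrack the optimal change points
    while t > 0:
        bkps.append(t)
        t = last[t]
    return sorted(bkps)

# noiseless step signal with one mean shift at t=100
signal = np.concatenate([np.zeros((100, 1)), 5.0 * np.ones((100, 1))])
print(pelt_l2(signal, pen=10))  # [100, 200]
```

The pruning condition is valid here because the CostL2 segment cost can only decrease when a segment is split, so a candidate last change point that is already too expensive stays too expensive.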

We can also check that both methods return the same set of change points.

bkps_python = algo_python.predict(pen=penalty_value)
bkps_c = algo_c.predict(pen=penalty_value)
print(f"Python implementation:\t{bkps_python}")
print(f"C implementation:\t{bkps_c}")
print(f"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)")

Python implementation:    [71, 147, 153, 219, 222, 294, 364, 433, 500]
C implementation:   [71, 147, 153, 219, 222, 294, 364, 433, 500]
(Hausdorff distance: 0 samples)



Note

By default, Dynp and Pelt have jump=5. In KernelCPD, jump=1 and cannot be changed. This is because, in the C implementation, changing the jump does not significantly improve the running time, while it does in the Python implementation.

## Gaussian kernel

The Gaussian kernel (see above) $$k_{\text{Gaussian}}$$ can detect changes in the distribution of an i.i.d. process. This is a feature of several kernel functions (in particular characteristic kernels; see [Gretton2012] for more information). It also corresponds to the cost function CostRbf.

### Dynamic programming

When the number of changes to detect is known beforehand, we use dynamic programming.

params = {"gamma": 1e-2}
algo_python = rpt.Dynp(model="rbf", params=params, jump=1, min_size=2).fit(
    signal
)  # written in pure python
algo_c = rpt.KernelCPD(kernel="rbf", params=params, min_size=2).fit(
    signal
)  # written in C

for label, algo in zip(
    ("Python implementation", "C implementation"), (algo_python, algo_c)
):
    start_time = time.time()
    result = algo.predict(n_bkps=n_bkps)
    print(f"{label}:\t{time.time() - start_time:.3f} s")

Python implementation:    7.679 s
C implementation:   0.014 s



Again, the speed-up is quite significant. The C implementation has a time complexity of the order $$\mathcal{O}(CKT^2)$$ and a space complexity of the order $$\mathcal{O}(T)$$, where $$C$$ is the complexity of computing $$k(y_s, y_t)$$ once. As for the Python implementation, the time and space complexities are of the order $$\mathcal{O}(CKT^4)$$ and $$\mathcal{O}(T^2)$$ respectively.
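The segment cost has the same kernel form as in the linear case, but is now evaluated from the Gaussian Gram matrix. Since $$k_{\text{Gaussian}}(y_t, y_t) = 1$$, the cost of a length-$$n$$ segment reduces to $$n - \frac{1}{n}\sum_{s,t} k(y_s, y_t)$$. A NumPy sketch (illustrative, not library code):

```python
import numpy as np

rng = np.random.default_rng(0)
segment = rng.normal(size=(30, 3))
gamma = 1e-2

# Gram matrix of the Gaussian kernel on the segment
sq_dists = np.sum((segment[:, None, :] - segment[None, :, :]) ** 2, axis=-1)
gram = np.exp(-gamma * sq_dists)

# kernel cost: sum_t k(y_t, y_t) - (1/n) * sum_{s,t} k(y_s, y_t)
n = len(segment)
cost = np.trace(gram) - gram.sum() / n  # trace(gram) == n since k(y, y) = 1
assert 0 <= cost <= n
```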

We can also check that both methods return the same set of change points.

bkps_python = algo_python.predict(n_bkps=n_bkps)
bkps_c = algo_c.predict(n_bkps=n_bkps)
print(f"Python implementation:\t{bkps_python}")
print(f"C implementation:\t{bkps_c}")
print(f"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)")

Python implementation:    [71, 147, 218, 294, 364, 433, 500]
C implementation:   [71, 147, 218, 294, 364, 433, 500]
(Hausdorff distance: 0 samples)



Note

If not provided by the user, the gamma parameter is chosen with the median heuristic, meaning that it is set to the inverse of the median of all pairwise squared distances $$\|y_s - y_t\|^2$$.
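Assuming the heuristic uses the median of the pairwise squared Euclidean distances, it can be sketched as follows (our own helper name, not the library's API):

```python
import numpy as np

def median_heuristic_gamma(signal):
    """Bandwidth from the median heuristic: 1 / median of pairwise squared distances."""
    sq_dists = np.sum((signal[:, None, :] - signal[None, :, :]) ** 2, axis=-1)
    # keep each pair (s, t) with s < t once, excluding the zero diagonal
    pairwise = sq_dists[np.triu_indices(len(signal), k=1)]
    return 1.0 / np.median(pairwise)

rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 3))
gamma = median_heuristic_gamma(signal)
assert gamma > 0
```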

### PELT

When the number of changes to detect is unknown, we resort to PELT to solve the penalized detection problem.

algo_python = rpt.Pelt(model="rbf", jump=1, min_size=2).fit(
    signal
)  # written in pure python
algo_c = rpt.KernelCPD(kernel="rbf", min_size=2).fit(
    signal
)  # written in C, same class as before

penalty_value = 1  # beta

for label, algo in zip(
    ("Python implementation", "C implementation"), (algo_python, algo_c)
):
    start_time = time.time()
    result = algo.predict(pen=penalty_value)
    print(f"{label}:\t{time.time() - start_time:.3f} s")

Python implementation:    0.277 s
C implementation:   0.002 s



Again, the speed-up is quite significant and depends on the signal size (number $$T$$ of samples and dimension $$d$$) and the penalty value $$\beta$$. We remark that, for both the Python and C implementations, PELT is more efficient than dynamic programming.

We can also check that both methods return the same set of change points.

bkps_python = algo_python.predict(pen=penalty_value)
bkps_c = algo_c.predict(pen=penalty_value)
print(f"Python implementation:\t{bkps_python}")
print(f"C implementation:\t{bkps_c}")
print(f"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)")

Python implementation:    [71, 147, 218, 294, 364, 433, 500]
C implementation:   [71, 147, 218, 294, 364, 433, 500]
(Hausdorff distance: 0 samples)



## References

[Gretton2012] Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. The Journal of Machine Learning Research, 13, 723–773.

[Killick2012] Killick, R., Fearnhead, P., & Eckley, I. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500), 1590–1598.

[Celisse2018] Celisse, A., Marot, G., Pierre-Jean, M., & Rigaill, G. (2018). New efficient algorithms for multiple change-point detection with reproducing kernels. Computational Statistics and Data Analysis, 128, 200–220.

[Arlot2019] Arlot, S., Celisse, A., & Harchaoui, Z. (2019). A kernel multiple change-point algorithm via model selection. Journal of Machine Learning Research, 20(162), 1–56.