Abstract

The changepoint package implements three algorithms for multiple changepoint detection, out of which we will be discussing two of them with regards to the computational time complexity:

  • Segment Neighbourhood (SegNeigh) algorithm: This algorithm uses a dynamic programming technique to obtain the optimal segmentation for n changepoints, reusing the information that was calculated for (n-1) changepoints. This reduces the algorithmic complexity to quadratic or O(n2), from the exponential time complexity (O(2n)) given by the Binary Segmentation algorithm (not discussed here).
  • Pruned Exact Linear Time (PELT) algorithm: This algorithm uses dynamic programming and pruning as well, which is computationally more efficient and can result in a linear or O(n) search algorithm, with subject to certain assumptions being satisfied, out of which the main assumption that controls the computational time is that the number of changepoints increases linearly as the data set grows. (i.e. when the changepoints are spread throughout the data rather than being confined to one portion)

Reference

The pdf used as reference for obtaining the above information can be found here.

Testing

For testing the complexities, we will be using the function cpt.mean() which calculates the optimal positioning and number of changepoints for the provided data (will use rnorm()) with the user-specified method.
Note that cpt here refers to the object class introduced by the changepoint package to store changepoint analysis objects.

As expected, using the testComplexity functions with suitable parameters show the specified complexities for the PELT (linear time) and SegNeigh (quadratic time) algorithms for changepoint::cpt.mean():

library(testComplexity)
# PELT case:
df <- asymptoticTimings(changepoint::cpt.mean(rnorm(N), method = "PELT"), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 0.1)
asymptoticTimeComplexityClass(df)
#> [1] "linear"
# SegNeigh case:
df <- asymptoticTimings(changepoint::cpt.mean(rnorm(N), penalty = "Asymptotic", pen.value = 0.01, method = "SegNeigh"), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 0.1)
asymptoticTimeComplexityClass(df)
#> [1] "quadratic"

Created on 2020-08-17 by the reprex package (v0.3.0)