Abstract

Changepoint detection is a central problem in time series and genomic data. For some applications, it is natural to impose constraints on the directions of changes. One example is ChIP-seq data, for which adding an up-down constraint improves peak detection accuracy, but makes the optimization problem more complicated.

In response to this, a new algorithm which can solve problems with arbitrary affine constraints on adjacent segment means was created, which shows that a functional pruning technique can be adapted to solve such constrained changepoint detection problems.

Going by the name of PeakSegPDPA, this algorithm computes optimal changepoint models using the Poisson likelihood for non-negative count data, with subject to the PeakSeg constraint (first change must be up, second must be down and subsequent changes must be alternating in that manner so forth). It achieves state-of-the-art accuracy in a benchmark of several genomic data sets, and is orders of magnitude faster than existing algorithms that have similar accuracy.

History

PeakSegOptimal was previously named as coseg, (as can be found in research papers and articles referring the same) but due to the existence of the CoSeg package (which is completely unrelated to the functionality of PeakSegOptimal) it was renamed, in order to avoid confusion.

Time Complexity

The time complexity is log-linear in the amount of data (O(NlogN) for N), as empirically observed.

Reference

Source for extracting the above information includes the github repository and this publication.

Testing

The fact that it achieves log-linear computational complexity can be attested/supported by the results of running testComplexity functions to root out its time complexity:

library(testComplexity)
data.vec <- rpois(N, 1)
df <- asymptoticTimings(PeakSegOptimal::PeakSegPDPA(data.vec, max.segments = 3L), data.sizes = 10^seq(1, 4, by = 0.5))
asymptoticTimeComplexityClass(df)
#> [1] "loglinear"

Created on 2020-08-17 by the reprex package (v0.3.0)