Test functions

20 May 2020

Reading time ~2 minutes
Here’s a list of complexity classes which in turn are lists of functions with corresponding complexity, to be used as test cases passed onto testComplexity functions:
# packages:             # functions:
library(PeakSegDP)      # cDPA()
library(PeakSegDisk)    # PeakSegFPOP_vec()
library(PeakSegOptimal) # PeakSegPDPA(), PeakSegFPOP()
library(fpop)           # fpop()
library(gfpop)          # gfpop()
library(opart)          # opart_gaussian()
library(changepoint)    # cpt.mean()

expected.funs <- list(
                      #-------------------------------------------------------------------------------------------------------------
                      # Complexity class A : Linear
                      linear = list(

                                    # A.1: The substring function. (retrieve/replace substring in a string)
                                    # Case: Expected linear time complexity, but found quadratic for this case: (bug)
                                    # reference case: https://stat.ethz.ch/pipermail/r-devel/2019-February/077318.html
                                    substring = function(N){
                                       substring(paste(rep("A", N), collapse = ""), 1:N, 1:N)
                                    },

                                    # A.2: The gregexpr function. (global regexpr, giving all matches in a string)
                                    # Case: Similar to substring (in terms of complexity) and strsplit(), expected linear but is quadratic
                                    # reference case: https://stat.ethz.ch/pipermail/r-devel/2017-January/073577.html
                                    gregexpr = function(N){
                                       gregexpr("a", paste(collapse = "", rep("ab", N)), perl = TRUE)
                                    },
                                    # Note: Both substring and gregexpr were quadratic in R-3.5.2 but are linear starting in R-3.6.0.
                                    # reference: https://github.com/tdhock/namedCapture-article#6-mar-2019

                                    # A.3: The cpt.mean function using PELT method:
                                    changepoint = function(N){

                                       changepoint::cpt.mean(rnorm(N), method = "PELT") # additional parameters: penalty, pen.value etc.
                                    }
                                   ),
                      #-------------------------------------------------------------------------------------------------------------
                      # Complexity class B : Loglinear
                      loglinear = list(

                                     # B.1: The fpop function.
                                     # reference : https://arxiv.org/pdf/1810.00117.pdf
                                     fpop = function(N){
                                        fpop::Fpop(rnorm(N), 1)
                                     },

                                     # B.2: The gfpop function. (generalized fpop)
                                     # reference: https://arxiv.org/abs/1810.00117
                                     gfpop = function(N){
                                        gfpop(data = dataGenerator(as.integer(N), c(0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1), c(0, 0.5, 1, 1.5, 2, 2.5, 3), sigma = 1), mygraph = graph(penalty = 2*log(as.integer(N)), type = "isotonic"), type = "mean")
                                     },

                                     # B.3: The PeakSegPDPA function.
                                     PeakSegPDPA = function(N){
                                        PeakSegOptimal::PeakSegPDPA(rpois(N, 1),rep(1, length(rpois(N, 1))), 3L) # max.segments >= 2.
                                     },

                                     # B.4: The PeakSegFPOP function.
                                     # Reference: https://github.com/tdhock/PeakSegFPOP
                                     PeakSegFPOP = function(N){
                                     PeakSegOptimal::PeakSegFPOP(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L)
                                     },

                                     # B.5: The PeakSegDisk::PeakSegFPOP_df function. (Fpop algo for writing data frame to disk)
                                     PeakSegDisk = function(N){
                                        PeakSegDisk::PeakSegFPOP_vec(rpois(N,10), 10)
                                     }
                                    ),
                      #-------------------------------------------------------------------------------------------------------------
                      # Complexity class C : Quadratic
                      quadratic = list(

                                     # C.1: The opart function
                                     opart = function(N){
                                        opart::opart_gaussian(rnorm(N), 1)
                                     },

                                     # C.2: The changepoint::cpt.mean function using SegNeigh method:
                                     # Default penalty won't work so am using asymptotic penalty (i.e. 0 < penalty <= 1)
                                     # It still explicity says "SegNeigh is computationally slow, use PELT instead".
                                     changepoint = function(N){
                                        changepoint::cpt.mean(rnorm(N), penalty = "Asymptotic", pen.value = 0.01, method = "SegNeigh")
                                     },

                                     # C.3: The cDPA function.
                                     PeakSegDP = function(N){
                                        PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L)
                                     },

                                     # C.4: The PeakSegDisk::PeakSegFPOP_df function.
                                     PeakSegDisk = function(N){
                                        PeakSegDisk::PeakSegFPOP_vec(1:N, 10)
                                     }
                                    )
                     )
It’s currently composed of subsequent lists for the three major complexity classes we are concerned with, i.e. namely linear, log-linear and quadratic complexities.
Anirban
05/20/2020
R Package GSoC'20 Functions Test Cases Code Snippets