
R package developers traditionally rely on ad-hoc benchmarking (empirical timings and visual plots) to understand their code's asymptotic performance. We lack a framework for systematically testing the computational complexity of a function, which is crucial for identifying and implementing speed improvements in R code. testComplexity attempts to address this by providing a suite of functions for asymptotic complexity classification.
Since algorithms are used in every sphere of research, this package potentially caters to all variants of R-users. It has been specifically tested on ones ranging from changepoint detection and sorting to constrained optimal segmentation and partitioning, besides common base R functions such as substring and gregexpr.
Use devtools or remotes to fetch the package from the GitHub repository:
if(!require(devtools)) install.packages("devtools") devtools::install_github("Anirban166/testComplexity")
if(!require(remotes)) install.packages("remotes") remotes::install_github("Anirban166/testComplexity")
testComplexity                              @ returns              @ type                    @ commit-branch(es)
├──> asymptoticTimings                    : data.frame             timings quantifier        master
│    ├──> asymptoticTimeComplexityClass   :   ├──> string          ↑ complexity classifier   master
│    └──> plotTimings                     :   └──> ggplot object   ↑ plotter                 master/Plotfunc
│
├──> asymptoticMemoryUsage                : data.frame             memory-usage quantifier   Memtest
│    ├──> asymptoticMemoryComplexityClass :   ├──> string          ↑ complexity classifier   Memtest
│    └──> plotMemoryUsage                 :   └──> ggplot object   ↑ plotter                 Memtest/Plotfunc
│
├──> asymptoticComplexityClass            : string                 complexity classifier     Generalizedcomplexity
│    └──> asymptoticComplexityClassifier  :   ↑ string             ↑ complexity classifier   Generalizedcomplexity
│
├──> expect_complexity_class              : -/-                    test function             Testfunc
│    └──> expect_time_complexity          : -/-                    ↑  test function          Testfunc
│         ├──> expect_linear_time         : -/-                    ↑↑ test function          Testfunc
│         ├──> expect_loglinear_time      : -/-                    ↑↑ test function          Testfunc
│         └──> expect_quadratic_time      : -/-                    ↑↑ test function          Testfunc
│
└──> testthat
     ├──> testsfortestComplexity                                   unit-tester               All branches
     ├──> testsforConstrainedchangepointmodelalgos                 unit-tester               Testfunc
     └──> testsforRegularfunctions                                 unit-tester               Testfunc
To get started, please check the general vignette which highlights all features, categorizes the different functions available, and describes their functionality through textual elucidations and a running example case.
For a quick overview of the main functionality (obtaining quantified benchmarks and subsequently computing the time/memory complexity class), please check the examples below.
N to asymptoticTimings()/asymptoticMemoryUsage(): library(data.table)
# Example 1 | Applying the bubble sort algorithm to a sample of 100 elements: (expected -> quadratic time & constant memory complexity)
bubble.sort <- function(elements.vec) { 
  n <- length(elements.vec)
  for(i in 1:(n - 1)) {
    for(j in 1:(n - i)) {
      if(elements.vec[j + 1] < elements.vec[j]) { 
        temp <- elements.vec[j]
        elements.vec[j] <- elements.vec[j + 1]
        elements.vec[j + 1] <- temp
      }
    }
  }
  return(elements.vec)
}
df.bubble.time <- asymptoticTimings(bubble.sort(sample(1:100, N, replace = TRUE)), data.sizes = 10^seq(1, 3, by = 0.5))
data.table(df.bubble.time)
      Timings Data sizes
  1:    91902         10
  2:    39402         10
  3:    34701         10
  4:    33101         10
  5:    33201         10
 ---
496: 64490501       1000
497: 59799101       1000
498: 63452200       1000
499: 62807201       1000
500: 59757102       1000
df.bubble.memory <- asymptoticMemoryUsage(bubble.sort(sample(1:100, N, replace = TRUE)), data.sizes = 10^seq(1, 3, by = 0.1))
data.table(df.bubble.memory)
    Memory usage Data sizes
 1:        87800   10.00000
 2:         2552   12.58925
 3:         2552   15.84893
 4:         2552   19.95262
 5:         2552   25.11886
 ---
17:         7472  398.10717
18:         8720  501.18723
19:        10256  630.95734
20:        12224  794.32823
21:        14696 1000.00000# Example 2 | Testing PeakSegPDPA, an algorithm for constrained changepoint detection: (expected -> log-linear time and memory complexity)
data.vec <- rpois(N, 1)
df.PDPA.time <- asymptoticTimings(PeakSegOptimal::PeakSegPDPA(count.vec = data.vec, max.segments = 3L), data.sizes = 10^seq(1, 4, by = 0.1))
&data.table(df.PDPA.time)
       Timings Data sizes
  1:    248701         10
  2:    120302         10
  3:    125701         10
  4:    133301         10
  5:    146500         10
 ---
696: 405597501      10000
697: 408335001      10000
698: 338544401      10000
699: 404081901      10000
700: 399575501      10000
df.PDPA.memory <- asymptoticMemoryUsage(PeakSegOptimal::PeakSegPDPA(count.vec = data.vec, max.segments = 3L), data.sizes = 10^seq(1, 4, by = 0.1))
data.table(df.PDPA.memory)
    Memory usage Data sizes
 1:         6256   10.00000
 2:         7024   12.58925
 3:         7432   15.84893
 4:         8560   19.95262
 5:         9496   25.11886
 ---
25:       447792 2511.88643
26:       562336 3162.27766
27:       706512 3981.07171
28:       887792 5011.87234
29:      1116240 6309.57344asymptoticTimeComplexityClass()/asymptoticMemoryComplexityClass(): # Example 1 | Applying the bubble sort algorithm to a sample of 100 elements: (expected -> quadratic time & constant memory complexity) asymptoticTimeComplexityClass(df.bubble.time) [1] "quadratic" asymptoticMemoryComplexityClass(df.bubble.memory) [1] "constant"
# Example 2 | Testing PeakSegPDPA, an algorithm for constrained changepoint detection: (expected -> log-linear time and memory complexity) asymptoticTimeComplexityClass(df.PDPA.time) [1] "loglinear" asymptoticMemoryComplexityClass(df.PDPA.memory) [1] "loglinear"
# Example 3 | Testing the time complexity of the quick sort algorithm: (expected -> log-linear time complexity) asymptoticTimeComplexityClass(asymptoticTimings(sort(sample(1:100, N, replace = TRUE), method = "quick" , index.return = TRUE), data.sizes = 10^seq(1, 3, by = 0.5))) [1] "loglinear"
# Example 4 | Allocating a square matrix (N*N dimensions): (expected -> quadratic memory complexity) asymptoticMemoryComplexityClass(asymptoticMemoryUsage(matrix(data = N:N, nrow = N, ncol = N), data.sizes = 10^seq(1, 3, by = 0.1))) [1] "quadratic"
Check this screencast for a demonstration of time complexity testing on different sorting algorithms over a test session.
For obtaining a visual description of the trends followed between runtimes/memory-usage vs data sizes in order to visually diagnose/verify the complexity result(s), simple plots can be crafted. They are roughly grouped into:
plotTimings()/plotMemoryUsage() for time/memory cases respectively: # Timings plot for PeakSegDP::cDPA
df <- asymptoticTimings(PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4))
plotTimings(df.time, titles = list("Timings", "PeakSegDP::cDPA"), line.color = "#ffec1b", point.color = "#ffec1b", line.size = 1, point.size = 1.5)
# Equivalent ggplot object:
df <- asymptoticTimings(PeakSegDP::cDPA(rpois(data.sizes, 1), rep(1, length(rpois(data.sizes, 1))), 3L), data.sizes = 10^seq(1, 4))
ggplot(df, aes(x = `Data sizes`, y = Timings)) + geom_point(color = ft_cols$yellow, size = 1.5) + geom_line(color = ft_cols$yellow, size = 1) + labs(x = "Data sizes", y = "Runtime (in nanoseconds)") + scale_x_log10() + scale_y_log10() + ggtitle("Timings", "PeakSegDP::cDPA") + hrbrthemes::theme_ft_rc()# Memory Usage plot for PeakSegDP::cDPA
df <- asymptoticMemoryUsage(PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 6, by = 0.1))
plotMemoryUsage(df.memory, titles = list("Memory Usage", "PeakSegDP::cDPA"), line.color = "#ffec1b", point.color = "#ffec1b", line.size = 1, point.size = 2)
# Equivalent ggplot object:
ggplot(df, aes(x = `Data sizes`, y = `Memory usage`)) + geom_point(color = ft_cols$yellow, size = 2) + geom_line(color = ft_cols$yellow, size = 1) labs(x = "Data sizes", y = "Memory usage (in bytes)") + scale_x_log10() + scale_y_log10() + ggtitle("Memory Usage", "PeakSegDP::cDPA") + hrbrthemes::theme_ft_rc()
rbind() and then plot the resultant data frame using suitable aesthetics, geometry, scale, labels/titles etc. via a ggplot: df.substring <- asymptoticTimings(substring(paste(rep("A", N), collapse = ""), 1:N, 1:N), data.sizes = 10^seq(1, 4, by = 0.5))
asymptoticTimeComplexityClass(df.substring)
[1] "linear"
df.PeakSegPDPA <- asymptoticTimings(PeakSegOptimal::PeakSegPDPA(rpois(N, 1),rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 1)
asymptoticTimeComplexityClass(df.PeakSegPDPA)
[1] "loglinear"
df.cDPA <- asymptoticTimings(PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 5)
asymptoticTimeComplexityClass(df.cDPA)
[1] "quadratic"
df.gregexpr <- asymptoticTimings(gregexpr("a", paste(collapse = "", rep("ab", N)), perl = TRUE), data.sizes = 10^seq(1, 4, by = 0.5))
asymptoticTimeComplexityClass(df.gregexpr)
[1] "linear"
df.fpop <- asymptoticTimings(fpop::Fpop(rnorm(N), 1), data.sizes = 10^seq(1, 4, by = 0.5))
asymptoticTimeComplexityClass(df.fpop)
[1] "loglinear"
df.opart <- asymptoticTimings(opart::opart_gaussian(rnorm(N), 1), data.sizes = 10^seq(1, 4, by = 0.5))
asymptoticTimeComplexityClass(df.opart)
[1] "quadratic"
df.substring$expr = "substring"
df.PeakSegPDPA$expr = "PeakSegPDPA"
df.cDPA$expr = "cDPA"
df.gregexpr$expr = "gregexpr"
df.fpop$expr = "fpop"
df.opart$expr = "opart"
plot.df <- rbind(df.substring, df.PeakSegPDPA, df.cDPA, df.gregexpr, df.fpop, df.opart)
ggplot(plot.df, aes(x = `Data sizes`, y = Timings)) + geom_point(aes(color = expr)) + geom_line(aes(color = expr)) + labs(x = "Data sizes", y = "Runtime (in nanoseconds)") + scale_x_log10() + scale_y_log10() + ggtitle("Timings comparison plot", subtitle = "Linear vs Log-linear vs Quadratic complexities") + ggthemes::theme_pander()
Including more functions (if applicable) and increasing the number of data sizes can lead to a more comprehensive outlook: 

ggfortify (an extension of ggplot2) can be used to produce diagnostic plots for generalized linear models with the same formulae as used in the complexity classification functions: library(ggfortify) df <- asymptoticTimings(PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4 by = 0.1)) glm.plot.obj <- glm(Timings~`Data sizes`, data = df) ggplot2::autoplot(stats::glm(glm.plot.obj)) + ggthemes::theme_gdocs()

microbenchmark::microbenchmark() is used to obtain the time results in testComplexity::asymptoticTimings() for precision and convenience of having the benchmarks as a data frame. bench::bench_memory() is used to compute the size of memory allocation in testComplexity::asymptoticMemoryUsage(). covr::package_coverage() both locally and via codecov, with 100% coverage attained. bench::bench_memory() overcomes the Windows-only OS limitation for memory complexity testing observed in GuessCompx::CompEst() as it successfully runs on other operating systems.