
R package developers traditionally rely on ad-hoc benchmarking (empirical timings and visual plots) to understand their code's asymptotic performance. We lack a framework for systematically testing the computational complexity of a function, which is crucial for identifying and implementing speed improvements in R code. testComplexity attempts to address this by providing a suite of functions for asymptotic complexity classification.
Since algorithms are used in every sphere of research, this package potentially caters to all variants of R users. It has been tested on algorithms ranging from changepoint detection and sorting to constrained optimal segmentation and partitioning, as well as on common base R functions such as substring and gregexpr.
Use devtools or remotes to fetch the package from the GitHub repository:
# Using devtools:
if(!require(devtools)) install.packages("devtools")
devtools::install_github("Anirban166/testComplexity")
# Or using remotes:
if(!require(remotes)) install.packages("remotes")
remotes::install_github("Anirban166/testComplexity")
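Then load the package as usual:
library(testComplexity)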
testComplexity                                       @ returns         @ type                      @ commit-branch(es)
├──> asymptoticTimings                               : data.frame        timings quantifier          master
│    ├──> asymptoticTimeComplexityClass              : string            complexity classifier       master
│    └──> plotTimings                                : ggplot object     plotter                     master/Plotfunc
│
├──> asymptoticMemoryUsage                           : data.frame        memory-usage quantifier     Memtest
│    ├──> asymptoticMemoryComplexityClass            : string            complexity classifier       Memtest
│    └──> plotMemoryUsage                            : ggplot object     plotter                     Memtest/Plotfunc
│
├──> asymptoticComplexityClass                       : string            complexity classifier       Generalizedcomplexity
│    └──> asymptoticComplexityClassifier             : string            complexity classifier       Generalizedcomplexity
│
├──> expect_complexity_class                         : -/-               test function               Testfunc
│    └──> expect_time_complexity                     : -/-               test function               Testfunc
│         ├──> expect_linear_time                    : -/-               test function               Testfunc
│         ├──> expect_loglinear_time                 : -/-               test function               Testfunc
│         └──> expect_quadratic_time                 : -/-               test function               Testfunc
│
└──> testthat
     ├──> testsfortestComplexity                                         unit-tester                 All branches
     ├──> testsforConstrainedchangepointmodelalgos                       unit-tester                 Testfunc
     └──> testsforRegularfunctions                                       unit-tester                 Testfunc
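As a quick sketch of the core workflow (quantify, then classify, then optionally plot), here is a minimal example using base R's sort(); the resulting class and plot styling shown are illustrative assumptions:
library(testComplexity)
df.sort <- asymptoticTimings(sort(sample(N)), data.sizes = 10^seq(1, 3, by = 0.5))
asymptoticTimeComplexityClass(df.sort) # returns a string, e.g. "loglinear"
plotTimings(df.sort, titles = list("Timings", "sort"))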
To get started, please check the general vignette, which highlights all the features, categorizes the different functions available, and describes their functionality through textual elucidations and a running example.
For a quick overview of the main functionality (obtaining quantified benchmarks and subsequently computing the time/memory complexity class), please check the examples below.
Pass the expression to test, parameterized by the data size N, to asymptoticTimings()/asymptoticMemoryUsage():
library(data.table)
# Example 1 | Applying the bubble sort algorithm to a sample of 100 elements: (expected -> quadratic time & constant memory complexity)
bubble.sort <- function(elements.vec) {
  n <- length(elements.vec)
  for(i in 1:(n - 1)) {
    for(j in 1:(n - i)) {
      if(elements.vec[j + 1] < elements.vec[j]) {
        temp <- elements.vec[j]
        elements.vec[j] <- elements.vec[j + 1]
        elements.vec[j + 1] <- temp
      }
    }
  }
  return(elements.vec)
}
df.bubble.time <- asymptoticTimings(bubble.sort(sample(1:100, N, replace = TRUE)), data.sizes = 10^seq(1, 3, by = 0.5))
data.table(df.bubble.time)
Timings Data sizes
1: 91902 10
2: 39402 10
3: 34701 10
4: 33101 10
5: 33201 10
---
496: 64490501 1000
497: 59799101 1000
498: 63452200 1000
499: 62807201 1000
500: 59757102 1000
df.bubble.memory <- asymptoticMemoryUsage(bubble.sort(sample(1:100, N, replace = TRUE)), data.sizes = 10^seq(1, 3, by = 0.1))
data.table(df.bubble.memory)
Memory usage Data sizes
1: 87800 10.00000
2: 2552 12.58925
3: 2552 15.84893
4: 2552 19.95262
5: 2552 25.11886
---
17: 7472 398.10717
18: 8720 501.18723
19: 10256 630.95734
20: 12224 794.32823
21: 14696 1000.00000
# Example 2 | Testing PeakSegPDPA, an algorithm for constrained changepoint detection: (expected -> log-linear time and memory complexity)
data.vec <- rpois(N, 1)
df.PDPA.time <- asymptoticTimings(PeakSegOptimal::PeakSegPDPA(count.vec = data.vec, max.segments = 3L), data.sizes = 10^seq(1, 4, by = 0.1))
data.table(df.PDPA.time)
Timings Data sizes
1: 248701 10
2: 120302 10
3: 125701 10
4: 133301 10
5: 146500 10
---
696: 405597501 10000
697: 408335001 10000
698: 338544401 10000
699: 404081901 10000
700: 399575501 10000
df.PDPA.memory <- asymptoticMemoryUsage(PeakSegOptimal::PeakSegPDPA(count.vec = data.vec, max.segments = 3L), data.sizes = 10^seq(1, 4, by = 0.1))
data.table(df.PDPA.memory)
Memory usage Data sizes
1: 6256 10.00000
2: 7024 12.58925
3: 7432 15.84893
4: 8560 19.95262
5: 9496 25.11886
---
25: 447792 2511.88643
26: 562336 3162.27766
27: 706512 3981.07171
28: 887792 5011.87234
29: 1116240 6309.57344
asymptoticTimeComplexityClass()/asymptoticMemoryComplexityClass():
# Example 1 | Applying the bubble sort algorithm to a sample of 100 elements: (expected -> quadratic time & constant memory complexity)
asymptoticTimeComplexityClass(df.bubble.time)
[1] "quadratic"
asymptoticMemoryComplexityClass(df.bubble.memory)
[1] "constant"
# Example 2 | Testing PeakSegPDPA, an algorithm for constrained changepoint detection: (expected -> log-linear time and memory complexity)
asymptoticTimeComplexityClass(df.PDPA.time)
[1] "loglinear"
asymptoticMemoryComplexityClass(df.PDPA.memory)
[1] "loglinear"
# Example 3 | Testing the time complexity of the quick sort algorithm: (expected -> log-linear time complexity)
asymptoticTimeComplexityClass(asymptoticTimings(sort(sample(1:100, N, replace = TRUE), method = "quick", index.return = TRUE), data.sizes = 10^seq(1, 3, by = 0.5)))
[1] "loglinear"
# Example 4 | Allocating a square matrix (N*N dimensions): (expected -> quadratic memory complexity)
asymptoticMemoryComplexityClass(asymptoticMemoryUsage(matrix(data = N:N, nrow = N, ncol = N), data.sizes = 10^seq(1, 3, by = 0.1)))
[1] "quadratic"
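Since the classifiers return plain strings, they can also be dropped into unit tests. A minimal sketch, assuming testthat is installed and bubble.sort is defined as above:
testthat::test_that("bubble sort scales quadratically in time", {
  df <- asymptoticTimings(bubble.sort(sample(1:100, N, replace = TRUE)), data.sizes = 10^seq(1, 3, by = 0.5))
  testthat::expect_identical(asymptoticTimeComplexityClass(df), "quadratic")
})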
Check this screencast for a demonstration of time complexity testing on different sorting algorithms over a test session.
To visually diagnose or verify the complexity result(s), simple plots of the runtime/memory-usage trends against data sizes can be crafted. They are roughly grouped into:
plotTimings()/plotMemoryUsage() for the time/memory cases respectively:
# Timings plot for PeakSegDP::cDPA
df.time <- asymptoticTimings(PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4))
plotTimings(df.time, titles = list("Timings", "PeakSegDP::cDPA"), line.color = "#ffec1b", point.color = "#ffec1b", line.size = 1, point.size = 1.5)
# Equivalent ggplot object (using the df.time data frame from above):
library(ggplot2)
library(hrbrthemes) # provides ft_cols and theme_ft_rc()
ggplot(df.time, aes(x = `Data sizes`, y = Timings)) + geom_point(color = ft_cols$yellow, size = 1.5) + geom_line(color = ft_cols$yellow, size = 1) + labs(x = "Data sizes", y = "Runtime (in nanoseconds)") + scale_x_log10() + scale_y_log10() + ggtitle("Timings", "PeakSegDP::cDPA") + theme_ft_rc()
# Memory usage plot for PeakSegDP::cDPA
df.memory <- asymptoticMemoryUsage(PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 6, by = 0.1))
plotMemoryUsage(df.memory, titles = list("Memory Usage", "PeakSegDP::cDPA"), line.color = "#ffec1b", point.color = "#ffec1b", line.size = 1, point.size = 2)
# Equivalent ggplot object (using the df.memory data frame from above):
ggplot(df.memory, aes(x = `Data sizes`, y = `Memory usage`)) + geom_point(color = ft_cols$yellow, size = 2) + geom_line(color = ft_cols$yellow, size = 1) + labs(x = "Data sizes", y = "Memory usage (in bytes)") + scale_x_log10() + scale_y_log10() + ggtitle("Memory Usage", "PeakSegDP::cDPA") + theme_ft_rc()
To compare different functions, rbind() their timing data frames and then plot the resultant data frame using suitable aesthetics, geometry, scale, labels/titles etc. via a ggplot:
df.substring <- asymptoticTimings(substring(paste(rep("A", N), collapse = ""), 1:N, 1:N), data.sizes = 10^seq(1, 4, by = 0.5))
asymptoticTimeComplexityClass(df.substring)
[1] "linear"
df.PeakSegPDPA <- asymptoticTimings(PeakSegOptimal::PeakSegPDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 1)
asymptoticTimeComplexityClass(df.PeakSegPDPA)
[1] "loglinear"
df.cDPA <- asymptoticTimings(PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 5)
asymptoticTimeComplexityClass(df.cDPA)
[1] "quadratic"
df.gregexpr <- asymptoticTimings(gregexpr("a", paste(collapse = "", rep("ab", N)), perl = TRUE), data.sizes = 10^seq(1, 4, by = 0.5))
asymptoticTimeComplexityClass(df.gregexpr)
[1] "linear"
df.fpop <- asymptoticTimings(fpop::Fpop(rnorm(N), 1), data.sizes = 10^seq(1, 4, by = 0.5))
asymptoticTimeComplexityClass(df.fpop)
[1] "loglinear"
df.opart <- asymptoticTimings(opart::opart_gaussian(rnorm(N), 1), data.sizes = 10^seq(1, 4, by = 0.5))
asymptoticTimeComplexityClass(df.opart)
[1] "quadratic"
df.substring$expr = "substring"
df.PeakSegPDPA$expr = "PeakSegPDPA"
df.cDPA$expr = "cDPA"
df.gregexpr$expr = "gregexpr"
df.fpop$expr = "fpop"
df.opart$expr = "opart"
plot.df <- rbind(df.substring, df.PeakSegPDPA, df.cDPA, df.gregexpr, df.fpop, df.opart)
ggplot(plot.df, aes(x = `Data sizes`, y = Timings)) + geom_point(aes(color = expr)) + geom_line(aes(color = expr)) + labs(x = "Data sizes", y = "Runtime (in nanoseconds)") + scale_x_log10() + scale_y_log10() + ggtitle("Timings comparison plot", subtitle = "Linear vs Log-linear vs Quadratic complexities") + ggthemes::theme_pander()
Including more functions (if applicable) and increasing the number of data sizes can lead to a more comprehensive outlook:
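A hedged sketch of such an expanded comparison, reusing two of the functions benchmarked above with a finer grid of data sizes (illustrative only; runtimes grow quickly at the larger sizes):
df.fpop.fine <- asymptoticTimings(fpop::Fpop(rnorm(N), 1), data.sizes = 10^seq(1, 4, by = 0.1))
df.fpop.fine$expr = "fpop"
df.opart.fine <- asymptoticTimings(opart::opart_gaussian(rnorm(N), 1), data.sizes = 10^seq(1, 4, by = 0.1))
df.opart.fine$expr = "opart"
plot.df.fine <- rbind(df.fpop.fine, df.opart.fine)
ggplot(plot.df.fine, aes(x = `Data sizes`, y = Timings)) + geom_point(aes(color = expr)) + geom_line(aes(color = expr)) + labs(x = "Data sizes", y = "Runtime (in nanoseconds)") + scale_x_log10() + scale_y_log10() + ggthemes::theme_pander()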

ggfortify (an extension of ggplot2) can be used to produce diagnostic plots for generalized linear models with the same formulae as used in the complexity classification functions:
library(ggfortify)
df <- asymptoticTimings(PeakSegDP::cDPA(rpois(N, 1), rep(1, length(rpois(N, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.1))
glm.plot.obj <- glm(Timings ~ `Data sizes`, data = df)
ggplot2::autoplot(glm.plot.obj) + ggthemes::theme_gdocs()
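For intuition, the classification itself can be thought of as fitting one glm per candidate complexity class and keeping the best fit. A rough, illustrative sketch of that idea (the package's actual formulae and selection criterion may differ; raw deviance is used here only for brevity, and it tends to favor the more flexible models):
df <- asymptoticTimings(sort(sample(N)), data.sizes = 10^seq(1, 3, by = 0.5))
fits <- list(
  constant  = glm(Timings ~ 1, data = df),
  linear    = glm(Timings ~ `Data sizes`, data = df),
  loglinear = glm(Timings ~ I(`Data sizes` * log(`Data sizes`)), data = df),
  quadratic = glm(Timings ~ I(`Data sizes`^2), data = df)
)
names(which.min(sapply(fits, deviance))) # name of the best-fitting candidate class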

microbenchmark::microbenchmark() is used to obtain the time results in testComplexity::asymptoticTimings(), for its precision and for the convenience of having the benchmarks as a data frame. bench::bench_memory() is used to compute the size of the memory allocations in testComplexity::asymptoticMemoryUsage(). Code coverage is measured with covr::package_coverage(), both locally and via Codecov, with 100% coverage attained. Notably, bench::bench_memory() overcomes the Windows-only limitation for memory complexity testing observed in GuessCompx::CompEst(), as it runs successfully on other operating systems.
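For reference, both primitives can also be used directly; a minimal sketch, assuming microbenchmark and bench are installed:
library(microbenchmark)
library(bench)
microbenchmark(sort(rnorm(1000)), times = 10) # precise timings, coercible to a data frame
bench_memory(sort(rnorm(1000)))$mem_alloc # total memory allocated by the expression, in bytes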