Mutation testing for data.table

In today’s world, rigorous software testing is essential to ensure that code behaves correctly across all scenarios. While a prominent R package such as data.table has traditional unit tests to check if a function behaves as expected, it is an iterative process of trying to factor in different corner cases that haven’t been though of yet. Beyond the gathered insights and logical reasoning of developers and contributors, some unanticipated cases can be founded through automated testing procedures. One effective method to go about this involves introducing small changes (mutants) in the code - This is known as mutation testing, and it helps identify weaknesses in the test suite or cases where tests fail to catch errors. This blog post documents my experience and results of testing such mutants that were found from mutation testing of the C code in data.table (or files in the /src directory). Issue#6114@Rdatatable/data.table enlists the discovered mutants, out of which I focused on some of the key routines of the package:

fastmean
rbindlist
fifelse
subset
coalesce
forder

Function-specific sections aside, here are a few other hyperlinks:

PR example
Conclusions
Testing process

fastmean

The mutant here (line for reference) involves an arithmetic operator change for the sum and adjusted mean s:

s += t/n; // original
s += t*n; // mutant

s as far as I understand is a high precision (long double) variable, while n is the number of non-NA elements in the input vector (x in fastmean.c), and t seems to be a correction term that adjusts the mean to account for any floating-point rounding errors that occurred during the initial sum accumulation. If anything, I expected this mutant to break for cases involving higher precision, so I tried to create test cases that involve some stress testing in terms of numbers:

library(data.table)
library(testthat)

testFastMean <- function(x, na.rm)
{
  dt <- as.data.table(x)
  options(datatable.optimize=1)
  result <- dt[, mean(x, na.rm = na.rm)] # Updated this as per Michael's suggestion.
  expected <- base::mean(x, na.rm = na.rm)
  expect_equal(result, expected, tolerance = .Machine$double.eps^0.5)
}
test_that("fast mean tests",
{
  # Edge case for precision testing with large numbers:
  x <- c(1e10, 1e10 + 2, 1e10 + 4)
  testFastMean(x, TRUE)
  # Edge case for precision testing with small differences:
  x <- c(1e-10, 1e-10 + 1e-12, 1e-10 + 2e-12, 1e-10 + 3e-12, 1e-10 + 4e-12)
  testFastMean(x, TRUE)
  # Mixed values:
  x <- c(1e10, 1e-10, 1e12, 1e-12)
  testFastMean(x, TRUE)
  # Large vector with a small increment + testing accumulated precision:
  x <- c(rep(1e10, 1e6), rep(1e-10, 1e6))
  testFastMean(x, TRUE)
})

O/P:

Test passed 😀

I then tested base R’s mean vs data.table’s fast mean to see any discrepancies. I got a few breaking results for that:

library(data.table)
options(datatable.optimize=1)

meanComparison <- function(x, ...) 
{
  baseR <- mean(x, ...)
  fastmean <- .External("Cfastmean", x, ...)
  cat("Results as computed by:\nBase R's mean:", baseR, "\ndata.table's fast mean:", fastmean, "\n")
  fifelse(identical(baseR, fastmean), "Passed", "Failed")
}

testInputs <- list(
  c(rep(1e308, 1e3), rep(-1e308, 1e3)),
  c(rnorm(1e6, mean = 0, sd = 1e5), rep(1, 1e6), .Machine$double.xmax)
)

for(i in seq_along(testInputs))
{
  cat("Test case ", i, ":\n", sep = "")
  cat(meanComparison(testInputs[[i]], na.rm = TRUE), "\n")
}

O/P:

Test case 1:
Results as computed by:
Base R's mean: -8.147083e+291 
data.table's fast mean: Inf 
Failed 
Test case 2:
Results as computed by:
Base R's mean: 8.988461e+301 
data.table's fast mean: -1.508633e+304 
Failed 

Nothing for the mutant though, or for both that and the original code I kept getting the same results. Here are a few more tests I tried:

meanComparison <- function(x, na.rm) 
{
    baseR <- mean(x, na.rm = TRUE)
    dt <- data.table(x)
    options(datatable.optimize = 1)
    fastmean <- dt[, mean(x, na.rm = TRUE)]
    cat("Results as computed by:\nBase R's mean:", baseR, "\ndata.table's fast mean:", fastmean, "\n")
    fifelse(identical(baseR, fastmean), "Passed", "Failed")
}

testInputs <- list(
    c(.Machine$double.xmax, -.Machine$double.xmax, 0),
    c(1e10, -1e10, 1e-10, -1e-10, 0),
    c(Inf, -Inf, NaN), 
    c(rep(1, 1e6), rep(-1, 1e6)),
    rnorm(1e6, mean = 0, sd = 1e5),
    runif(1e6, min = -1e5, max = 1e5),
    c(rnorm(1e6, mean = 0, sd = 1e5), rep(1, 1e6), .Machine$double.xmax),
    c(rep(.Machine$double.eps, 1e6), rep(-.Machine$double.eps, 1e6)),
    c(rep(.Machine$double.xmin, 1e6), rep(-.Machine$double.xmin, 1e6)),
    c(rep(1e308, 1e3), rep(-1e308, 1e3)),
    c(1 + 7i, 11 + 8i, 5 + 21i)
)

for(i in seq_along(testInputs))
{
    cat("Test case ", i, ":\n", sep = "")
    cat(meanComparison(testInputs[[i]], na.rm = TRUE), "\n")
}

O/P:

Test case 1:
Results as computed by:
Base R's mean: 0 
data.table's fast mean: 0 
Passed 
Test case 2:
Results as computed by:
Base R's mean: 0 
data.table's fast mean: 0 
Passed 
Test case 3:
Results as computed by:
Base R's mean: NaN 
data.table's fast mean: NaN 
Passed 
Test case 4:
Results as computed by:
Base R's mean: 0 
data.table's fast mean: 0 
Passed 
Test case 5:
Results as computed by:
Base R's mean: 35.47985 
data.table's fast mean: 35.47985 
Passed 
Test case 6:
Results as computed by:
Base R's mean: -0.1811827 
data.table's fast mean: -0.1811827 
Passed 
Test case 7:
Results as computed by:
Base R's mean: 8.988461e+301 
data.table's fast mean: 8.988461e+301 
Passed 
Test case 8:
Results as computed by:
Base R's mean: 0 
data.table's fast mean: 0 
Passed 
Test case 9:
Results as computed by:
Base R's mean: 0 
data.table's fast mean: 0 
Passed 
Test case 10:
Results as computed by:
Base R's mean: -8.147083e+291 
data.table's fast mean: -8.147083e+291 
Passed 
Test case 11:
Results as computed by:
Base R's mean: 5.66667+12i 
data.table's fast mean: 5.66667+12i 
Passed 

I also created my own version of fastmean.c with a more condensed setup (as far as I understood) based on the execution path:

SEXP fastmean(SEXP args)
{
  double *x;
  R_len_t n;
  double sum = 0.0;
  
  // Extracting the numeric vector:
  if(!isReal(args))
  {
    error("Input must be numeric");
  }
  x = REAL(args);
  n = length(args);
  
  // Computing the sum:
  for(R_len_t i = 0; i < n; ++i)
  {
    sum += x[i];
  }

  sum *= n; // This line is the change that I'm testing!
  // Computing the mean and returning it as a numeric vector:
  double mean = sum / n;
  SEXP result = PROTECT(allocVector(REALSXP, 1));
  REAL(result)[0] = mean;
  UNPROTECT(1);
  return result;
}

Still no breaking/conflicting changes. At the end I gave up on this particular mutant thus, and also since this did not make any difference to the mean being used for data.table operations (which is the gforce version used for grouping when datatable.optimize is 1, as per Michael’s comments). That aside, I didn’t get a clear idea on going about testing this further given what I already tried.

Out of curiosity, I also tried to check the difference between the `Cfastmean` routine and the use of fast mean via `mean` inside `data.table` with `datatable.optimize=1`.

```r printStack <- function() { cat("Call stack:\n") for(i in 1:sys.nframe()) { if(exists("sys.calls", frame = i)) { call <- sys.calls()[[i]] cat(deparse(call), "\n") } } } DT <- data.table(x = c(10, 3, NA, 5)) # Approach A: (Using optimize=1 and calling mean inside a data.table scope) options(datatable.optimize = 1) result.dt <- DT[, mean(x, na.rm = TRUE)] printStack() # Approach B: (Directly calling Cfastmean via .External) result.Cfastmean <- .External("Cfastmean", DT$x, na.rm = TRUE) printStack() # if("Cfastmean" %in% ls(envir = baseenv()) && length(body(mean)) >= 2 && identical(body(mean)[[2]], quote(.External("Cfastmean", ...)))) { print("mean uses Cfastmean") } ```

rbindlist

The mutant I tested for here is a relational operator change inside a conditional:

if (nrow==0 && ncol==0) return(R_NilValue); # Original
if (nrow>=0 && ncol==0) return(R_NilValue); # Mutated

This is the test I created for it:

library(data.table)

iVal <- 1:10
test_rbindlist <- function()
{
  inputList <- list(
    data.table(A = integer(0)),
    data.table(A = integer(0), B = 1:3),
    data.table(A = integer(0), B = integer(0)),
    data.table(A = 1, B = fifelse(iVal %% 2 == 0, 2, NA_real_)
  )
  usenamesArg <- FALSE
  fillArg <- TRUE
  idcolArg <- NULL
  result <- rbindlist(inputList, usenamesArg, fillArg, idcolArg)
  print("Test result:")
  print(result)
  stopifnot(!is.null(result))
}
test_rbindlist()

O/P:

[1] "Test result:"
        A     B
    <num> <num>
    1    NA
    1     2
    1    NA
    1     2
    1    NA
    1     2
    1    NA
    1     2
    1    NA
    1     2

The addition of nrow > 0 doesn’t affect this (also not that I’m using fillArg is TRUE) since having zero columns itself invalidates the use of rbindlist and returns NULL. Thus, I believe this doesn’t warrant any tests.

fifelse

Similar to the one above, the mutant is a slight modification to the input validation check, from != to < between the length of the test vector and the NA vector: (len0 and len3 below respectively)

if (!na_n && len3!=1 && len3!=len0)
if (!na_n && len3!=1 && len3<len0)

At first, I created my own version taking inspiration from fifelse.c:

library(data.table)

fifelseR <- function(test, yes, no, na)
{
  .Call("CfifelseR", test, yes, no, na)
}

# Mock fifelse function to simulate the altered conditional:
m_fifelse <- function(test, yes, no, na)
{
  if(!is.logical(test))
      stop("'test' must be logical")
  len0 <- length(test)
  len3 <- length(na)
  if(len3 != 1 && len3 < len0)
    stop(sprintf("Length of 'na' is %d but must be 1 or length of 'test' (%d)", len3, len0))
  fifelseR(test, yes, no, na) # Calling the fifelseR function.
}

testfifelse <- function()
{
  test <- c(TRUE, FALSE, TRUE)
  yes <- c(1, 2, 3)
  no <- c(4, 5, 6)
  na_values <- as.numeric(c(NA, NA, NA, NA))
  
  tryCatch({
    m_fifelse(test, yes, no, na_values)
    cat("Original condition passed.\n")
    na_values[1] <- 1 
    m_fifelse(test, yes, no, na_values)
    cat("Test passed.\n")
  }, error = function(e)
  {
    cat("Error:", e$message, "\n")
    cat("Test failed.\n")
  })
}
testfifelse()

O/P:

Error: Length of 'na' is 4 but must be 1 or length of 'test' (3) 
Test failed.

This correctly fails for the mutant whereas having the original operator in the condition (!=) does not induce test failure.

But confusingly so, that is not the case for the actual function:

library(data.table)

# Create a test case where len3 (length of 'na') equals len0 (length of 'test')
testfifelse <- function()
{
  result <- tryCatch({
    fifelse(c(TRUE, FALSE, TRUE), 1:3, 4:6, as.numeric(c(NA, NA, NA)))
    message("Original condition passed.")
  }, error = function(e)
  {
    message("Original condition failed: ", e$message)
  })
  result <- tryCatch({
    fifelse(c(TRUE, FALSE, TRUE), 1:3, 4:6, NA)
    message("Altered condition passed.")
  }, error = function(e)
  {
    message("Altered condition failed: ", e$message)
  })
}
testfifelse()

O/P:

Original condition passed.
Altered condition passed.

I presume it’s logically handled without any obscurity if argument na is greater than test in length. But either way, since != is a stronger check anyway, this too does not need a test/PR for this mutant.

subset

The mutant I tested for here is a relational operator change inside a conditional as well:

if (elem<1 || elem>max) continue;
if (elem<1 || elem==max) continue;

I tried inputs assuming elem to represent an index or value being checked (more specifically, it is used to access elements from the source array in subset.c) and max being a boundary condition (upper bound of allowable indices) but it didn’t work, so I’m not sure what triggers that case.

Here is a test in general which broke for both the original code and the mutant:

library(data.table)

dt <- data.table(a = 1:5)
index <- seq_len(max(dt$a))
expectedResult <- dt[seq(1, 4)]
result <- tryCatch({
    dt[index]
}, error = function(e) {
    NULL
})

if(!identical(result, expectedResult))
  stop("Test failed.")
else
  message("Test passed.")

O/P:

Error: Test failed.

The test case above fails because the current implementation of the indexing behavior is not correctly handling the situation where the index vector contains the maximum value present in the a column of dt.

dt <- data.table(a = 1:5)
index <- c(-1, NA_integer_)

tryCatch({
  result <- dt[index]
  stop("Failed.")
}, error = function(e) {
  message("Passed.")
})

This could potentially do with tests, but more so involves changing the original logic first if it’s an unintended design flaw.

coalesce

The mutant in this case is that of a negligible switch-case logic where we break out at the start of its scope (line for reference):

switch(TYPEOF(first)) { # Original
switch(TYPEOF(first)) { # Mutant
break;

I wasn’t able to create any breaking test cases for either version, but then I think the switch doesn’t run or execution doesn’t reach that point. After some testing with different input types, I also observed that the R function fcoalesce was failing for raw inputs:

test_that("Single element raw vector test.",
{
    x <- list(as.raw(c(NA, 2, NA)), as.raw(1))
    result <- data.table:::fcoalesce(x)
    expect_equal(result, as.raw(c(1, 2, 1)))
})

O/P:

── Warning: coalesce works correctly for raw vectors ───────────────────────────
out-of-range values treated as 0 in coercion to raw

── Error: coalesce works correctly for raw vectors ─────────────────────────────
Error in `data.table:::fcoalesce(x)`: Type 'raw' is not supported
Backtrace:
    ▆
 1. └─data.table:::fcoalesce(x)

Error:
! Test failed
Backtrace:
    ▆
 1. ├─testthat::test_that(...)
 2. │ └─withr (local) `<fn>`()
 3. └─reporter$stop_if_needed()
 4.   └─rlang::abort("Test failed", call = NULL)

forder

This mutant was that of excluding code or more specifically commenting line 167 of the original code. No test case I made broke the function, a fair amount of tests already exists (for e.g. as revealed by grep -r "forder" inst/tests/tests.Rraw), and the change is also a slightly tricky one to construct test cases around.

PR example

Toby created a pull request (#6115) to data.table to introduce a test for fread based on this particular case from the mutation testing results:

if (length(key) == 1L) { # Original
if (length(key) < 1L)  { # Mutant

In a nutshell, the relation operator replacement above seems to trigger the condition even with empty keys (length(key) being less than 1L means 0 would work), which can lead to potentially incorrect behaviour by executing code that assumes there is a single key (but there is none).

Thus, a simple test case for fread with an empty character vector (character()) was introduced to the test suite of data.table to handle this corner case:

test(1958.041, fread('a,b\n1,2', key = character()),    data.table(a = 1L, b = 2L), warning="cols is a character vector of zero length")

One can test it by simply running data.table::fread('a,b\n1,2', key = character()) locally with the current version of data.table installed and loaded.

Conclusions

At the end, the main goal here would be to create more of such test cases and introduce them to the existing array of tests.

Addition of such tests for every explored mutant would be great, although there are a few important things to note:

Some mutant-based changes are not trivial, meaning it’s hard to construct tests with input values that trigger them.
Then there are cases which makes no sense, such as the rbindlist mutant I discussed above (it is not possible to have a data.table with ncol=0 and nrow>0).
A few of them don’t even run in any situation (lie beyond code coverage) or in other words, their execution path is never reached.
Sometimes the mutant is not bound to be helpful, for e.g. changing from a stronger filter (or not narrowing down) such as != to >/</>=/<= (like for the fifelse mutant I documented above) is a clear false positive.

These are (as I think) the roadblocks or reasoning behind not creating a test/PR for each of the enlisted cases.
For the fread mutant that Toby submitted a PR for and I discussed in the ‘PR example’ section above, the variable key is a direct argument, and the condition based off it was clear to make test cases out of. Such cases are viable candidates for mutants that can have tests created based on handling them. For the six mutants that I explored and documented above though, none of them resulted in a PR :(

To maximize the likelihood of picking mutants that could result in the creation of tests/PRs, I would recommend picking cases which:

Directly work on the input arguments of a function, or at least without much indirection. (Easier to pick inputs)
Make sense as a whole to even be an applicable case. (Probably the first thing to check!)
Can be tested with the corresponding R function for it directly. (No need to call C routines)

These are some points to keep in mind that might help one save time in pursuing this - or at least I think I should have done better and considered such aspects more (at least one test/PR, right?).

If I were to work on this again (which I do want to!) in the near future, I will also be focusing on creating inputs after checking if the cases lie within the execution path, and use test() directly. If possible though, I would appreciate a bit of help in going about the testing process as a whole, aside from insights on how to tackle such cases. As advised, I would try to change my approach accordingly to find inputs that break assuming the mutant cases in code. I think it’s worth investigating the other mutants I haven’t as well (some of them may be interesting and bear fruit!), so I’m looking forward to test more.

For my current testing methodology or the process I went through to get the results above, please continue reading below.

Testing process

R functions

For the C files that have a corresponding function in R interfacing directly (without needing to call the underlying C routine), I simply wrote R code that implements functions with tryCatch blocks, or used testthat and its functions as wrappers.

Making changes to the files and running C code

Initially, I tried to directly compile the C code with gcc before and after making changes, but it complained of missing headers (such as R.h) and so I resorted to using R CMD INSTALL to generate the shared object which I then loaded onto the R session via dyn.load. For functions that do not have a function to call them directly in R code, I combine that with a wrapper (using .Call) to call the corresponding C routine from within R (given that there mostly aren’t exported objects in the data.table namespace for some those functions, e.g. fast mean). I then switched to using .External to call C routines via symbol names.

Installation checks

Here’s a script I wrote to double-check that I’m using the correct data.table installation/setup/version for testing changes to my C code:

# Function to remove data.table package
removeDT <- function(libPath) 
{
  installedPackages <- installed.packages(lib.loc = libPath)
  if("data.table" %in% rownames(installedPackages)) 
  {
    remove.packages("data.table", lib = libPath)
    message(paste("Removed data.table from", libPath))
  } 
  else 
  {
    message(paste("data.table not found in", libPath))
  }
}

# Unloading from current session:
if("package:data.table" %in% search()) 
{
  detach("package:data.table", unload = TRUE)
  message("Unloaded data.table from the session")
}

libraryPaths <- .libPaths()
lapply(libraryPaths, removeDT)

# Checking for any remaining directories and deleting them:
for(libPath in libraryPaths) 
{
  data_table_dir <- file.path(libPath, "data.table")
  if(dir.exists(data_table_dir)) 
  {
    unlink(data_table_dir, recursive = TRUE)
    message(paste("Deleted directory:", data_table_dir))
  }
}

# Verify if data.table has been completely removed
installed_packages <- lapply(libraryPaths, installed.packages)
if(!any(sapply(installed_packages, function(pkg) "data.table" %in% rownames(pkg)))) 
{
  message("data.table has been successfully removed from all library paths.")
} else 
{
  message("data.table is still installed in some library paths.")
}

# Taking care of .Rprofile files that can load DT:
RprofileFiles <- c("~/.Rprofile", file.path(Sys.getenv("R_HOME"), "etc", "Rprofile.site"))
for(Rprofile in RprofileFiles) 
{
  if(file.exists(Rprofile)) 
  {
    RprofileContent <- readLines(Rprofile)
    if(any(grepl("data.table", RprofileContent))) 
    {
      message(paste("data.table reference found in:", Rprofile))
    } else 
    {
      message(paste("No data.table reference in:", Rprofile))
    }
  }
}

# Verifying that data.table is installed in the current directory: (after removing from .libPaths() directory via remove.packages without lib spec)
dt.path <- try(find.package("data.table", lib.loc = getwd()), silent = TRUE)
if(inherits(data.tablePath, "try-error"))
{
    cat("data.table is not installed in the current directory.\n")
    devtools::install(".")
}
else
{
    cat("data.table is installed in the current directory at:", dt.path, "\n")
}