- July 16, ’24:
1) Started testing my GHA on non-data.table packages that use Rcpp, starting with dplyr. Tested a performance regression in dplyr::summarise with many groups: (6 hours)
atime::atime_test(
  N = 10^seq(1, 8),
  setup = {
    library(dplyr) # provides tibble(), %>%, mutate(), group_by() and across()
    library(tidyr) # provides crossing()
    # The data size here is fixed at 100 * 100 * 2 rows; it does not depend on N.
    data <- tibble(id1 = 1:100) %>%
      crossing(id2 = 1:100) %>%
      crossing(obs = 1:2) %>%
      mutate(value = runif(n()))
  },
  expr = data %>% group_by(id1, id2) %>% dplyr:::summarise(across(value, mean, na.rm = TRUE), .groups = "drop"),
  # As reported in https://github.com/tidyverse/dplyr/issues/6190#issuecomment-1100616836.
  Before = "7a968663bdff9f02bf2b410f3a92ed0fbd576dba", # v1.0.5 https://github.com/tidyverse/dplyr/commit/7a968663bdff9f02bf2b410f3a92ed0fbd576dba
  Regression = "22def186ab018417574d458dd987b2cf0bf66332" # v1.0.6 https://github.com/tidyverse/dplyr/commit/22def186ab018417574d458dd987b2cf0bf66332
)
2) Tested the functions of data.table.threads and finally merged #9. (3 hours)
3) Commented on a few issues and PRs. (3 hours)
4) Zoom meetings. (2 hours)
- July 17, ’24:
1) Reviewed #6296 and made a performance test with my GHA on a data.table fork to display the improvement: (8 hours)
atime::atime_test(
  N = 10^seq(1, 9),
  setup = {
    dt <- data.table::data.table(a = 1:N)
    dt_mod <- data.table::copy(dt)
  },
  expr = data.table:::`[.data.table`(dt_mod, , 1, by = a, verbose = TRUE),
  # Slow: the latest data.table commit in my fork (master branch; does not include the improvement, hence slow).
  # It can be replaced with the preceding commit on data.table's master, or the parent of the first commit in the PR that fixes the issue.
  Slow = "cd497408bb4dc6650d871b5076e738420ff431d7"
  # Fast = "" <-- merge commit, or the last commit after that PR is merged.
)
2) Reviewed PRs with atime test cases (#4501, #6290, #6288), along with others (e.g. #6297). (4 hours)
3) Discussed how Doris’ talk went and reviewed the materials (slides and videos) a bit. (3 hours)
- July 18, ’24:
1) Creating a rough draft of the fund allocation issue for when we establish GitHub Sponsors - distributed evenly among the committers, maybe? Can take some inspiration from mlr3. (3 hours)
Starting paragraph:
Coming as a point from the interviews (#5880): Establishing GitHub Sponsors for the [Rdatatable organization](https://github.com/Rdatatable) (as [mlr-org](https://github.com/sponsors/mlr-org) has set up, for example) is one item on the to-do list for long-term sustainability, especially given that a fair proportion of those interviewed are willing to contribute to such a central fund. How and to whom these funds will be disbursed, and what would be a good way to distribute them proportionately, is an open question. Any ideas on this in general, or on the distribution of funds once we have it set up?
- July 19 and 20, ’24:
1) Redrafting my blog post on my GHA, taking into account the new sections of the Performance Testing wiki that Toby wrote. (8 hours)