- Sep 9, ‘24:
1) Resolved the time-consuming setup caused by installing the dependencies in atime’s ‘Suggests’ field (#35), filed #6492 to fix inconsistencies with .ci/atime/tests.R, and made the suggested changes. Tested both changes separately in two PRs on my fork. (7 hours)
2) Zoom meeting. (1 hour)
- Sep 10, ‘24:
1) Tried to make the suggested changes to the closestPoints computation logic in data.table.threads. (5 hours)
Stopped at this version/implementation of the plot method:
plot.data_table_threads_benchmark <- function(x, ...)
{
  x[, `:=`(speedup = median[threadCount == 1] / median, type = "Measured"), by = expr]

  setDTthreads(0)
  systemThreadCount <- getDTthreads()
  functions <- unique(x$expr)

  speedupData <- data.table(
    expr = rep(functions, each = systemThreadCount),
    threadCount = rep(1:systemThreadCount, length(functions)),
    speedup = c(rep(seq(1, systemThreadCount), length(functions)),
                rep(seq(1, systemThreadCount / 2, length.out = systemThreadCount), length(functions))),
    type = rep(c("Ideal", "Recommended"), each = systemThreadCount * length(functions))
  )

  maxSpeedup <- x[, .(threadCount = threadCount[which.max(speedup)],
                      speedup = max(speedup),
                      type = "Ideal"),
                  by = expr]

  closestPoints <- x[, {
    recommendedSubset <- speedupData[type == "Recommended" & expr == .BY$expr]
    mergedData <- merge(.SD, recommendedSubset, by = "threadCount", suffixes = c("", "_recommended"))
    filteredRows <- mergedData[speedup > speedup_recommended]
    if(nrow(filteredRows) > 0)
    {
      filteredRows[which.max(speedup)]
    }
    else
    {
      NA
    }
  }, by = expr, .SDcols = c("speedup", "threadCount")]
  closestPoints <- closestPoints[!is.na(threadCount)]
  closestPoints[, type := "Recommended"]

  combinedLineData <- rbind(speedupData, x, fill = TRUE)
  combinedPointData <- rbind(maxSpeedup, closestPoints, fill = TRUE)

  ggplot(x, aes(x = threadCount, y = speedup)) +
    geom_line(data = combinedLineData, aes(color = type), size = 1) +
    geom_point(data = combinedPointData, aes(color = type), size = 3) +
    geom_text(data = combinedPointData, aes(label = threadCount), vjust = -0.5, size = 4, na.rm = TRUE) +
    facet_wrap(. ~ expr) +
    coord_equal() +
    labs(x = "Threads", y = "Speedup", title = "data.table functions") +
    theme(plot.title = element_text(hjust = 0.5)) +
    scale_x_continuous(breaks = 1:systemThreadCount, labels = 1:systemThreadCount) +
    scale_color_manual(values = c("Measured" = "black", "Ideal" = "#f79494", "Recommended" = "#93c4e0")) +
    guides(color = guide_legend(title = "Type"))
}
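To make the selection step easier to reason about in isolation, here is a minimal sketch on made-up numbers. It is not the package's implementation: the `bench` table, the `toyFunction` name, the timings, and the fixed count of 4 threads are all assumptions for illustration. It computes the measured speedup, builds the "Recommended" line (rising linearly from 1 to half the thread count), and picks the highest measured speedup that lies above that line. Filtering in `i` before grouping means a function with no point above the line simply does not appear in the result, so no placeholder value is needed.

```r
library(data.table)

# Hypothetical benchmark medians (in seconds) for one function at 1 to 4 threads.
bench <- data.table(
  expr        = "toyFunction",
  threadCount = 1:4,
  median      = c(8, 4.4, 4, 3.8)
)

# Measured speedup relative to the single-threaded run, per function.
bench[, speedup := median[threadCount == 1] / median, by = expr]

# "Recommended" line: rises linearly from 1 to half the maximum thread count.
maxThreads <- max(bench$threadCount)
bench[, recommended := seq(1, maxThreads / 2, length.out = maxThreads)[threadCount]]

# Keep only rows above the recommended line, then take the row with the
# highest speedup per function. Functions without any qualifying row are
# absent from the result because grouping runs on the filtered rows.
closest <- bench[speedup > recommended, .SD[which.max(speedup)], by = expr]

print(closest)
```

With these toy timings every thread count except 1 lies above the recommended line, so the selected point is the one with the largest speedup among them.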
2) Filed #45 to make post images blend consistently into the background of The Raft website, whatever its color. (2 hours)
- Sep 11, ‘24:
1) Made PR#20@data.table.threads that wraps up the changes that Toby suggested. (3 hours)
2) Writing a blog post about data.table.threads. (5 hours)
- Sep 12, ‘24:
1) Created an atime test case for a memory efficiency improvement to data.table::melt() in #20@Anirban166/data.table. (4 hours)
2) More writeup for data.table.threads. (4 hours)
- Sep 13, ‘24:
1) Tried to revise the transform regression atime case (#14). (5 hours)
2) Reviewed and made changes to code in PRs #6290 and #6295 to help Doris get those test cases merged. (3 hours)
3) Continuing my writeup for data.table.threads. Going to publish it on my own blog this weekend or early next week, then ping Toby for review and send a PR to The Raft afterwards. (1 hour)