- May 6, ‘24: (7)
1) Made some of the changes Toby mentioned for the speedup plot in #2.
2) Prepared slides for my presentation tomorrow (edited old ones and created a few new ones).
- May 7, ‘24: (4)
1) Presented and collected feedback from Toby, Kelly, Tyson, and NAU-ML lab members.
2) Made changes to the speedup plot generating code along with Toby: findOptimalThreads now returns an object of class data_table_threads_benchmark, and there is a separate plot method for that class (plot.data_table_threads_benchmark).
3) Discussed with Lawson the details about my travel and lodging for the talk in August.
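A minimal sketch of the S3 pattern from item 2 above, not the package's actual code. The helper name makeBenchmarkResult and the base-graphics plot body are hypothetical; only the class name and the threadCount, expr, and meanTime fields come from this log.

```r
# Hypothetical stand-in for the benchmarking function: it only builds the
# return value and runs no benchmarks.
makeBenchmarkResult <- function(threadCount, expr, meanTime) {
  structure(
    list(threadCount = threadCount, expr = expr, meanTime = meanTime),
    class = "data_table_threads_benchmark"
  )
}

# plot() is an S3 generic, so a function named plot.<class> is the method
# that plot(x) dispatches to for objects carrying that class.
plot.data_table_threads_benchmark <- function(x, ...) {
  # The arguments here are plain numeric vectors, so this inner call
  # falls through to the default scatter plot.
  plot(x$threadCount, x$meanTime, xlab = "Threads", ylab = "Mean time", ...)
  invisible(x)
}

# Two forder timings (1 and 2 threads) taken from the May 9 output.
toy <- makeBenchmarkResult(c(1, 2), c("forder", "forder"),
                           c(238.604075, 149.147105))
inherits(toy, "data_table_threads_benchmark")
```

Calling plot(toy) then dispatches to the method above, which keeps the benchmarking step separate from the plotting step.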
- May 8, ‘24: (7)
1) Added a geom_line() to my plot with half the slope (0.5) of the ideal speedup line (slope 1). Taken as half of the ideal speedup at each point, the line would start at (0.5, 0.5), below (1, 1) where threads and speedup begin, so I manually adjusted it to start at (1, 1) and then progress towards a slope of 0.5 (for example, reaching a speedup of 5 at 10 threads).
2) Added geom_ribbon() to show variance/noise (experimented and found 0.3 on both sides to look decent).
3) Organized my code in an appropriate way (three functions) and created examples.
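The reference line and ribbon from items 1 and 2 above could be sketched as follows. This is an illustration under assumptions, not the package's plotting code: speedup is assumed to be the one-thread time divided by the n-thread time, the half-slope line is assumed to be interpolated from (1, 1) to a speedup of 5 at 10 threads, and the ribbon is assumed to sit around the measured speedup. The timings are the forder means from the May 9 output.

```r
library(ggplot2)

maxThreads <- 10
# Mean times for forder at 1 to 10 threads, copied from the May 9 output.
forderTime <- c(238.604075, 149.147105, 120.739087, 105.888774, 98.232321,
                93.186126, 89.772309, 88.713179, 90.864280, 89.682569)

speedupData <- data.frame(
  threads = seq_len(maxThreads),
  # Assumed definition: time with one thread divided by time with n threads.
  measured = forderTime[1] / forderTime,
  # Ideal speedup: slope 1 through (1, 1).
  ideal = seq_len(maxThreads),
  # Assumed adjustment: start at (1, 1), end at half the ideal speedup.
  halfIdeal = seq(1, maxThreads / 2, length.out = maxThreads)
)

speedupPlot <- ggplot(speedupData, aes(x = threads)) +
  # Band of 0.3 on both sides of the measured speedup.
  geom_ribbon(aes(ymin = measured - 0.3, ymax = measured + 0.3), alpha = 0.2) +
  geom_line(aes(y = measured)) +
  geom_line(aes(y = ideal), linetype = "dashed") +
  geom_line(aes(y = halfIdeal), linetype = "dotted") +
  labs(x = "Threads", y = "Speedup")
```

Printing speedupPlot draws the figure. Anchoring the lower line at (1, 1) keeps all three lines meeting at the single-thread baseline, where speedup is 1 by definition.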
- May 9, ‘24: (10)
1) Documented the functions for my R package.
2) Iteratively ran devtools::check and corrected the errors (2) and warnings (7) that arose. Done with the creation of my R package (pushed on commit acfb5a0); tested it locally:
> install()
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?
1: All
2: CRAN packages only
3: None
4: data.table (6f008bdd9... -> eaf869eb4...) [GitHub]
Enter one or more numbers, or an empty line to skip updates:
── R CMD build ────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/Users/anirban166/data.table.threads/DESCRIPTION’ ...
─ preparing ‘data.table.threads’:
✔ checking DESCRIPTION meta-information
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘data.table.threads_0.1.1.tar.gz’
Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \
/var/folders/9_/qc989n050_d2sbtw92scjshr0000gn/T//RtmpePtdKG/data.table.threads_0.1.1.tar.gz \
--install-tests
* installing to library ‘/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library’
* installing *source* package ‘data.table.threads’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (data.table.threads)
> library(data.table.threads)
> (benchmarkedData <- data.table.threads::findOptimalThreadCount(10000000, 10))
Running benchmarks with 1 thread, 10000000 rows, and 10 columns.
Running benchmarks with 2 threads, 10000000 rows, and 10 columns.
Running benchmarks with 3 threads, 10000000 rows, and 10 columns.
Running benchmarks with 4 threads, 10000000 rows, and 10 columns.
Running benchmarks with 5 threads, 10000000 rows, and 10 columns.
Running benchmarks with 6 threads, 10000000 rows, and 10 columns.
Running benchmarks with 7 threads, 10000000 rows, and 10 columns.
Running benchmarks with 8 threads, 10000000 rows, and 10 columns.
Running benchmarks with 9 threads, 10000000 rows, and 10 columns.
Running benchmarks with 10 threads, 10000000 rows, and 10 columns.
$threadCount
[1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4
[36] 4 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8
[71] 8 8 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10
$expr
[1] "forder" "GForce_sum" "subsetting" "frollmean" "fcoalesce" "between" "fifelse" "nafill"
[9] "CJ" "forder" "GForce_sum" "subsetting" "frollmean" "fcoalesce" "between" "fifelse"
[17] "nafill" "CJ" "forder" "GForce_sum" "subsetting" "frollmean" "fcoalesce" "between"
[25] "fifelse" "nafill" "CJ" "forder" "GForce_sum" "subsetting" "frollmean" "fcoalesce"
[33] "between" "fifelse" "nafill" "CJ" "forder" "GForce_sum" "subsetting" "frollmean"
[41] "fcoalesce" "between" "fifelse" "nafill" "CJ" "forder" "GForce_sum" "subsetting"
[49] "frollmean" "fcoalesce" "between" "fifelse" "nafill" "CJ" "forder" "GForce_sum"
[57] "subsetting" "frollmean" "fcoalesce" "between" "fifelse" "nafill" "CJ" "forder"
[65] "GForce_sum" "subsetting" "frollmean" "fcoalesce" "between" "fifelse" "nafill" "CJ"
[73] "forder" "GForce_sum" "subsetting" "frollmean" "fcoalesce" "between" "fifelse" "nafill"
[81] "CJ" "forder" "GForce_sum" "subsetting" "frollmean" "fcoalesce" "between" "fifelse"
[89] "nafill" "CJ"
$meanTime
[1] 238.604075 15.834843 82.098118 25.305659 11.333574 47.862823 33.384304 8.923383 5.471632
[10] 149.147105 15.756613 67.731085 25.256690 10.199715 39.860884 27.033305 8.434186 4.575443
[19] 120.739087 15.740213 63.735838 25.050872 10.023808 30.423629 23.726084 8.581675 4.583418
[28] 105.888774 15.795719 58.474678 26.241693 8.626748 28.835158 22.174659 9.445956 4.237654
[37] 98.232321 15.756840 58.480091 25.956710 8.847574 26.856262 21.225636 8.266084 4.638750
[46] 93.186126 15.779605 59.211183 25.948519 9.663049 25.499239 20.452548 8.451822 4.282420
[55] 89.772309 15.771334 57.352493 25.581512 9.894759 25.990644 20.695062 10.041181 4.296897
[64] 88.713179 15.740802 58.772491 24.697033 8.976329 27.440822 20.559703 9.042136 5.672482
[73] 90.864280 15.764039 60.712491 25.285269 8.923695 25.864757 22.138314 9.320046 3.663480
[82] 89.682569 15.791384 61.186428 24.114304 9.222357 31.012210 22.981579 8.163546 4.477042
attr(,"row.names")
[1] "forder" "GForce_sum" "subsetting" "frollmean" "fcoalesce" "between" "fifelse"
[8] "nafill" "CJ" "forder1" "GForce_sum1" "subsetting1" "frollmean1" "fcoalesce1"
[15] "between1" "fifelse1" "nafill1" "CJ1" "forder2" "GForce_sum2" "subsetting2"
[22] "frollmean2" "fcoalesce2" "between2" "fifelse2" "nafill2" "CJ2" "forder3"
[29] "GForce_sum3" "subsetting3" "frollmean3" "fcoalesce3" "between3" "fifelse3" "nafill3"
[36] "CJ3" "forder4" "GForce_sum4" "subsetting4" "frollmean4" "fcoalesce4" "between4"
[43] "fifelse4" "nafill4" "CJ4" "forder5" "GForce_sum5" "subsetting5" "frollmean5"
[50] "fcoalesce5" "between5" "fifelse5" "nafill5" "CJ5" "forder6" "GForce_sum6"
[57] "subsetting6" "frollmean6" "fcoalesce6" "between6" "fifelse6" "nafill6" "CJ6"
[64] "forder7" "GForce_sum7" "subsetting7" "frollmean7" "fcoalesce7" "between7" "fifelse7"
[71] "nafill7" "CJ7" "forder8" "GForce_sum8" "subsetting8" "frollmean8" "fcoalesce8"
[78] "between8" "fifelse8" "nafill8" "CJ8" "forder9" "GForce_sum9" "subsetting9"
[85] "frollmean9" "fcoalesce9" "between9" "fifelse9" "nafill9" "CJ9"
attr(,"class")
[1] "data_table_threads_benchmark"
> plot(benchmarkedData)
- May 10, ‘24:
1) Added a print.data_table_threads_benchmark method for the S3 generic print function, overriding the default printing of the benchmark object (commit acfb5a0).
2) Documented two issues in the process: #6 and #4.
3) Wrote a basic readme.
4) Will make changes to the main page over the weekend (including a short summary of each week and, going forward, the hours spent working each day).
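A print method for the class from item 1 above might look like the sketch below. What the real method prints is not recorded in this log, so the per-expression summary here is an assumption chosen for illustration; the toy object mirrors the list structure and class shown in the May 9 output, with forder and CJ timings copied from it.

```r
# Hypothetical print method: for each benchmarked expression, report the
# thread count that gave the lowest mean time.
print.data_table_threads_benchmark <- function(x, ...) {
  for (e in unique(x$expr)) {
    rows <- x$expr == e
    best <- x$threadCount[rows][which.min(x$meanTime[rows])]
    cat(sprintf("%-12s fastest with %d thread(s)\n", e, best))
  }
  # Print methods conventionally return their argument invisibly.
  invisible(x)
}

toy <- structure(
  list(threadCount = c(1L, 2L, 1L, 2L),
       expr = c("forder", "forder", "CJ", "CJ"),
       meanTime = c(238.604075, 149.147105, 5.471632, 4.575443)),
  class = "data_table_threads_benchmark"
)
print(toy)
```

With the method defined, typing the object's name at the console also goes through it, replacing the long list dump with a short summary.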