This summer has been quite eventful, despite being cornered indoors by the ongoing pandemic. I learned a fair deal and wrote about some of it. Among those writings, this one is a bit unique, since it marks the endpoint of my participation in a prestigious program called GSoC 2020 (by a rather not-so-well-known Homo sapiens).
Yep, this post will be the final one, summarizing all the work accomplished along with relevant links.
Links to GSoC’20 work and relevant resources
Please check the links provided underneath the three sections below, which cover the work done during the GSoC timeline and point out different aspects of my package, testComplexity:
(I) The Github Repository
https://github.com/Anirban166/testComplexity
The Pull Requests made (23) and Issues discussed (24) during the GSoC timeline can be accessed from GitHub’s dashboard for the repository, or alternatively as card columns on the Project Board.
The readme contains these sections:
- Abstract, describing the problem statement.
- Objectives, highlighting both the initial and subsequently incorporated goals of the project, all of which were accomplished this summer (though one, the quantification of memory sizes, could do with improvement and is subject to future changes).
- Installation, showing the required command(s) to set up my package in R/RStudio directly via devtools/remotes, with the repository acting as the source. This installs the development version, which I would recommend using in the future as well, although a CRAN-stable version will be released in September as an alternative.
- Functional Flow, illustrating the functional hierarchy along with the respective return formats, function categories, and the commit branches that led to the development of those functions.
- Usage, providing four examples that showcase the quantification and complexity-classification functionality, plus a link to the YouTube video demonstrating how testComplexity correctly classifies the time complexities of different sorting algorithms over a test session.
- Plotting, illustrating different types of plots for diagnosing trends, along with the code that generates them.
- Benchmarking, mentioning the functions/libraries used in our project for benchmarking the time/memory metrics, an essential procedure for achieving our desired functionality. It also happens to be a topic I’ve given a seminar presentation on; the slides can be found here.
- Testing, describing different aspects of testing with relevant sub-sections (functions, unit-testing, code coverage, and OS support).
- Resources, containing links to my blog posts.
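To make the Installation and Usage sections above concrete, here is a sketch of how the package is set up and invoked. The installation commands come straight from the readme; the function names (asymptoticTimings(), asymptoticTimeComplexityClass()) and the N placeholder convention are my recollection of the package’s API, so treat this as an illustrative sketch rather than canonical usage:

```r
# Install the development version from GitHub (either command works):
# devtools::install_github("Anirban166/testComplexity")
remotes::install_github("Anirban166/testComplexity")

library(testComplexity)

# Assumed usage: quantify runtimes over increasing data sizes (N stands
# in for the data size inside the quoted expression), then classify the trend:
df <- asymptoticTimings(PeakSegOptimal::PeakSegPDPA(rpois(N, 1),
                                                    max.segments = 3L),
                        data.sizes = 10^seq(1, 3, by = 0.5))
asymptoticTimeComplexityClass(df)  # returns the complexity class as a string
```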
The pkgdown-generated website’s home page serves as an alternative to the readme. The other sections of the site include the reference section for the functions (generated via man pages from my roxygen documentation), a few articles, and a detailed vignette, which brings me to my next point!
(II) The General Vignette (long-form documentation)
https://anirban166.github.io/testComplexity/articles/testComplexity
This vignette talks about the features and relevant usage notes (including the point on feedback-based optimizations) before elaborating on the functionality of my package from a user-visible standpoint, with the various function categories taken into account. It uses the PeakSegOptimal::PeakSegPDPA() function to demonstrate examples for all the aspects of functionality that testComplexity provides.
Note that this particular vignette is meant to give the ordinary R user an introduction to the software, explaining its capabilities/use in a comprehensive manner.
For vignettes that instead describe the use cases of testComplexity, there is a separate Articles section in the website’s navigation bar, which currently includes seven different articles accessible via the dropdown. They include consolidated examples for these algorithm-based functions:
- substring() & gregexpr()
- opart::gaussian()
- PeakSegDP::cDPA()
- gfpop::gfpop()
- fpop::fpop()
- PeakSegOptimal::PeakSegPDPA()
- changepoint::cpt.mean() (inclusive of both the PELT and SegNeigh algorithms)
(III) The Blog Posts
https://anirban166.github.io/posts
Writing blog posts is one way I expressed my GSoC journey, and one place where I archived the implementation-specific details of the functions (the other being the GitHub issues).
A majority of these posts focus on functions/function categories, as can be guessed from the titles. These include the ones from the first two months (June and July) of the GSoC timeline, written concisely for both users (to refer back to while using the package’s functions) and developers (to take note of the implementation, should they wish to reproduce the logic, whether to improve or extend the existing functionality), with sections such as Function Details: ‘Parameters’, ‘Implementation’, ‘Return value’, and commented ‘Code’ blocks.
Towards the end (for August), the posts change subject from the initial focus on functions: one revolves around the website for testComplexity, and the other around the different software development tools (such as Git and Travis-CI) and libraries (testthat, reprex & covr) used in R, along with my share of experience/knowledge accumulated throughout the summer, which I hope will be useful for those who want to get started with software development in R and adopt workflow automation for the convenience it provides.
Improvements & Future Steps
In its current state, there is one thing I need to improve: the memory-usage quantifier. It currently uses bench::bench_memory() to obtain the raw memory allocations, which are fixed in value for each run (unlike the timings, which are stochastic), leading to inaccurate space-complexity predictions in some cases (I mentioned the same point here). As far as I have looked up and enquired from others, there isn’t yet any cross-platform alternative. (Otherwise I could have used two calls to memory.size(), but that would carry the major drawback of GuessCompx, namely the Windows-only OS limitation for memory-complexity testing. testComplexity overcomes this drawback of GuessCompx and has other advantages over it as well, which are elaborated here.)
Furthermore, I am thinking of making a visually appealing custom ggplot-based theme of my own. There was one (hrbrthemes::theme_ft_rc()) which looked like a ‘yes’ for me, but it had problems such as missing fonts on Windows (as per local runs) and even on Linux/macOS (as per Travis, #48). This would certainly cause an error in CRAN checks, which is why I did not append it to the ggplot object returned by my plotters.
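Since the plotters return ordinary ggplot objects, a user can always layer a theme of their own on top; a minimal sketch (plotTimings() here is assumed to be one of the package’s plotters, and theme_minimal() is just a stand-in for any theme):

```r
library(ggplot2)

# With df as a timings data frame from the quantifier, the idea would be:
#   plotTimings(df) + theme_minimal()
# The same pattern works for any ggplot object, since themes are just layers:
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p + theme_minimal()
```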
Concerning CRAN, a bundled package has been created via devtools::build() and is currently undergoing CRAN’s automatic and manual checks. It’s expected to be released within a few weeks, towards the end of September (it will be noticeable on the website under the News section). And yes, this does imply that I look forward to continuing work on my package!
Special Mentions
A word of thanks to:
- Google/GSoC-team, for organizing and coordinating the program in a timely fashion, in addition to funding my project.
- My mentors Toby Dylan Hocking and Neeraj Dhanraj Bokde for helping me throughout GSoC, with beneficial code/suggestions.
- Marc Agenis, for helping me understand the code of GuessCompx::CompEst better, via a set of shared responses between us early on.
- DocSearch/Algolia, for providing the search functionality on my website.
Every person/entity plays an important role in the project’s upbringing and success!
Overall Experience
This summer has been very engaging and productive, thanks to the inclusion of this project in this year’s edition of GSoC. It’s been a real pleasure to be a part of the program and to contribute to open source with extra motivation.
Diving into the subject of complexity testing (which I’m now well-acquainted with) and everything that falls under the hood, I eventually learnt a lot of R-related things (with much, much more to learn!): from writing efficient code to implement my logic (such as avoiding the x <- rbind(x, new) idiom, which adds extra complexity inside a loop), to avoiding complexity (such as the use of pipes) for the sake of code readability, working with different algorithms (plus going through their research papers and understanding the necessity, logic, and optimization scenarios while writing the vignette-based articles), and experimenting with plots and finding different ways to play with data.
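To illustrate the rbind() point with a generic sketch (not code from the package): growing a data frame inside a loop copies all accumulated rows on every iteration, giving quadratic behaviour, whereas collecting the pieces in a pre-allocated list and binding once keeps it linear:

```r
n <- 1000

# Quadratic: each rbind() copies every row accumulated so far
slow <- data.frame()
for (i in 1:n) {
  slow <- rbind(slow, data.frame(x = i, y = i^2))
}

# Linear: fill a pre-allocated list, then bind once at the end
pieces <- vector("list", n)
for (i in 1:n) {
  pieces[[i]] <- data.frame(x = i, y = i^2)
}
fast <- do.call(rbind, pieces)
# Both produce the same n-row data frame; only the second scales well
```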
Now that I’ve gone through the different stages of R package development (along with the software development best practices that come with them), it should be relatively effortless to follow the same steps for future packages.
Talking about technical documentation, I enjoyed creating the General Vignette towards the end; it made me think about my software from the user’s perspective first, which I realize is an important requisite in software development. Writing blog posts within stipulated time frames also seemed like a good way to document my work, which I intend to follow for every project I work on henceforth.
Apart from focusing on my project, it was fun making new friends from the GSoC community, learning about their experiences and the intriguing projects they have been working on. I do hope I get this golden opportunity in the upcoming year as well, as this sure has been an exciting journey so far!
That concludes my episode of GSoC’20, embedded as a meaningful chapter in these online archives called ‘blogs’ of mine. Thanks for reading!
Anirban | 08/30/2020