Automating stuff within and outside the SDLC scope is hella fun, and now that I've been using GitHub Actions to create convenient workflows and CI/CD pipelines, I wonder if it has ever been this simple without them. Recently, I've been working on a graphics package called directlabels (under this year's edition of Google Summer of Code), which, to be more specific, is a direct-labelling framework for ggplot2/lattice plots. For the related GitHub repositories, I created various workflows composed of multiple GitHub Actions, and this blog post discusses them.
Testing the dlcompare bug
```yaml
name: dlcompare-test
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml:latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup R
        uses: r-lib/actions/setup-r@master
      - name: Install dependencies
        run: |
          R -e 'install.packages(c("ggplot2", "directlabels"))'
      - name: Run my script and push results in markdown as a GA-bot comment
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
        run: |
          R -e 'source("R/test.R")'
          # Retrieve the comparison plot and publish it within a markdown file (dlcompare output):
          cml-publish result.png --md > report.md
          # Generate the report:
          cml-send-comment report.md
```
This workflow uses DVCorg's Continuous Machine Learning (CML) container, which helps to generate a convenient report in the subsequent pull request thread on GitHub. It is part of a three-stage setup (run in the repository-level environment), wherein the first stage saves the .png file I need (for visual debugging/testing) via my R script:
```r
library(ggplot2)
library(directlabels)

# Create some custom lineplot-label positioning functions:
line.startpoint <- function(d, ...) {
  transform(gapply(d, subset, x == min(x)), hjust = 1)
}
line.endpoint <- function(d, ...) {
  transform(gapply(d, subset, x == max(x)), hjust = 0)
}
line.extremepoints <- dl.combine(line.startpoint, line.endpoint)

# Take data from the medium test:
alpha.data <- data.frame(size = c(1, 2, 3, 4, 5), power = c(1, 1.25, 1.75, 2.5, 3.5), category = "Alpha")
beta.data  <- data.frame(size = c(1, 2, 3, 4, 5), power = c(1.5, 3, 5, 7.25, 9.5), category = "Beta")
gamma.data <- data.frame(size = c(1, 2, 3, 4, 5), power = c(2, 4, 6.5, 10, 13.5), category = "Gamma")
df <- rbind(alpha.data, beta.data, gamma.data)

# Create the ggplot object to plot the data, compare the defined positioning
# methods on it via dlcompare, and finally save the returned result as a .png
# file via ggsave to be later accessible to the runner:
g <- ggplot(df, aes(size, power, color = category)) +
  geom_line() +
  xlim(0, 6) +
  labs(x = "Particle Size", y = "Penetration Power") +
  coord_cartesian(clip = "off")
ggsave("result.png",
       dlcompare(list(lineplot.methods = g),
                 list("line.startpoint", "line.endpoint", "line.extremepoints")),
       device = "png")
```
The file stays on the runner and can be retrieved directly within subsequent stages of the workflow. Furthermore, combining this with `cml-publish` allows us to wrap the image into an acceptable format, such as within a markdown file. Lastly, `cml-send-comment` does what it literally says: used in conjunction with our freshly generated markdown file (as an argument), it displays its contents as a comment in the PR thread itself. Pretty neat, eh? :)
Thanks to Elle O'Brien for creating a video which introduced me to this, although there were a few things I changed and corrected on my own (using `renv` led to a curl installation failure) while expecting it to be plug and play. Regarding the changes: I avoided installing curl separately, as well as the approach of using a local library-state snapshot (stored within `renv.lock`), and instead simply installed the dependencies and ran the R script standalone (without `renv::run()`).
The goal here was to check whether the bug persisted on an Ubuntu VM running somewhere across the globe, in order to confirm or discard the assumption that it was specific to me or to a particular OS (the bug had previously been reproduced on OS X and Windows). Turns out it was not an isolated case.
Replicating commits made in master for gh-pages
```yaml
name: Commit changes to gh-pages
on:
  push:
    branches:
      - master
    paths:
      - R/**.R
  pull_request:
    branches:
      - master
    paths:
      - R/**.R
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2.3.1
      - name: Deploy
        uses: JamesIves/github-pages-deploy-action@4.1.0
        with:
          branch: gh-pages
          folder: R
          clean: false
          target-folder: R
```
This workflow is triggered on commits that involve changes to the R files contained within the R/ directory on the master branch, be it from a direct push or from a pull request authored by anyone. For a successful run, the commits must be made by a contributor with write access, i.e. either the owner of the repository or a collaborator, as the workflow will use the git credentials of the person making the push/PR. For instance, a regular contributor's commits from their pull request would only get pushed to gh-pages when someone with write access merges them, with the merge commit being the one that validates the process.
These modifications are then brought under the R/ folder on the gh-pages branch (thus keeping the .R files present there up to date with the ones on the master branch) with the help of this action. Apart from such specific changes, a commit might involve changes in other directories/files as well, in which case, too, only the modifications (including the addition of new .R files) made to R files inside the R/ folder will be carried forward to gh-pages, since the deployment is based on my designated path filter (`paths: - R/**.R`).
Caveats:
- An important note for future edits: `clean` must always be set to `false` (the default is `true`) for our case here, as when set to `true`, it would eradicate everything on the gh-pages branch that is not covered by the changes!
- While adding R files or changing their content does count as a modification, deleting them does not, i.e. file deletions won't trigger the workflow (this is not specific to our case, but holds for all `on: push` type events).
- `folder` and `target-folder` may look the same here, but they differ in the sense that the latter requires a full-path specification. For instance, if I were to make this same workflow to carry over changes from master to dldoc, I would need to specify `folder: R` and `target-folder: pkg/directlabels/R` (apart from setting `branch` to `dldoc`); see the sketch after this list.
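For reference, here is a minimal sketch of that hypothetical dldoc variant of the deploy step (the branch name and target path are the ones assumed in the caveat above; this is not part of any actual workflow):

```yaml
# Hypothetical deploy step for carrying changes from master to the dldoc branch:
- name: Deploy
  uses: JamesIves/github-pages-deploy-action@4.1.0
  with:
    branch: dldoc
    folder: R
    clean: false
    target-folder: pkg/directlabels/R
```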
Automating dldoc (documentation build script) runs
```yaml
name: Update documentation using dldoc
on:
  workflow_run:
    workflows: ["Commit changes to gh-pages"]
    types:
      - completed
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup R
        uses: r-lib/actions/setup-r@master
      - name: Install dependencies
        run: |
          R -e 'install.packages(c("ggplot2", "inlinedocs", "reshape2", "mlmRev", "lars", "latticeExtra"))'
      - name: Setup credentials
        run: |
          # --global, so that the clone in ~ picks up these credentials too:
          git config --global user.name "GitHub Actions Bot"
          git config --global user.email "<>"
      - name: Clone and install directlabels
        run: |
          cd ~
          git clone https://github.com/Anirban166/directlabels
          R -e 'install.packages("~/directlabels", type = "source", repos = NULL)'
          cd directlabels
          git checkout gh-pages
      - name: Compute the docs
        run: |
          R -e 'inlinedocs::package.skeleton.dx("~/directlabels")'
          R -e 'library(directlabels); dldoc("~/directlabels")'
      - name: Push back changes
        run: |
          # Each run step starts from the workspace, so move to the clone first:
          cd ~/directlabels
          git add .
          git commit -m "Updated documentation"
          git push origin gh-pages
```
This is a conditional workflow (it uses a `workflow_run` event), i.e. it will only get triggered upon the former workflow's completion. One could also use the `page_build` event, which runs a workflow after someone (in our case, the GitHub Actions bot) pushes to a GitHub-Pages-enabled branch.
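For reference, a minimal sketch of that alternative trigger (only the `on:` block would change; the rest of the job stays the same):

```yaml
# Hypothetical alternative: trigger this workflow whenever a push to a
# GitHub-Pages-enabled branch fires the page_build event, instead of
# waiting on the completion of the former workflow:
name: Update documentation using dldoc
on: page_build
```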
Following checkout, I'm using the actions provided by r-lib to set up R on our runner, and then installing the required dependencies (the ones needed to run `dldoc()`) apart from directlabels itself, which needs to be installed from the current development version on GitHub and not from a CRAN mirror. Since I'll be running a bunch of git commands, I set up the git credentials for the GitHub Actions bot next. Then I clone the directlabels repository into the runner's home directory and install directlabels from source using that. Next, I navigate inside the repository folder and switch from master (the default) to the gh-pages branch, since the files required for computing and generating the directlabels website live there.
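As an aside, if one only needed the installation (and not a local clone to later push from), a shorter route would be to install straight from GitHub; a minimal sketch, assuming the remotes package is available on the runner (my workflow above keeps the explicit clone, since it commits back to it):

```r
# Hypothetical alternative to the clone + source install, assuming the
# remotes package; installs the development version straight from GitHub:
install.packages("remotes")
remotes::install_github("Anirban166/directlabels")
```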
Before running dldoc, I re-create the package skeleton using inlinedocs. Once that is done, I finally run `dldoc()` with directlabels attached and the path pointing to the cloned copy in the home directory. With the updated documentation files locally available in ~/directlabels on the runner, I simply add, commit, and push the changes back to gh-pages.
Automating code coverage reporting
```yaml
name: Test coverage
on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
jobs:
  test-coverage:
    runs-on: macOS-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-r@v1
      - uses: r-lib/actions/setup-pandoc@v1
      - name: Query dependencies
        run: |
          install.packages('remotes')
          saveRDS(remotes::dev_package_deps(dependencies = TRUE), ".github/depends.Rds", version = 2)
          writeLines(sprintf("R-%i.%i", getRversion()$major, getRversion()$minor), ".github/R-version")
        shell: Rscript {0}
      - name: Restore R package cache
        uses: actions/cache@v2
        with:
          path: ${{ env.R_LIBS_USER }}
          key: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-${{ hashFiles('.github/depends.Rds') }}
          restore-keys: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-
      - name: Install dependencies
        run: |
          install.packages(c("remotes"))
          remotes::install_deps(dependencies = TRUE)
          remotes::install_cran("covr")
        shell: Rscript {0}
      - name: Test coverage
        run: covr::codecov(token = "${{ secrets.CODECOV_SECRET }}")
        shell: Rscript {0}
```
With a slightly modified boilerplate provided by r-lib/actions, this automates the process of uploading a code coverage report to one's codecov account for every commit on the target repository, once the initial codecov setup via `usethis::use_coverage(type = "codecov")` is done. The auto-generated .yml file can be edited down to this basic template (modify as per requirement):
```yaml
comment: false
language: R
sudo: false
cache: packages
after_success:
  - Rscript -e 'covr::codecov()'
coverage:
  status:
    project:
      default:
        target: auto
        threshold: 1%
        informational: true
    patch:
      default:
        target: auto
        threshold: 1%
        informational: true
```
The unique codecov token generated during that setup is then used for subsequent runs, with the resulting reports for that particular repository going to one's codecov account, which in turn is associated with one's GitHub account.
While simply using `covr::codecov("token")` would do the job, the token value is usually not meant to be public, and hence it is better off as a GitHub secret (either at the repository level or scoped to an environment). Making one is relatively simple: navigate to 'Settings' > 'Secrets' > 'New repository secret' in your repository, assign the secret a name, and place the token value in the 'Value' field. That name can then be used in your CI/CD workflow with the usual notation for accessing a secret variable, i.e. `${{ secrets.SECRET_VAR }}`, like the one I used in my workflow above.
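Alternatively, the secret can be exposed as an environment variable and read from within R via `Sys.getenv()`, keeping it out of the command string entirely; a minimal sketch of such a step (an assumption on my part, not what my workflow above does):

```yaml
# Hypothetical variant: pass the secret through an environment variable
# and let R read it, instead of substituting it into the command itself:
- name: Test coverage
  env:
    CODECOV_TOKEN: ${{ secrets.CODECOV_SECRET }}
  run: covr::codecov(token = Sys.getenv("CODECOV_TOKEN"))
  shell: Rscript {0}
```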
Going forward, GitHub's webhook will keep the repository in sync and update the coverage metrics in one's codecov account as per the latest report generated (for the most recent commit), while the created workflow will run `covr::codecov()` for every push/pull request, automating the process for each commit.
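On a side note, one can sanity-check the coverage locally before (or without) involving CI, since covr computes everything on one's own machine as well; a minimal sketch, assuming the package source is the current working directory:

```r
# Compute test coverage for the package in the working directory
# and inspect the results locally:
library(covr)
cov <- package_coverage()
print(cov)   # per-file coverage summary in the console
report(cov)  # browsable HTML report (requires the DT package)
```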
Anirban | 07/30/2021 |