My life with the r-universe
Using the r-universe to rapidly deploy releases, dependencies, and bugfixes
By Zhian N. Kamvar in Infrastructure R R package
July 26, 2023
Summary
Installing a package that has just been released to CRAN is painful for many users on Mac and Windows because often the difference between a ‘binary’ and a ‘source’ version is not immediately clear and they end up trying to install the source version, which leads to errors and heartbreak. When I was designing The Carpentries Workbench, I needed to make sure that people could reliably install R packages at any time regardless of whether or not they had a compiler set up.
I use a hybrid model of the r-universe and CRAN: my universe hosts in-development packages that are not on CRAN alongside dependencies that are released to CRAN but require compilation, which are built from their latest release tag on GitHub. This provides end users with a repository that will always contain the most up-to-date binary packages, which can be easily restored via {renv} without the need for a compiler.
In this blog post, I will provide a summary of the following:
- What the r-universe is and how to set one up
- How to host your own in-development packages
- How to use it as an extension of CRAN to provide just-in-time binaries for your dependencies
- How to use it to provide the latest bugfix versions of critical dependencies before they hit CRAN
What is the r-universe?
The r-universe is a project by rOpenSci that serves as a rolling development repository to host R packages that are in development on a git repository such as GitHub. This has a few benefits from the get-go. With the r-universe, you can:
- host packages that could never be on CRAN due to size restrictions
- provide binary versions of packages that require compilation
- deploy quick bugfixes to your package without needing to ask the users to install from GitHub
- host packages that are not on CRAN that you depend on
- query Linux system dependencies via its API
To set up a universe, you do three things:
- create a repository in your GitHub account called “[user].r-universe.dev” (e.g. https://github.com/zkamvar/zkamvar.r-universe.dev)
- add the r-universe app to your GitHub account
- add packages to a JSON file called “packages.json” (e.g. https://github.com/zkamvar/zkamvar.r-universe.dev/blob/main/packages.json)
After that, your universe will be available for use at https://[user].r-universe.dev/ and anyone can install those packages.
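From the user's side, installing from a universe only requires pointing `install.packages()` at the universe in addition to CRAN. The following is a minimal sketch: the helper name `universe_repos()` and the choice of https://cloud.r-project.org as the CRAN mirror are my own assumptions for illustration, and the install call is commented out because it needs network access.

```r
# Hypothetical helper (not part of any package): build a repos vector that
# lists a universe ahead of a CRAN mirror.
universe_repos <- function(user, cran = "https://cloud.r-project.org") {
  c(
    universe = sprintf("https://%s.r-universe.dev", user),
    CRAN     = cran
  )
}

repos <- universe_repos("zkamvar")
print(repos)

# Install from the universe, falling back to CRAN for any package the
# universe does not host (uncomment to run; needs network access):
# install.packages("poppr", repos = repos)
```

Keeping CRAN in the vector matters because a universe usually hosts only a handful of packages, so their dependencies still need to come from somewhere.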
Rolling your own
Before the r-universe, there were three ways to provide users with your in-development version of a package:
- have people install from GitHub via remotes::install_github("user/repo") or pak::pkg_install("user/repo")
- provide a {drat} repository (I’ve posted about auto-building a drat repository previously)
- host a cran-like repository on your personal site that contains the tarball (this is how dev versions of the {ape} package used to be distributed)
Each of these solutions either required an extra package and syntax for your users (option 1) or involved extra work on your end to build and provide updates for the packages on your server (options 2 and 3). Unless you happened to have access to Linux, Windows, and macOS machines, you were only able to serve the source versions of the packages.
The r-universe changes all of that by allowing us to specify that we want to deploy releases in our JSON file:
[
  {
    "package": "pegboard",
    "url": "https://github.com/carpentries/pegboard",
    "branch": "*release"
  },
  {
    "package": "sandpaper",
    "url": "https://github.com/carpentries/sandpaper",
    "branch": "*release"
  },
  {
    "package": "varnish",
    "url": "https://github.com/carpentries/varnish",
    "branch": "*release"
  }
]
Because of this, we are able to deploy releases quickly and continue to test development versions when we need to.
Using the r-universe as an extension of CRAN
Problem
Back in January 2022, a blog post explaining how {renv} 0.15.0 interacts with the r-universe was put up on the rOpenSci blog. It describes the model by which {renv} will restore package versions in R. In brief, if {renv} sees a binary package version that’s on CRAN or a CRAN archive, it will use that version, and if it does not find that version there, it will install from the r-universe if the SHA hash matches the package DESCRIPTION, and then from the GitHub hash if it does not match.
We had initially put up our packages on the r-universe because it was a good way to distribute them without taxing the user’s GitHub API calls, but there were some issues with hosting only our own packages. Imagine that you are a macOS or Windows user and you find that a new-to-you package has just been updated on CRAN. What happens when you try to install it?
You get this message:
  There are binary versions available but the
  source versions are later:
       binary source needs_compilation
httpuv  1.6.4  1.6.5              TRUE

Do you want to install from sources the packages which need compilation?
It’s clear that many users simply do not know what to do with this message and run into problems if they try to install the source version. This message appears because CRAN can take up to 3 days to build the macOS and Windows binary versions of a package that is newly released to CRAN.
So the question is, if your package depends on packages that require a complex setup to install the source version, how do you prevent this situation from happening? How can the r-universe provide just-in-time binaries of released versions of packages that you do not control?
Solution
You can add the development version of this package to your packages.json file and set the branch to *release so that it will rebuild on the r-universe only when the author generates a release.
[
  {
    "package": "httpuv",
    "url": "https://github.com/rstudio/httpuv",
    "branch": "*release"
  }
]
Because the r-universe will build package binaries within an hour, someone who attempts to install a package that was just released to CRAN will not see the message above, because R will detect that a binary is available from your r-universe.
There are some caveats, however:
- The authors must develop on GitHub.
- The authors must use the GitHub release mechanism when they submit to CRAN.
If either of these conditions is not met, this solution will not work, but you might be able to link to the package's repository on the CRAN mirror on GitHub without the *release tag.
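As a rough sketch of that fallback, the entry below points at a mirror repository instead of the author's own repository and leaves out the "branch" field so that the default branch is used. The URL https://github.com/cran/httpuv is my assumption about where the mirror for httpuv lives; check that the repository exists for the package you need before relying on it.

```json
[
  {
    "package": "httpuv",
    "url": "https://github.com/cran/httpuv"
  }
]
```

Because the mirror only changes once a version has been accepted by CRAN, this should still give you binaries sooner than waiting for the CRAN build machines, though not as early as a GitHub release would.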
Seamless bugfixes ahead of CRAN
In March 2023, {renv} released version 0.17.0, and subsequently caused chaos with lesson workflows using {sandpaper}.
Thanks to the rapid response from Kevin Ushey, I was able to get him to provide bugfixes that addressed these issues to the dev version of {renv}. However, he was also dealing with other bugfixes before he could resubmit to CRAN, so how was I to deploy these fixes to my community before he submitted the fix to CRAN, which could take days?
I added the specific bugfix tag/commit[1] from {renv} to my packages.json file:
[
  {
    "package": "renv",
    "url": "https://github.com/rstudio/renv",
    "branch": "0.17.0-38"
  }
]
When users installed {renv} from The Carpentries r-universe, they would get version 0.17.0-38 until Kevin pushed the bugfix to CRAN. Note that Kevin was very diligent about bumping the development version number and adding a tag to every fix, but this would still work as long as the developer used any form of development version on GitHub.
This works for the same reason that we can serve binaries before they are built on CRAN: install.packages() will always look for the latest version of a package and install that if a binary is available.
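The version ordering that makes this work can be checked directly in R. The sketch below uses the version strings from the examples in this post; the "0.17.1" value is a hypothetical number for the next CRAN release, included only to show why the pinned bugfix build stops winning once a newer release lands.

```r
# Version strings taken from the examples in this post
cran_renv   <- package_version("0.17.0")
bugfix_renv <- package_version("0.17.0-38")

# Hypothetical version number for the next CRAN release (an assumption)
next_cran   <- package_version("0.17.1")

# The tagged development version sorts after the CRAN release, so a
# repository offering it is preferred over CRAN ...
bugfix_is_newer <- bugfix_renv > cran_renv
print(bugfix_is_newer)

# ... until CRAN has a release that sorts after the bugfix build
cran_takes_over <- next_cran > bugfix_renv
print(cran_takes_over)

# The same comparison for the httpuv example: compareVersion() returns 1
# when its first argument is the later version
httpuv_cmp <- utils::compareVersion("1.6.5", "1.6.4")
print(httpuv_cmp)
```

This is also why a pinned bugfix entry does not need to be removed in a hurry: once the fixed release is on CRAN with a higher version number, users get that one instead.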
Conclusion
Benefits for the R Community
One of the best features of R is a packaging ecosystem that, for the most part, “just works.” This is largely thanks to CRAN and its volunteers, who regularly check every package against the packages that it uses and the packages that use it to make sure that they are all compatible with each other. The downside of having such a thorough system for checking packages is that the barrier to entry is very high[2]. It also means that providing bugfixes can take on the order of days. The r-universe solves these problems by providing a way for authors and organisations to quickly deploy package suites and dependencies without burdening the users with the task of compiling code or installing extra packages.
Grace for my past self
When I was a grad student, I wrote my very first R package, {poppr}, which I had released to CRAN on 2013-05-26. In 2014, we released the paper describing {poppr} and were preparing to give our first workshop for plant pathologists at the annual American Phytopathological Meeting. The only catch: I was working on a new version that would introduce many new features that we wanted to highlight in our workshop. To get people prepared, I had them (1) install R, (2) install a C compiler, (3) install {devtools}, and then (4) use devtools::install_github("grunwaldlab/poppr@candidate-1.1").
My question to you: how do you think that went over? If the answer is: it was a success, then you are correct! But there is a big caveat to this success: it was only because I had asked each of the >40 participants to email me the results of their installation and then I would troubleshoot these installations via email several weeks before the workshop. If the r-universe had existed back then, I would have been able to tell people to (1) install R and (2) run install.packages("poppr", repos = c("https://zkamvar.r-universe.dev", getOptions("repos"))) without having them wonder why they needed a C compiler (or what it was) and why they needed {devtools}.
It’s very hard to overstate just how much benefit the r-universe provides to the R community. It allows developers to rapidly deploy testing versions of their packages to their user-base for feedback, and it is one of the main reasons that we are able to easily maintain The Carpentries Workbench. And there are many other benefits that I haven’t even mentioned here such as public APIs to get system dependencies, lightweight HTML manuals, articles, and more! So, thank you to rOpenSci for providing and continuing to support this work.
[1] I could have also specified "branch": "fd9181395e13652d2b3dc942ac1bf807e9564c25".
[2] This goes beyond the requirements to pass R CMD Check into gatekeeping behaviour from the CRAN maintainers and seemingly arbitrary policies that are regularly shifting, but that’s a discussion for another post. The system is flawed, but it’s still valuable.