If you are distributing binary R packages (or any other binary) for Linux, it is important that you check and publish the run-time dependencies for your binaries. This can easily be automated, and prevents many problems and conflicts. Currently RSPM leaves the client guessing which system libraries the binaries are linked to, which results in users installing unnecessary build-time dependencies, sometimes even the wrong ones.
Once you distinguish between build-time and run-time system libraries in Linux distributions, the solution is obvious, and the system will become much simpler and more robust.
This is not a hack: Linux package managers have been designed to automatically determine dependencies between system libraries. You should use the same tools when providing binaries for R packages, even if they are not distributed in a
In a nutshell: After you have successfully built an R package on your Linux server, run ldd on the package .so file to list the shared libraries it links to. The operating system package manager (e.g. dpkg) can tell you which system package each file belongs to. Simply add this information to the binary package DESCRIPTION file that you are shipping. That’s it!
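The ldd and dpkg steps described above can be sketched in shell roughly as follows. This is a minimal illustration, not the maketools implementation: the function names `ldd_paths` and `so_sysdeps` are made up for this example, and it assumes a dpkg-based system (on Fedora, `rpm -qf` would play the role of `dpkg -S`).

```sh
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print only the
# resolved library paths. Typical input lines look like
#   libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x...)
# Entries without a "=> /path" part (linux-vdso, the dynamic loader,
# "not found") are skipped.
ldd_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Hypothetical helper: list the system packages that own the shared
# libraries a compiled object links to. Assumes dpkg is available.
so_sysdeps() {
  ldd "$1" | ldd_paths | while read -r lib; do
    # `dpkg -S` prints "package[:arch]: /path"; keep the package name only.
    # Paths dpkg does not know about are silently ignored here.
    dpkg -S "$lib" 2>/dev/null | cut -d: -f1
  done | sort -u
}
```

It would be called with the path to the compiled object inside the installed package, for example `so_sysdeps path/to/openssl/libs/openssl.so` (the path is a placeholder). Note that ldd also reports indirect dependencies such as libc, so the list can be longer than the libraries the package links to directly, and on systems where /lib is a symlink into /usr the path printed by ldd may need normalising before dpkg recognises it.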
To make it even easier: the maketools package has an example function that shows the system dependencies for installed R packages on Linux. For example, let’s have a look at the dependencies of the sf CRAN package. On Ubuntu 20.04 we see:
    > maketools::package_sysdeps("sf")
                    shlib      package     headers source              version
    1   libproj.so.15.3.1    libproj15 libproj-dev   proj              6.3.1-1
    2   libgdal.so.26.0.4    libgdal26 libgdal-dev   gdal   3.0.4+dfsg-1build3
    3 libgeos_c.so.1.13.1 libgeos-c1v5 libgeos-dev   geos        3.8.0-1build1
    4 libstdc++.so.6.0.28   libstdc++6        <NA>    gcc 10-20200411-0ubuntu1
And on Fedora 32 we get:
    > maketools::package_sysdeps("sf")
                    shlib   package    headers source version
    1   libproj.so.15.3.2      proj proj-devel   proj   6.3.2
    2   libgdal.so.26.0.4 gdal-libs gdal-devel   gdal   3.0.4
    3 libgeos_c.so.1.13.3      geos geos-devel   geos   3.8.1
    4 libstdc++.so.6.0.28 libstdc++       <NA>    gcc  10.2.1
The first column shlib tells you which shared libraries the R package is linked to, i.e. the filenames of the .so files. The second column shows which system package this file belongs to. This is the (only) relevant piece of information when you are distributing the binary, because these are exactly the system packages the client needs to have installed for the binary R package to work. Nothing more, nothing less!
A simple way to build R binary packages is on a server or container that has all build-time libraries pre-installed (the per-package build-time dependencies are really not relevant). For example, you can use the cranlike cran/ubuntu docker images for the latest version of Debian and Ubuntu.
docker run -it cran/ubuntu
After building and installing an R package, you can check the package's run-time dependencies, for example:
    > install.packages("openssl")
    ## ...
    ## ...
    ## ** checking absolute paths in shared objects and dynamic libraries
    ## ** testing if installed package can be loaded from final location
    ## ** testing if installed package keeps a record of temporary installation path
    ## * DONE (openssl)
    > maketools::package_sysdeps("openssl")
                 shlib   package    headers  source         version
    1    libssl.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2
    2 libcrypto.so.1.1 libssl1.1 libssl-dev openssl 1.1.1f-1ubuntu2
For every R binary package you distribute, you should provide, at a minimum, the information from the package column. The best way would be to add this to the DESCRIPTION file of the binary R package, and ideally also expose it in the PACKAGES repository index. That way clients can look up the system dependencies required for this binary R package, 100% reliably, without guessing or conflicts.
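As a rough sketch of what recording this in the DESCRIPTION file could look like, the shell function below appends one extra field to a DESCRIPTION file. Both the function name `add_sysdeps_field` and the field name `SystemPackages` are hypothetical, chosen only for illustration; no particular field name is prescribed here, and repository tooling would have to agree on one.

```sh
#!/bin/sh
# Hypothetical helper: append a field listing run-time system packages
# to a DESCRIPTION file. Usage: add_sysdeps_field FILE PKG [PKG ...]
# The field name "SystemPackages" is an assumption made for this sketch.
add_sysdeps_field() {
  desc_file=$1
  shift
  # Join the remaining arguments with ", " and drop the trailing separator.
  pkgs=$(printf '%s, ' "$@")
  pkgs=${pkgs%, }
  # Make sure the file ends with a newline so the new field starts on
  # its own line (command substitution strips a trailing newline).
  if [ -n "$(tail -c 1 "$desc_file")" ]; then
    printf '\n' >> "$desc_file"
  fi
  printf 'SystemPackages: %s\n' "$pkgs" >> "$desc_file"
}
```

For the openssl example above, `add_sysdeps_field DESCRIPTION libssl1.1` would add the line `SystemPackages: libssl1.1`. Getting the field into the PACKAGES index is a separate step that depends on how the repository is generated.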